summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-06-24block: reduce ifdef CONFIG_BLOCK madness in headersChristoph Hellwig3-63/+46
Large part of bio.h, blkdev.h and genhd.h are under ifdef CONFIG_BLOCK for no good reason. Only stub out function that are called from code that is not dependent on CONFIG_BLOCK and leave the harmless other declarations around. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24fs: move the buffer_heads_over_limit stub to buffer_head.hChristoph Hellwig2-1/+1
Move the !CONFIG_BLOCK stub to the same place as the non-stub declaration. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24block: move block-related definitions out of fs.hChristoph Hellwig12-94/+96
Move most of the block related definition out of fs.h into more suitable headers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24block: simplify sb_is_blkdev_sbChristoph Hellwig1-12/+6
Just use IS_ENABLED instead of providing a stub for !CONFIG_BLOCK. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24fs: remove the mount_bdev and kill_block_super stubsChristoph Hellwig1-16/+0
No one calls these functions without CONFIG_BLOCK, so don't bother stubbing them out. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24fs: remove the HAVE_UNLOCKED_IOCTL and HAVE_COMPAT_IOCTL definesChristoph Hellwig1-6/+0
These are not defined anywhere, and contrary to the comments we really do not care about out of tree code at all. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24fs: remove an unused block_device_operations forward declarationChristoph Hellwig1-2/+0
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24block: mark bd_finish_claiming staticChristoph Hellwig2-5/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24tty/sysrq: emergency_thaw_all does not depend on CONFIG_BLOCKChristoph Hellwig2-3/+1
We can also thaw non-block file systems. Remove the CONFIG_BLOCK in sysrq.c after making the prototype available unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24nvme-rdma: fix a missing completion with remove invalidationChristoph Hellwig1-2/+3
Revert and incorret transformation that caused requests using remote invalidation to never complete. Fixes: 421147be863b ("nvme-rdma: factor out a nvme_rdma_end_request helper") Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-iocost: Use struct_size() in kzalloc_node()Gustavo A. R. Silva1-2/+1
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24block: bio: Use struct_size() in kmalloc()Gustavo A. R. Silva1-3/+1
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24block: create the request_queue debugfs_dir on registrationLuis Chamberlain6-48/+39
We were only creating the request_queue debugfs_dir only for make_request block drivers (multiqueue), but never for request-based block drivers. We did this as we were only creating non-blktrace additional debugfs files on that directory for make_request drivers. However, since blktrace *always* creates that directory anyway, we special-case the use of that directory on blktrace. Other than this being an eye-sore, this exposes request-based block drivers to the same debugfs fragile race that used to exist with make_request block drivers where if we start adding files onto that directory we can later run a race with a double removal of dentries on the directory if we don't deal with this carefully on blktrace. Instead, just simplify things by always creating the request_queue debugfs_dir on request_queue registration. Rename the mutex also to reflect the fact that this is used outside of the blktrace context. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blktrace: ensure our debugfs dir existsLuis Chamberlain1-0/+12
We make an assumption that a debugfs directory exists, but since this can fail ensure it exists before allowing blktrace setup to complete. Otherwise we end up stuffing blktrace files on the debugfs root directory. In the worst case scenario this *in theory* can create an eventual panic *iff* in the future a similarly named file is created prior on the debugfs root directory. This theoretical crash can happen due to a recursive removal followed by a specific dentry removal. This doesn't fix any known crash, however I have seen the files go into the main debugfs root directory in cases where the debugfs directory was not created due to other internal bugs with blktrace now fixed. blktrace is also completely useless without this directory, so this ensures to userspace we only setup blktrace if the kernel can stuff files where they are supposed to go into. debugfs directory creations typically aren't checked for, and we have maintainers doing sweep removals of these checks, but since we need this check to ensure proper userspace blktrace functionality we make sure to annotate the justification for the check. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blktrace: fix debugfs use after freeLuis Chamberlain1-6/+12
On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory") merged on v4.12 Omar fixed the original blktrace code for request-based drivers (multiqueue). This however left in place a possible crash, if you happen to abuse blktrace while racing to remove / add a device. We used to use asynchronous removal of the request_queue, and with that the issue was easier to reproduce. Now that we have reverted to synchronous removal of the request_queue, the issue is still possible to reproduce, its however just a bit more difficult. We essentially run two instances of break-blktrace which add/remove a loop device, and setup a blktrace and just never tear the blktrace down. We do this twice in parallel. This is easily reproduced with the script run_0004.sh from break-blktrace [0]. We can end up with two types of panics each reflecting where we race, one a failed blktrace setup: [ 252.426751] debugfs: Directory 'loop0' with parent 'block' already present! [ 252.432265] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 252.436592] #PF: supervisor write access in kernel mode [ 252.439822] #PF: error_code(0x0002) - not-present page [ 252.442967] PGD 0 P4D 0 [ 252.444656] Oops: 0002 [#1] SMP NOPTI [ 252.446972] CPU: 10 PID: 1153 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 252.452673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 252.456343] RIP: 0010:down_write+0x15/0x40 [ 252.458146] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 252.463638] RSP: 0018:ffffa626415abcc8 EFLAGS: 00010246 [ 252.464950] RAX: 0000000000000000 RBX: ffff958c25f0f5c0 RCX: ffffff8100000000 [ 252.466727] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 252.468482] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000001 [ 252.470014] R10: 0000000000000000 R11: ffff958d1f9227ff R12: 0000000000000000 [ 252.471473] R13: ffff958c25ea5380 R14: ffffffff8cce15f1 R15: 00000000000000a0 [ 252.473346] FS: 00007f2e69dee540(0000) GS:ffff958c2fc80000(0000) knlGS:0000000000000000 [ 252.475225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 252.476267] CR2: 00000000000000a0 CR3: 0000000427d10004 CR4: 0000000000360ee0 [ 252.477526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 252.478776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 252.479866] Call Trace: [ 252.480322] simple_recursive_removal+0x4e/0x2e0 [ 252.481078] ? debugfs_remove+0x60/0x60 [ 252.481725] ? relay_destroy_buf+0x77/0xb0 [ 252.482662] debugfs_remove+0x40/0x60 [ 252.483518] blk_remove_buf_file_callback+0x5/0x10 [ 252.484328] relay_close_buf+0x2e/0x60 [ 252.484930] relay_open+0x1ce/0x2c0 [ 252.485520] do_blk_trace_setup+0x14f/0x2b0 [ 252.486187] __blk_trace_setup+0x54/0xb0 [ 252.486803] blk_trace_ioctl+0x90/0x140 [ 252.487423] ? do_sys_openat2+0x1ab/0x2d0 [ 252.488053] blkdev_ioctl+0x4d/0x260 [ 252.488636] block_ioctl+0x39/0x40 [ 252.489139] ksys_ioctl+0x87/0xc0 [ 252.489675] __x64_sys_ioctl+0x16/0x20 [ 252.490380] do_syscall_64+0x52/0x180 [ 252.491032] entry_SYSCALL_64_after_hwframe+0x44/0xa9 And the other on the device removal: [ 128.528940] debugfs: Directory 'loop0' with parent 'block' already present! [ 128.615325] BUG: kernel NULL pointer dereference, address: 00000000000000a0 [ 128.619537] #PF: supervisor write access in kernel mode [ 128.622700] #PF: error_code(0x0002) - not-present page [ 128.625842] PGD 0 P4D 0 [ 128.627585] Oops: 0002 [#1] SMP NOPTI [ 128.629871] CPU: 12 PID: 544 Comm: break-blktrace Tainted: G E 5.7.0-rc2-next-20200420+ #164 [ 128.635595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 128.640471] RIP: 0010:down_write+0x15/0x40 [ 128.643041] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d [ 128.650180] RSP: 0018:ffffa9c3c05ebd78 EFLAGS: 00010246 [ 128.651820] RAX: 0000000000000000 RBX: ffff8ae9a6370240 RCX: ffffff8100000000 [ 128.653942] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0 [ 128.655720] RBP: 00000000000000a0 R08: 0000000000000002 R09: ffff8ae9afd2d3d0 [ 128.657400] R10: 0000000000000056 R11: 0000000000000000 R12: 0000000000000000 [ 128.659099] R13: 0000000000000000 R14: 0000000000000003 R15: 00000000000000a0 [ 128.660500] FS: 00007febfd995540(0000) GS:ffff8ae9afd00000(0000) knlGS:0000000000000000 [ 128.662204] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 128.663426] CR2: 00000000000000a0 CR3: 0000000420042003 CR4: 0000000000360ee0 [ 128.664776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 128.666022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 128.667282] Call Trace: [ 128.667801] simple_recursive_removal+0x4e/0x2e0 [ 128.668663] ? debugfs_remove+0x60/0x60 [ 128.669368] debugfs_remove+0x40/0x60 [ 128.669985] blk_trace_free+0xd/0x50 [ 128.670593] __blk_trace_remove+0x27/0x40 [ 128.671274] blk_trace_shutdown+0x30/0x40 [ 128.671935] blk_release_queue+0x95/0xf0 [ 128.672589] kobject_put+0xa5/0x1b0 [ 128.673188] disk_release+0xa2/0xc0 [ 128.673786] device_release+0x28/0x80 [ 128.674376] kobject_put+0xa5/0x1b0 [ 128.674915] loop_remove+0x39/0x50 [loop] [ 128.675511] loop_control_ioctl+0x113/0x130 [loop] [ 128.676199] ksys_ioctl+0x87/0xc0 [ 128.676708] __x64_sys_ioctl+0x16/0x20 [ 128.677274] do_syscall_64+0x52/0x180 [ 128.677823] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The common theme here is: debugfs: Directory 'loop0' with parent 'block' already present This crash happens because of how blktrace uses the debugfs directory where it places its files. Upon init we always create the same directory which would be needed by blktrace but we only do this for make_request drivers (multiqueue) block drivers. When you race a removal of these devices with a blktrace setup you end up in a situation where the make_request recursive debugfs removal will sweep away the blktrace files and then later blktrace will also try to remove individual dentries which are already NULL. The inverse is also possible and hence the two types of use after frees. We don't create the block debugfs directory on init for these types of block devices: * request-based block driver block devices * every possible partition * scsi-generic And so, this race should in theory only be possible with make_request drivers. We can fix the UAF by simply re-using the debugfs directory for make_request drivers (multiqueue) and only creating the ephemeral directory for the other type of block devices. The new clarifications on relying on the q->blk_trace_mutex *and* also checking for q->blk_trace *prior* to processing a blktrace ensures the debugfs directories are only created if no possible directory name clashes are possible. This goes tested with: o nvme partitions o ISCSI with tgt, and blktracing against scsi-generic with: o block o tape o cdrom o media changer o blktests This patch is part of the work which disputes the severity of CVE-2019-19770 which shows this issue is not a core debugfs issue, but a misuse of debugfs within blktace. Fixes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory") Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: yu kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24loop: be paranoid on exit and prevent new additions / removalsLuis Chamberlain1-0/+4
Be pedantic on removal as well and hold the mutex. This should prevent uses of addition while we exit. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blktrace: annotate required lock on do_blk_trace_setup()Luis Chamberlain1-0/+2
Ensure it is clear which lock is required on do_blk_trace_setup(). Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24block: revert back to synchronous request_queue removalLuis Chamberlain4-23/+47
Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") merged on v4.12 moved the work behind blk_release_queue() into a workqueue after a splat floated around which indicated some work on blk_release_queue() could sleep in blk_exit_rl(). This splat would be possible when a driver called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue() as its final call) from an atomic context. blk_put_queue() decrements the refcount for the request_queue kobject, and upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() is now removed through commit db6d99523560 ("block: remove request_list code") on v5.0, we reserve the right to be able to sleep within blk_release_queue() context. The last reference for the request_queue must not be called from atomic context. *When* the last reference to the request_queue reaches 0 varies, and so let's take the opportunity to document when that is expected to happen and also document the context of the related calls as best as possible so we can avoid future issues, and with the hopes that the synchronous request_queue removal sticks. We revert back to synchronous request_queue removal because asynchronous removal creates a regression with expected userspace interaction with several drivers. An example is when removing the loopback driver, one uses ioctls from userspace to do so, but upon return and if successful, one expects the device to be removed. Likewise if one races to add another device the new one may not be added as it is still being removed. This was expected behavior before and it now fails as the device is still present and busy still. Moving to asynchronous request_queue removal could have broken many scripts which relied on the removal to have been completed if there was no error. Document this expectation as well so that this doesn't regress userspace again. Using asynchronous request_queue removal however has helped us find other bugs. In the future we can test what could break with this arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE. While at it, update the docs with the context expectations for the request_queue / gendisk refcount decrement, and make these expectations explicit by using might_sleep(). Fixes: dc9edc44de6c ("block: Fix a blk_exit_rl() regression") Suggested-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: yu kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24block: clarify context for refcount increment helpersLuis Chamberlain2-0/+8
Let us clarify the context under which the helpers to increment the refcount for the gendisk and request_queue can be called under. We make this explicit on the places where we may sleep with might_sleep(). We don't address the decrement context yet, as that needs some extra work and fixes, but will be addressed in the next patch. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24block: add docs for gendisk / request_queue refcount helpersLuis Chamberlain2-1/+62
This adds documentation for the gendisk / request_queue refcount helpers. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24nvme: use blk_mq_complete_request_remote to avoid an indirect function callChristoph Hellwig6-9/+18
Use the new blk_mq_complete_request_remote helper to avoid an indirect function call in the completion fast path. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24nvme-rdma: factor out a nvme_rdma_end_request helperChristoph Hellwig1-19/+17
Factor a small sniplet of duplicated code into a new helper in preparation for making this sniplet a little bit less trivial. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: add a new blk_mq_complete_request_remote APIChristoph Hellwig2-19/+27
This is a variant of blk_mq_complete_request_remote that only completes the request if it needs to be bounced to another CPU or a softirq. If the request can be completed locally the function returns false and lets the driver complete it without requring and indirect function call. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: factor out a blk_mq_complete_need_ipi helperChristoph Hellwig1-18/+21
Add a helper to decide if we can complete locally or need an IPI. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: remove the get_cpu/put_cpu pair in blk_mq_complete_requestChristoph Hellwig1-2/+1
We don't really care if we get migrated during the I/O completion. In the worth case we either perform an IPI that wasn't required, or complete the request on a CPU which we just migrated off. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: move failure injection out of blk_mq_complete_requestChristoph Hellwig19-72/+61
Move the call to blk_should_fake_timeout out of blk_mq_complete_request and into the drivers, skipping call sites that are obvious error handlers, and remove the now superflous blk_mq_force_complete_rq helper. This ensures we don't keep injecting errors into completions that just terminate the Linux request after the hardware has been reset or the command has been aborted. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: merge the softirq vs non-softirq IPI logicChristoph Hellwig1-65/+20
Both the softirq path for single queue devices and the multi-queue completion handler share the same logic to figure out if we need an IPI for the completion and eventually issue it. Merge the two versions into a single unified code path. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: short cut the IPI path in blk_mq_force_complete_rq for !SMPChristoph Hellwig1-1/+2
Let the compile optimize out the entire IPI path, given that we are obviously not going to use it. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: complete polled requests directlyChristoph Hellwig1-6/+11
Even for single queue devices there is no point in offloading a polled completion to the softirq, given that blk_mq_force_complete_rq is called from the polling thread in that case and thus there are no starvation issues. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: remove raise_blk_irqChristoph Hellwig1-30/+10
By open coding raise_blk_irq in the only caller, and replacing the ifdef CONFIG_SMP with an IS_ENABLED check the flow in the caller can be significantly simplified. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: factor out a helper to reise the block softirqChristoph Hellwig1-17/+14
Add a helper to deduplicate the logic that raises the block softirq. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-24blk-mq: merge blk-softirq.c into blk-mq.cChristoph Hellwig4-158/+136
__blk_complete_request is only called from the blk-mq code, and duplicates a lot of code from blk-mq.c. Move it there to prepare for better code sharing and simplifications. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-22Linux 5.8-rc2v5.8-rc2Linus Torvalds1-1/+1
2020-06-22Merge tag 'selinux-pr-20200621' of ↵Linus Torvalds2-13/+12
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux fixes from Paul Moore: "Three small patches to fix problems in the SELinux code, all found via clang. Two patches fix potential double-free conditions and one fixes an undefined return value" * tag 'selinux-pr-20200621' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: fix undefined return of cond_evaluate_expr selinux: fix a double free in cond_read_node()/cond_read_list() selinux: fix double free
2020-06-21Merge tag 'pinctrl-v5.8-2' of ↵Linus Torvalds6-23/+19
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Some early fixes collected during the first week after the merge window, all pretty self-evident, with the details below. The revert is the crucial thing. - Fix a warning on the Qualcomm SPMI GPIO chip being instatiated twice without a unique irqchip struct - Use the noirq variants of the suspend and resume callbacks in the Tegra driver - Clean up the errorpath on the MCP23s08 driver - Revert the use of devm_of_iomap() in the Freescale driver as it was regressing the platform - Add some missing pins in the Qualcomm IPQ6018 driver - Fix a simple documentation bug in the pinctrl-single driver" * tag 'pinctrl-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: single: fix function name in documentation pinctrl: qcom: ipq6018 Add missing pins in qpic pin group Revert "pinctrl: freescale: imx: Use 'devm_of_iomap()' to avoid a resource leak in case of error in 'imx_pinctrl_probe()'" pinctrl: mcp23s08: Split to three parts: fix ptr_ret.cocci warnings pinctrl: tegra: Use noirq suspend/resume callbacks pinctrl: qcom: spmi-gpio: fix warning about irq chip reusage
2020-06-21Merge tag 'kbuild-fixes-v5.8' of ↵Linus Torvalds8-34/+18
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - fix -gz=zlib compiler option test for CONFIG_DEBUG_INFO_COMPRESSED - improve cc-option in scripts/Kbuild.include to clean up temp files - improve cc-option in scripts/Kconfig.include for more reliable compile option test - do not copy modules.builtin by 'make install' because it would break existing systems - use 'userprogs' syntax for watch_queue sample * tag 'kbuild-fixes-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: samples: watch_queue: build sample program for target architecture Revert "Makefile: install modules.builtin even if CONFIG_MODULES=n" scripts: Fix typo in headers_install.sh kconfig: unify cc-option and as-option kbuild: improve cc-option to clean up all temporary files Makefile: Improve compressed debug info support detection
2020-06-21Merge tag 'powerpc-5.8-3' of ↵Linus Torvalds7-20/+37
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - One fix for the interrupt rework we did last release which broke KVM-PR - Three commits fixing some fallout from the READ_ONCE() changes interacting badly with our 8xx 16K pages support, which uses a pte_t that is a structure of 4 actual PTEs - A cleanup of the 8xx pte_update() to use the newly added pmd_off() - A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is enabled - A minor fix for the SPU syscall generation Thanks to Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike Rapoport, Nicholas Piggin. * tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/8xx: Provide ptep_get() with 16k pages mm: Allow arches to provide ptep_get() mm/gup: Use huge_ptep_get() in gup_hugepte() powerpc/syscalls: Use the number when building SPU syscall table powerpc/8xx: use pmd_off() to access a PMD entry in pte_update() powerpc/64s: Fix KVM interrupt using wrong save area powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL
2020-06-21Merge branch 'linus' of ↵Linus Torvalds12-35/+45
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - NULL dereference in octeontx - PM reference imbalance in ks-sa - deadlock in crypto manager - memory leak in drbg - missing socket limit check on receive SG list size in algif_skcipher - typos in caam - warnings in ccp and hisilicon * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: drbg - always try to free Jitter RNG instance crypto: marvell/octeontx - Fix a potential NULL dereference crypto: algboss - don't wait during notifier callback crypto: caam - fix typos crypto: ccp - Fix sparse warnings in sev-dev crypto: hisilicon - Cap block size at 2^31 crypto: algif_skcipher - Cap recv SG list at ctx->used hwrng: ks-sa - Fix runtime PM imbalance on error
2020-06-21samples: watch_queue: build sample program for target architectureMasahiro Yamada2-7/+5
This userspace program includes UAPI headers exported to usr/include/. 'make headers' always works for the target architecture (i.e. the same architecture as the kernel), so the sample program should be built for the target as well. Kbuild now supports 'userprogs' for that. I also guarded the CONFIG option by 'depends on CC_CAN_LINK' because $(CC) may not provide libc. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-21Revert "Makefile: install modules.builtin even if CONFIG_MODULES=n"Masahiro Yamada1-11/+3
This reverts commit e0b250b57dcf403529081e5898a9de717f96b76b, which broke build systems that need to install files to a certain path, but do not set INSTALL_MOD_PATH when invoking 'make install'. $ make INSTALL_PATH=/tmp/destdir install mkdir: cannot create directory ‘/lib/modules/5.8.0-rc1+/’: Permission denied Makefile:1342: recipe for target '_builtin_inst_' failed make: *** [_builtin_inst_] Error 1 While modules.builtin is useful also for CONFIG_MODULES=n, this change in the behavior is quite unexpected. Maybe "make modules_install" can install modules.builtin irrespective of CONFIG_MODULES as Jonas originally suggested. Anyway, that commit should be reverted ASAP. Reported-by: Douglas Anderson <dianders@chromium.org> Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Jonas Karlman <jonas@kwiboo.se> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net>
2020-06-21Merge tag 'scsi-fixes' of ↵Linus Torvalds10-1/+15
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "One minor fix and two patches reworking the ata dma drain for the !CONFIG_LIBATA case. The latter is a 5.7 regression fix" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: Wire up ata_scsi_dma_need_drain for SAS HBA drivers scsi: libata: Provide an ata_scsi_dma_need_drain stub for !CONFIG_ATA scsi: ufs-bsg: Fix runtime PM imbalance on error
2020-06-21Merge branch 'i2c/for-current' of ↵Linus Torvalds10-48/+25
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - a small collection of remaining API conversion patches (all acked) which allow to finally remove the deprecated API - some documentation fixes and a MAINTAINERS addition * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: MAINTAINERS: Add robert and myself as qcom i2c cci maintainers i2c: smbus: Fix spelling mistake in the comments Documentation/i2c: SMBus start signal is S not A i2c: remove deprecated i2c_new_device API Documentation: media: convert to use i2c_new_client_device() video: backlight: tosa_lcd: convert to use i2c_new_client_device() x86/platform/intel-mid: convert to use i2c_new_client_device() drm: encoder_slave: use new I2C API drm: encoder_slave: fix refcouting error for modules
2020-06-20pinctrl: single: fix function name in documentationDrew Fustini1-1/+1
Use the correct the function name in the documentation for "pcs_parse_one_pinctrl_entry()". "smux_parse_one_pinctrl_entry()" appears to be an artifact from the development of a prior patch series ("simple pinmux driver") which transformed into pinctrl-single. Signed-off-by: Drew Fustini <drew@beagleboard.org> Link: https://lore.kernel.org/r/20200612112758.GA3407886@x1 Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-06-20Merge tag 'trace-v5.8-rc1' of ↵Linus Torvalds15-67/+239
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Have recordmcount work with > 64K sections (to support LTO) - kprobe RCU fixes - Correct a kprobe critical section with missing mutex - Remove redundant arch_disarm_kprobe() call - Fix lockup when kretprobe triggers within kprobe_flush_task() - Fix memory leak in fetch_op_data operations - Fix sleep in atomic in ftrace trace array sample code - Free up memory on failure in sample trace array code - Fix incorrect reporting of function_graph fields in format file - Fix quote within quote parsing in bootconfig - Fix return value of bootconfig tool - Add testcases for bootconfig tool - Fix maybe uninitialized warning in ftrace pid file code - Remove unused variable in tracing_iter_reset() - Fix some typos * tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix maybe-uninitialized compiler warning tools/bootconfig: Add testcase for show-command and quotes test tools/bootconfig: Fix to return 0 if succeeded to show the bootconfig tools/bootconfig: Fix to use correct quotes for value proc/bootconfig: Fix to use correct quotes for value tracing: Remove unused event variable in tracing_iter_reset tracing/probe: Fix memleak in fetch_op_data operations trace: Fix typo in allocate_ftrace_ops()'s comment tracing: Make ftrace packed events have align of 1 sample-trace-array: Remove trace_array 'sample-instance' sample-trace-array: Fix sleeping function called from invalid context kretprobe: Prevent triggering kretprobe from within kprobe_flush_task kprobes: Remove redundant arch_disarm_kprobe() call kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex kprobes: Use non RCU traversal APIs on kprobe_tables if possible kprobes: Suppress the suspicious RCU warning on kprobes recordmcount: support >64k sections
2020-06-20Merge tag 'libnvdimm-for-5.8-rc2' of ↵Linus Torvalds7-23/+618
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "A feature (papr_scm health retrieval) and a fix (sysfs attribute visibility) for v5.8. Vaibhav explains in the merge commit below why missing v5.8 would be painful and I agreed to try a -rc2 pull because only cosmetics kept this out of -rc1 and his initial versions were posted in more than enough time for v5.8 consideration: 'These patches are tied to specific features that were committed to customers in upcoming distros releases (RHEL and SLES) whose time-lines are tied to 5.8 kernel release. Being able to track the health of an nvdimm is critical for our customers that are running workloads leveraging papr-scm nvdimms. Missing the 5.8 kernel would mean missing the distro timelines and shifting forward the availability of this feature in distro kernels by at least 6 months' Summary: - Fix the visibility of the region 'align' attribute. The new unit tests for region alignment handling caught a corner case where the alignment cannot be specified if the region is converted from static to dynamic provisioning at runtime. - Add support for device health retrieval for the persistent memory supported by the papr_scm driver. This includes both the standard sysfs "health flags" that the nfit persistent memory driver publishes and a mechanism for the ndctl tool to retrieve a health-command payload" * tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm/region: always show the 'align' attribute powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl() powerpc/papr_scm: Fetch nvdimm health information from PHYP seq_buf: Export seq_buf_printf powerpc: Document details on H_SCM_HEALTH hcall
2020-06-20pinctrl: qcom: ipq6018 Add missing pins in qpic pin groupSivaprakash Murugesan1-1/+2
The patch adds missing qpic data pins to qpic pingroup. These pins are necessary for the qpic nand to work. Fixes: ef1ea54eab0e ("pinctrl: qcom: Add ipq6018 pinctrl driver") Signed-off-by: Sivaprakash Murugesan <sivaprak@codeaurora.org> Link: https://lore.kernel.org/r/1592541089-17700-1-git-send-email-sivaprak@codeaurora.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-06-20Revert "pinctrl: freescale: imx: Use 'devm_of_iomap()' to avoid a resource ↵Haibo Chen1-4/+3
leak in case of error in 'imx_pinctrl_probe()'" This reverts commit ba403242615c2c99e27af7984b1650771a2cc2c9. After commit 26d8cde5260b ("pinctrl: freescale: imx: add shared input select reg support"). i.MX7D has two iomux controllers iomuxc and iomuxc-lpsr which share select_input register for daisy chain settings. If use 'devm_of_iomap()', when probe the iomuxc-lpsr, will call devm_request_mem_region() for the region <0x30330000-0x3033ffff> for the first time. Then, next time when probe the iomuxc, API devm_platform_ioremap_resource() will also use the API devm_request_mem_region() for the share region <0x30330000-0x3033ffff> again, then cause issue, log like below: [ 0.179561] imx7d-pinctrl 302c0000.iomuxc-lpsr: initialized IMX pinctrl driver [ 0.191742] imx7d-pinctrl 30330000.pinctrl: can't request region for resource [mem 0x30330000-0x3033ffff] [ 0.191842] imx7d-pinctrl: probe of 30330000.pinctrl failed with error -16 Fixes: ba403242615c ("pinctrl: freescale: imx: Use 'devm_of_iomap()' to avoid a resource leak in case of error in 'imx_pinctrl_probe()'") Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com> Link: https://lore.kernel.org/r/1591673223-1680-1-git-send-email-haibo.chen@nxp.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-06-20Merge tag 's390-5.8-2' of ↵Linus Torvalds18-134/+154
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - a few ptrace fixes mostly for strace and seccomp_bpf kernel tests findings - cleanup unused pm callbacks in virtio ccw - replace kmalloc + memset with kzalloc in crypto - use $(LD) for vDSO linkage to make clang happy - fix vDSO clock_getres() to preserve the same behaviour as posix_get_hrtimer_res() - fix workqueue cpumask warning when NUMA=n and nr_node_ids=2 - reduce SLSB writes during input processing, improve warnings and cleanup qdio_data usage in qdio - a few fixes to use scnprintf() instead of snprintf() * tag 's390-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: fix syscall_get_error for compat processes s390/qdio: warn about unexpected SLSB states s390/qdio: clean up usage of qdio_data s390/numa: let NODES_SHIFT depend on NEED_MULTIPLE_NODES s390/vdso: fix vDSO clock_getres() s390/vdso: Use $(LD) instead of $(CC) to link vDSO s390/protvirt: use scnprintf() instead of snprintf() s390: use scnprintf() in sys_##_prefix##_##_name##_show s390/crypto: use scnprintf() instead of snprintf() s390/zcrypt: use kzalloc s390/virtio: remove unused pm callbacks s390/qdio: reduce SLSB writes during Input Queue processing selftests/seccomp: s390 shares the syscall and return value register s390/ptrace: fix setting syscall number s390/ptrace: pass invalid syscall numbers to tracing s390/ptrace: return -ENOSYS when invalid syscall is supplied s390/seccomp: pass syscall arguments via seccomp_data s390/qdio: fine-tune SLSB update
2020-06-20Merge tag 'riscv-for-linus-5.8-rc2' of ↵Linus Torvalds3-6/+22
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - a workaround for a compiler surprise related to the "r" inline assembly that allows LLVM to boot. - a fix to avoid WX-only mappings, which the ISA does not allow. While this probably manifests in many ways, the bug was found in stress-ng. - a missing lock in set_direct_map_*(), which due to a recent lockdep change started asserting. * tag 'riscv-for-linus-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Acquire mmap lock before invoking walk_page_range RISC-V: Don't allow write+exec only page mapping request in mmap riscv/atomic: Fix sign extension for RV64I
2020-06-20Merge tag 'linux-kselftest-5.8-rc2' of ↵Linus Torvalds80-637/+123
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest cleanups from Shuah Khan: - ftrace "requires:" list for simplifying and unifying requirement checks for each test case, adding "requires:" line instead of checking required ftrace interfaces in each test case. - a minor spelling correction patch * tag 'linux-kselftest-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/ftrace: Support ":README" suffix for requires selftests/ftrace: Support ":tracer" suffix for requires selftests/ftrace: Convert check_filter_file() with requires list selftests/ftrace: Convert required interface checks into requires list selftests/ftrace: Add "requires:" list support selftests/ftrace: Return unsupported for the unconfigured features selftests/ftrace: Allow ":" in description tools: testing: ftrace: trigger: fix spelling mistake