summaryrefslogtreecommitdiff
path: root/fs/dlm
AgeCommit message (Collapse)AuthorFilesLines
2022-04-06dlm: use __le types for rcom messagesAlexander Aring7-80/+52
This patch changes to use __le types directly in the dlm rcom structure which is casted at the right dlm message buffer positions. The main goal what is reached here is to remove sparse warnings regarding to host to little byte order conversion or vice versa. Leaving those sparse issues ignored and always do it in out/in functionality tends to leave it unknown in which byte order the variable is being handled. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: use __le types for dlm headerAlexander Aring9-116/+98
This patch changes to use __le types directly in the dlm header structure which is casted at the right dlm message buffer positions. The main goal what is reached here is to remove sparse warnings regarding to host to little byte order conversion or vice versa. Leaving those sparse issues ignored and always do it in out/in functionality tends to leave it unknown in which byte order the variable is being handled. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: use __le types for options headerAlexander Aring1-5/+5
This patch changes to use __le types directly in the dlm option headers structures which are casted at the right dlm message buffer positions. Currently only midcomms.c using those headers which already was calling endian conversions on-the-fly without using in/out functionality like other endianness handling in dlm. Using __le types now will hopefully get useful warnings in future if we do comparison against host byte order values. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: add __CHECKER__ for false positivesAlexander Aring2-0/+20
This patch will adds #ifndef __CHECKER__ for false positives warnings about an imbalance lock/unlock srcu handling. Which are shown by running sparse checks: fs/dlm/midcomms.c:1065:20: warning: context imbalance in 'dlm_midcomms_get_mhandle' - wrong count at exit Using __CHECKER__ will tell sparse to ignore these sections. Those imbalances are false positive because from upper layer it is always required to call a function in sequence, e.g. if dlm_midcomms_get_mhandle() is successful there must be a dlm_midcomms_commit_mhandle() call afterwards. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: move global to static initsAlexander Aring1-11/+5
Instead of init global module at module loading time we can move the initialization of those global variables at memory initialization of the module loader. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: remove unnecessary INIT_LIST_HEAD()Alexander Aring1-1/+0
There is no need to call INIT_LIST_HEAD() when it's set directly afterwards by list_add_tail(). Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: improve plock logging if interruptedAlexander Aring1-4/+5
This patch changes the log level if a plock is removed when interrupted from debug to info. Additional it signals now that the plock entity was removed to let the user know what's happening. If on a dev_write() a pending plock cannot be find it will signal that it might have been removed because wait interruption. Before this patch there might be a "dev_write no op ..." info message and the users can only guess that the plock was removed before because the wait interruption. To be sure that is the case we log both messages on the same log level. Let both message be logged on info layer because it should not happened a lot and if it happens it should be clear why the op was not found. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: rearrange async condition returnAlexander Aring1-14/+13
This patch moves the return of FILE_LOCK_DEFERRED a little bit earlier than checking afterwards again if the request was an asynchronous request. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: cleanup plock_op vs plock_xopAlexander Aring1-31/+46
Lately the different casting between plock_op and plock_xop and list holders which was involved showed some issues which were hard to see. This patch removes the "plock_xop" structure and introduces a "struct plock_async_data". This structure will be set in "struct plock_op" in case of asynchronous lock handling as the original "plock_xop" was made for. There is no need anymore to cast pointers around for additional fields in case of asynchronous lock handling. As disadvantage another allocation was introduces but only needed in the asynchronous case which is currently only used in combination with nfs lockd. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: replace sanity checks with WARN_ONAlexander Aring1-28/+4
There are several sanity checks and recover handling if they occur in the dlm plock handling. From my understanding those operation can't run in parallel with any list manipulation which involved setting the list holder of plock_op, if so we have a bug which this sanity check will warn about. Previously if such sanity check occurred the dlm plock handling was trying to recover from it by deleting the plock_op from a list which the holder was set to. However there is a bug in the dlm plock handling if this case ever happens. To make such bugs are more visible for further investigations we add a WARN_ON() on those sanity checks and remove the recovering handling because other possible side effects. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: fix plock invalid readAlexander Aring1-7/+5
This patch fixes an invalid read showed by KASAN. A unlock will allocate a "struct plock_op" and a followed send_op() will append it to a global send_list data structure. In some cases a followed dev_read() moves it to recv_list and dev_write() will cast it to "struct plock_xop" and access fields which are only available in those structures. At this point an invalid read happens by accessing those fields. To fix this issue the "callback" field is moved to "struct plock_op" to indicate that a cast to "plock_xop" is allowed and does the additional "plock_xop" handling if set. Example of the KASAN output which showed the invalid read: [ 2064.296453] ================================================================== [ 2064.304852] BUG: KASAN: slab-out-of-bounds in dev_write+0x52b/0x5a0 [dlm] [ 2064.306491] Read of size 8 at addr ffff88800ef227d8 by task dlm_controld/7484 [ 2064.308168] [ 2064.308575] CPU: 0 PID: 7484 Comm: dlm_controld Kdump: loaded Not tainted 5.14.0+ #9 [ 2064.310292] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 2064.311618] Call Trace: [ 2064.312218] dump_stack_lvl+0x56/0x7b [ 2064.313150] print_address_description.constprop.8+0x21/0x150 [ 2064.314578] ? dev_write+0x52b/0x5a0 [dlm] [ 2064.315610] ? dev_write+0x52b/0x5a0 [dlm] [ 2064.316595] kasan_report.cold.14+0x7f/0x11b [ 2064.317674] ? dev_write+0x52b/0x5a0 [dlm] [ 2064.318687] dev_write+0x52b/0x5a0 [dlm] [ 2064.319629] ? dev_read+0x4a0/0x4a0 [dlm] [ 2064.320713] ? bpf_lsm_kernfs_init_security+0x10/0x10 [ 2064.321926] vfs_write+0x17e/0x930 [ 2064.322769] ? __fget_light+0x1aa/0x220 [ 2064.323753] ksys_write+0xf1/0x1c0 [ 2064.324548] ? __ia32_sys_read+0xb0/0xb0 [ 2064.325464] do_syscall_64+0x3a/0x80 [ 2064.326387] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2064.327606] RIP: 0033:0x7f807e4ba96f [ 2064.328470] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 39 87 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 7c 87 f8 ff 48 [ 2064.332902] RSP: 002b:00007ffd50cfe6e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 [ 2064.334658] RAX: ffffffffffffffda RBX: 000055cc3886eb30 RCX: 00007f807e4ba96f [ 2064.336275] RDX: 0000000000000040 RSI: 00007ffd50cfe7e0 RDI: 0000000000000010 [ 2064.337980] RBP: 00007ffd50cfe7e0 R08: 0000000000000000 R09: 0000000000000001 [ 2064.339560] R10: 000055cc3886eb30 R11: 0000000000000293 R12: 000055cc3886eb80 [ 2064.341237] R13: 000055cc3886eb00 R14: 000055cc3886f590 R15: 0000000000000001 [ 2064.342857] [ 2064.343226] Allocated by task 12438: [ 2064.344057] kasan_save_stack+0x1c/0x40 [ 2064.345079] __kasan_kmalloc+0x84/0xa0 [ 2064.345933] kmem_cache_alloc_trace+0x13b/0x220 [ 2064.346953] dlm_posix_unlock+0xec/0x720 [dlm] [ 2064.348811] do_lock_file_wait.part.32+0xca/0x1d0 [ 2064.351070] fcntl_setlk+0x281/0xbc0 [ 2064.352879] do_fcntl+0x5e4/0xfe0 [ 2064.354657] __x64_sys_fcntl+0x11f/0x170 [ 2064.356550] do_syscall_64+0x3a/0x80 [ 2064.358259] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2064.360745] [ 2064.361511] Last potentially related work creation: [ 2064.363957] kasan_save_stack+0x1c/0x40 [ 2064.365811] __kasan_record_aux_stack+0xaf/0xc0 [ 2064.368100] call_rcu+0x11b/0xf70 [ 2064.369785] dlm_process_incoming_buffer+0x47d/0xfd0 [dlm] [ 2064.372404] receive_from_sock+0x290/0x770 [dlm] [ 2064.374607] process_recv_sockets+0x32/0x40 [dlm] [ 2064.377290] process_one_work+0x9a8/0x16e0 [ 2064.379357] worker_thread+0x87/0xbf0 [ 2064.381188] kthread+0x3ac/0x490 [ 2064.383460] ret_from_fork+0x22/0x30 [ 2064.385588] [ 2064.386518] Second to last potentially related work creation: [ 2064.389219] kasan_save_stack+0x1c/0x40 [ 2064.391043] __kasan_record_aux_stack+0xaf/0xc0 [ 2064.393303] call_rcu+0x11b/0xf70 [ 2064.394885] dlm_process_incoming_buffer+0x47d/0xfd0 [dlm] [ 2064.397694] receive_from_sock+0x290/0x770 [dlm] [ 2064.399932] process_recv_sockets+0x32/0x40 [dlm] [ 2064.402180] process_one_work+0x9a8/0x16e0 [ 2064.404388] worker_thread+0x87/0xbf0 [ 2064.406124] kthread+0x3ac/0x490 [ 2064.408021] ret_from_fork+0x22/0x30 [ 2064.409834] [ 2064.410599] The buggy address belongs to the object at ffff88800ef22780 [ 2064.410599] which belongs to the cache kmalloc-96 of size 96 [ 2064.416495] The buggy address is located 88 bytes inside of [ 2064.416495] 96-byte region [ffff88800ef22780, ffff88800ef227e0) [ 2064.422045] The buggy address belongs to the page: [ 2064.424635] page:00000000b6bef8bc refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xef22 [ 2064.428970] flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff) [ 2064.432515] raw: 000fffffc0000200 ffffea0000d68b80 0000001400000014 ffff888001041780 [ 2064.436110] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 [ 2064.439813] page dumped because: kasan: bad access detected [ 2064.442548] [ 2064.443310] Memory state around the buggy address: [ 2064.445988] ffff88800ef22680: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 2064.449444] ffff88800ef22700: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 2064.452941] >ffff88800ef22780: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc [ 2064.456383] ^ [ 2064.459386] ffff88800ef22800: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc [ 2064.462788] ffff88800ef22880: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc [ 2064.466239] ================================================================== reproducer in python: import argparse import struct import fcntl import os parser = argparse.ArgumentParser() parser.add_argument('-f', '--file', help='file to use fcntl, must be on dlm lock filesystem e.g. gfs2') args = parser.parse_args() f = open(args.file, 'wb+') lockdata = struct.pack('hhllhh', fcntl.F_WRLCK,0,0,0,0,0) fcntl.fcntl(f, fcntl.F_SETLK, lockdata) lockdata = struct.pack('hhllhh', fcntl.F_UNLCK,0,0,0,0,0) fcntl.fcntl(f, fcntl.F_SETLK, lockdata) Fixes: 586759f03e2e ("gfs2: nfs lock support for gfs2") Cc: stable@vger.kernel.org Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: fix missing check in validate_lock_argsAlexander Aring1-1/+2
This patch adds a additional check if lkb->lkb_wait_count is non zero as it is done in validate_unlock_args() to check if any operation is in progress. While on it add a comment taken from validate_unlock_args() to signal what the check is doing. There might be no changes because if lkb->lkb_wait_type is non zero implies that lkb->lkb_wait_count is non zero. However we should add the check as it does validate_unlock_args(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06dlm: uninitialized variable on error in dlm_listen_for_all()Dan Carpenter1-1/+1
The "sock" variable is not initialized on this error path. Cc: stable@vger.kernel.org Fixes: 2dc6b1158c28 ("fs: dlm: introduce generic listen") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-01-12Merge tag 'driver-core-5.17-rc1' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of changes for the driver core for 5.17-rc1. Lots of little things here, including: - kobj_type cleanups - auxiliary_bus documentation updates - auxiliary_device conversions for some drivers (relevant subsystems all have provided acks for these) - kernfs lock contention reduction for some workloads - other tiny cleanups and changes. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits) kobject documentation: remove default_attrs information drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb debugfs: lockdown: Allow reading debugfs files that are not world readable driver core: Make bus notifiers in right order in really_probe() driver core: Move driver_sysfs_remove() after driver_sysfs_add() firmware: edd: remove empty default_attrs array firmware: dmi-sysfs: use default_groups in kobj_type qemu_fw_cfg: use default_groups in kobj_type firmware: memmap: use default_groups in kobj_type sh: sq: use default_groups in kobj_type headers/uninline: Uninline single-use function: kobject_has_children() devtmpfs: mount with noexec and nosuid driver core: Simplify async probe test code by using ktime_ms_delta() nilfs2: use default_groups in kobj_type kobject: remove kset from struct kset_uevent_ops callbacks driver core: make kobj_type constant. driver core: platform: document registration-failure requirement vdpa/mlx5: Use auxiliary_device driver data helpers net/mlx5e: Use auxiliary_device driver data helpers soundwire: intel: Use auxiliary_device driver data helpers ...
2022-01-04fs: dlm: print cluster addr if non-cluster node connectsAlexander Aring1-4/+22
This patch prints the cluster node address if a non-cluster node (according to the dlm config setting) tries to connect. The current hexdump call will print in a different loglevel and only available if dynamic debug is enabled. Additional we using the ip address format strings to print an IETF ip4/6 string represenation. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-28kobject: remove kset from struct kset_uevent_ops callbacksGreg Kroah-Hartman1-2/+1
There is no need to pass the pointer to the kset in the struct kset_uevent_ops callbacks as no one uses it, so just remove that pointer entirely. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211227163924.3970661-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-07fs: dlm: memory cache for lowcomms hotpathAlexander Aring4-3/+31
This patch introduces a kmem cache for dlm_msg handles which are used always if dlm sends a message out. Even if their are covered by midcomms layer or not. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: memory cache for writequeue_entryAlexander Aring4-6/+44
This patch introduces a kmem cache for writequeue entry. A writequeue entry get quite a lot allocated if dlm transmit messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: memory cache for midcomms hotpathAlexander Aring4-11/+44
This patch will introduce a kmem cache for allocating message handles which are needed for midcomms layer to take track of lowcomms messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: remove wq_alloc mutexAlexander Aring1-37/+11
This patch cleanups the code for allocating a new buffer in the dlm writequeue mechanism. There was a possible tuneup to allow scheduling while a new writequeue entry needs to be allocated because either no sending page is available or are full. To avoid multiple concurrent users checking at the same time if an entry is available or full alloc_wq was introduce that those are waiting if there is currently a new writequeue entry in process to be queued so possible further users will check on the new allocated writequeue entry if it's full. To simplify the code we just remove this mutex and switch that the already introduced spin lock will be held during writequeue check, allocation and queueing. So other users can never check on available writequeues while there is a new one in process but not queued yet. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: use event based wait for pending removeAlexander Aring3-7/+14
This patch will use an event based waitqueue to wait for a possible clash with the ls_remove_name field of dlm_ls instead of doing busy waiting. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: check for pending users filling buffersAlexander Aring1-1/+4
Currently we don't care if the DLM application stack is filling buffers (not committed yet) while we transmit some already committed buffers. By checking on active writequeue users before dequeue a writequeue entry we know there is coming more data and do nothing. We wait until the send worker will be triggered again if the writequeue entry users hit zero. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07fs: dlm: use list_empty() to check last iterationAlexander Aring1-1/+3
This patch will use list_empty(&ls->ls_cb_delay) to check for last list iteration. In case of a multiply count of MAX_CB_QUEUE and the list is empty we do a extra goto more which we can avoid by checking on list_empty(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-17fs: dlm: fix build with CONFIG_IPV6 disabledAlexander Aring1-0/+2
This patch will surround the AF_INET6 case in sk_error_report() of dlm with a #if IS_ENABLED(CONFIG_IPV6). The field sk->sk_v6_daddr is not defined when CONFIG_IPV6 is disabled. If CONFIG_IPV6 is disabled, the socket creation with AF_INET6 should already fail because a runtime check if AF_INET6 is registered. However if there is the possibility that AF_INET6 is set as sk_family the sk_error_report() callback will print then an invalid family type error. Reported-by: kernel test robot <lkp@intel.com> Fixes: 4c3d90570bcc ("fs: dlm: don't call kernel_getpeername() in error_report()") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-15fs: dlm: replace use of socket sk_callback_lock with sock_lockAlexander Aring1-17/+10
This patch will replace the use of socket sk_callback_lock lock and uses socket lock instead. Some users like sunrpc, see commit ea9afca88bbe ("SUNRPC: Replace use of socket sk_callback_lock with sock_lock") moving from sk_callback_lock to sock_lock which seems to be held when the socket callbacks are called. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-15fs: dlm: don't call kernel_getpeername() in error_report()Alexander Aring1-22/+20
In some cases kernel_getpeername() will held the socket lock which is already held when the socket layer calls error_report() callback. Since commit 9dfc685e0262 ("inet: remove races in inet{6}_getname()") this problem becomes more likely because the socket lock will be held always. You will see something like: bob9-u5 login: [ 562.316860] BUG: spinlock recursion on CPU#7, swapper/7/0 [ 562.318562] lock: 0xffff8f2284720088, .magic: dead4ead, .owner: swapper/7/0, .owner_cpu: 7 [ 562.319522] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.15.0+ #135 [ 562.320346] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 [ 562.321277] Call Trace: [ 562.321529] <IRQ> [ 562.321734] dump_stack_lvl+0x33/0x42 [ 562.322282] do_raw_spin_lock+0x8b/0xc0 [ 562.322674] lock_sock_nested+0x1e/0x50 [ 562.323057] inet_getname+0x39/0x110 [ 562.323425] ? sock_def_readable+0x80/0x80 [ 562.323838] lowcomms_error_report+0x63/0x260 [dlm] [ 562.324338] ? wait_for_completion_interruptible_timeout+0xd2/0x120 [ 562.324949] ? lock_timer_base+0x67/0x80 [ 562.325330] ? do_raw_spin_unlock+0x49/0xc0 [ 562.325735] ? _raw_spin_unlock_irqrestore+0x1e/0x40 [ 562.326218] ? del_timer+0x54/0x80 [ 562.326549] sk_error_report+0x12/0x70 [ 562.326919] tcp_validate_incoming+0x3c8/0x530 [ 562.327347] ? kvm_clock_read+0x14/0x30 [ 562.327718] ? ktime_get+0x3b/0xa0 [ 562.328055] tcp_rcv_established+0x121/0x660 [ 562.328466] tcp_v4_do_rcv+0x132/0x260 [ 562.328835] tcp_v4_rcv+0xcea/0xe20 [ 562.329173] ip_protocol_deliver_rcu+0x35/0x1f0 [ 562.329615] ip_local_deliver_finish+0x54/0x60 [ 562.330050] ip_local_deliver+0xf7/0x110 [ 562.330431] ? inet_rtm_getroute+0x211/0x840 [ 562.330848] ? ip_protocol_deliver_rcu+0x1f0/0x1f0 [ 562.331310] ip_rcv+0xe1/0xf0 [ 562.331603] ? ip_local_deliver+0x110/0x110 [ 562.332011] __netif_receive_skb_core+0x46a/0x1040 [ 562.332476] ? inet_gro_receive+0x263/0x2e0 [ 562.332885] __netif_receive_skb_list_core+0x13b/0x2c0 [ 562.333383] netif_receive_skb_list_internal+0x1c8/0x2f0 [ 562.333896] ? update_load_avg+0x7e/0x5e0 [ 562.334285] gro_normal_list.part.149+0x19/0x40 [ 562.334722] napi_complete_done+0x67/0x160 [ 562.335134] virtnet_poll+0x2ad/0x408 [virtio_net] [ 562.335644] __napi_poll+0x28/0x140 [ 562.336012] net_rx_action+0x23d/0x300 [ 562.336414] __do_softirq+0xf2/0x2ea [ 562.336803] irq_exit_rcu+0xc1/0xf0 [ 562.337173] common_interrupt+0xb9/0xd0 It is and was always forbidden to call kernel_getpeername() in context of error_report(). To get rid of the problem we access the destination address for the peer over the socket structure. While on it we fix to print out the destination port of the inet socket. Fixes: 1a31833d085a ("DLM: Replace nodeid_to_addr with kernel_getpeername") Reported-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-12fs: dlm: fix potential buffer overflowAlexander Aring1-1/+1
This patch fixes an potential overflow in sscanf and the maximum declared string parsing length which seems to be excluding the null termination symbol. This patch will just add one byte to be prepared on a string with length of DLM_RESNAME_MAXLEN including the null termination symbol. Fixes: 5054e79de999 ("fs: dlm: add lkb debugfs functionality") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-05fs: dlm:Remove unneeded semicolonZhang Mingyu1-1/+1
Eliminate the following coccinelle check warning: fs/dlm/midcomms.c:972:2-3 Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-04fs: dlm: remove double list_first_entry callAlexander Aring1-1/+0
This patch removes a list_first_entry() call which is already done by the previous con_next_wq() call. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: filter user dlm messages for kernel locksAlexander Aring1-0/+9
This patch fixes the following crash by receiving a invalid message: [ 160.672220] ================================================================== [ 160.676206] BUG: KASAN: user-memory-access in dlm_user_add_ast+0xc3/0x370 [ 160.679659] Read of size 8 at addr 00000000deadbeef by task kworker/u32:13/319 [ 160.681447] [ 160.681824] CPU: 10 PID: 319 Comm: kworker/u32:13 Not tainted 5.14.0-rc2+ #399 [ 160.683472] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.6.0+12648+6ede71a5 04/01/2014 [ 160.685574] Workqueue: dlm_recv process_recv_sockets [ 160.686721] Call Trace: [ 160.687310] dump_stack_lvl+0x56/0x6f [ 160.688169] ? dlm_user_add_ast+0xc3/0x370 [ 160.689116] kasan_report.cold.14+0x116/0x11b [ 160.690138] ? dlm_user_add_ast+0xc3/0x370 [ 160.690832] dlm_user_add_ast+0xc3/0x370 [ 160.691502] _receive_unlock_reply+0x103/0x170 [ 160.692241] _receive_message+0x11df/0x1ec0 [ 160.692926] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 160.693700] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 160.694427] ? lock_acquire+0x175/0x400 [ 160.695058] ? do_purge.isra.51+0x200/0x200 [ 160.695744] ? lock_acquired+0x360/0x5d0 [ 160.696400] ? lock_contended+0x6a0/0x6a0 [ 160.697055] ? lock_release+0x21d/0x5e0 [ 160.697686] ? lock_is_held_type+0xe0/0x110 [ 160.698352] ? lock_is_held_type+0xe0/0x110 [ 160.699026] ? ___might_sleep+0x1cc/0x1e0 [ 160.699698] ? dlm_wait_requestqueue+0x94/0x140 [ 160.700451] ? dlm_process_requestqueue+0x240/0x240 [ 160.701249] ? down_write_killable+0x2b0/0x2b0 [ 160.701988] ? do_raw_spin_unlock+0xa2/0x130 [ 160.702690] dlm_receive_buffer+0x1a5/0x210 [ 160.703385] dlm_process_incoming_buffer+0x726/0x9f0 [ 160.704210] receive_from_sock+0x1c0/0x3b0 [ 160.704886] ? dlm_tcp_shutdown+0x30/0x30 [ 160.705561] ? lock_acquire+0x175/0x400 [ 160.706197] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 160.706941] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 160.707681] process_recv_sockets+0x32/0x40 [ 160.708366] process_one_work+0x55e/0xad0 [ 160.709045] ? pwq_dec_nr_in_flight+0x110/0x110 [ 160.709820] worker_thread+0x65/0x5e0 [ 160.710423] ? process_one_work+0xad0/0xad0 [ 160.711087] kthread+0x1ed/0x220 [ 160.711628] ? set_kthread_struct+0x80/0x80 [ 160.712314] ret_from_fork+0x22/0x30 The issue is that we received a DLM message for a user lock but the destination lock is a kernel lock. Note that the address which is trying to derefence is 00000000deadbeef, which is in a kernel lock lkb->lkb_astparam, this field should never be derefenced by the DLM kernel stack. In case of a user lock lkb->lkb_astparam is lkb->lkb_ua (memory is shared by a union field). The struct lkb_ua will be handled by the DLM kernel stack but on a kernel lock it will contain invalid data and ends in most likely crashing the kernel. It can be reproduced with two cluster nodes. node 2: dlm_tool join test echo "862 fooobaar 1 2 1" > /sys/kernel/debug/dlm/test_locks echo "862 3 1" > /sys/kernel/debug/dlm/test_waiters node 1: dlm_tool join test python: foo = DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0x77222027, \ m_type=7, m_flags=0x1, m_remid=0x862, m_result=0xFFFEFFFE) newFile = open("/sys/kernel/debug/dlm/comms/2/rawmsg", "wb") newFile.write(bytes(foo)) Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: add lkb waiters debugfs functionalityAlexander Aring3-1/+43
This patch adds functionality to put a lkb to the waiters state. It can be useful to combine this feature with the "rawmsg" debugfs functionality. It will bring the DLM lkb into a state that a message will be parsed by the kernel. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: add lkb debugfs functionalityAlexander Aring3-1/+79
This patch adds functionality to add an lkb during runtime. This is a highly debugging feature only, wrong input can crash the kernel. It is a early state feature as well. The goal is to provide a user interface for manipulate dlm state and combine it with the rawmsg feature. It is debugfs functionality, we don't care about UAPI breakage. Even it's possible to add lkb's/rsb's which could never be exists in such wat by using normal DLM operation. The user of this interface always need to think before using this feature, not every crash which happens can really occur during normal dlm operation. Future there should be more functionality to add a more realistic lkb which reflects normal DLM state inside the kernel. For now this is enough. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: allow create lkb with specific id rangeAlexander Aring1-2/+8
This patch adds functionality to add a lkb with a specific id range. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: add debugfs rawmsg send functionalityAlexander Aring3-0/+87
This patch adds a dlm functionality to send a raw dlm message to a specific cluster node. This raw message can be build by user space and send out by writing the message to "rawmsg" dlm debugfs file. There is a in progress scapy dlm module which provides a easy build of DLM messages in user space. For example: DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0xe4f48a18, ...) The goal is to provide an easy reproducable state to crash DLM or to fuzz the DLM kernel stack if there are possible ways to crash it. Note: that if the sequence number is zero and dlm version is not set to 3.1 the kernel will automatic will set a right sequence number, otherwise DLM stack testing is not possible. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: let handle callback data as voidAlexander Aring3-13/+14
This patch changes the dlm_lowcomms_new_msg() function pointer private data from "struct mhandle *" to "void *" to provide different structures than just "struct mhandle". Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: ls_count busy wait to event based waitAlexander Aring3-18/+20
This patch changes the ls_count busy wait to use atomic counter values and wait_event() to wait until ls_count reach zero. It will slightly reduce the number of holding lslist_lock. At remove lockspace we need to retry the wait because it a lockspace get could interefere between wait_event() and holding the lock which deletes the lockspace list entry. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: requestqueue busy wait to event based waitAlexander Aring3-8/+11
This patch changes the requestqueue busy waiting algorithm to use atomic counter values and wait_event() to wait until the requestqueue is empty. It will slightly reduce the number of holding ls_requestqueue_mutex mutex. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: trace socket handlingAlexander Aring1-0/+4
This patch adds tracepoints for dlm socket receive and send functionality. We can use it to track how much data was send or received to or from a specific nodeid. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: initial support for tracepointsAlexander Aring3-0/+17
This patch adds initial support for dlm tracepoints. It will introduce tracepoints to dlm main functionality dlm_lock()/dlm_unlock() and their complete ast() callback or blocking bast() callback. The lock/unlock functionality has a start and end tracepoint, this is because there exists a race in case if would have a tracepoint at the end position only the complete/blocking callbacks could occur before. To work with eBPF tracing and using their lookup hash functionality there could be problems that an entry was not inserted yet. However use the start functionality for hash insert and check again in end functionality if there was an dlm internal error so there is no ast callback. In further it might also that locks with local masters will occur those callbacks immediately so we must have such functionality. I did not make everything accessible yet, although it seems eBPF can be used to access a lot of internal datastructures if it's aware of the struct definitions of the running kernel instance. We still can change it, if you do eBPF experiments e.g. time measurements between lock and callback functionality you can simple use the local lkb_id field as hash value in combination with the lockspace id if you have multiple lockspaces. Otherwise you can simple use trace-cmd for some functionality, e.g. `trace-cmd record -e dlm` and `trace-cmd report` afterwards. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: make dlm_callback_resume quiteAlexander Aring1-3/+5
This patch makes dlm_callback_resume info printout less noisy by accumulate all callback queues into one printout not in 25 times steps. It seems this printout became lately quite noisy in relationship with gfs2. Before: [241767.849302] dlm: bin: dlm_callback_resume 25 [241767.854846] dlm: bin: dlm_callback_resume 25 [241767.860373] dlm: bin: dlm_callback_resume 25 ... [241767.865920] dlm: bin: dlm_callback_resume 25 [241767.871352] dlm: bin: dlm_callback_resume 25 [241767.876733] dlm: bin: dlm_callback_resume 25 After the patch: [ 385.485728] dlm: gfs2: dlm_callback_resume 175 if zero it will not be printed out. Reported-by: Barry Marson <bmarson@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: use dlm_recovery_stopped in conditionAlexander Aring3-6/+3
This patch will change to evaluate the dlm_recovery_stopped() in the condition of the if branch instead fetch it before evaluating the condition. As this is an atomic test-set operation it should be evaluated in the condition itself. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: use dlm_recovery_stopped instead of test_bitAlexander Aring1-1/+1
This patch will change to use dlm_recovery_stopped() which is the dlm way to check if the LSFL_RECOVER_STOP flag in ls_flags by using the helper. It is an atomic operation but the check is still as before to fetch the value if ls_recover_lock is held. There might be more further investigations if the value can be changed afterwards and if it has any side effects. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: move version conversion to compile timeAlexander Aring1-3/+3
This patch moves version conversion to little endian from a runtime variable to compile time constant. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: remove check SCTP is loaded messageAlexander Aring1-1/+1
Since commit 764ff4011424 ("fs: dlm: auto load sctp module") we try load the sctp module before we try to create a sctp kernel socket. That a socket creation fails now has more likely other reasons. This patch removes the part of error to load the sctp module and instead printout the error code. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: debug improvements print nodeidAlexander Aring1-2/+2
This patch improves the debug output for midcomms layer by also printing out the nodeid where users counter belongs to. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: fix small lockspace typoAlexander Aring1-1/+1
This patch fixes a typo from lockspace to lockspace. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02fs: dlm: remove obsolete INBUF defineAlexander Aring1-6/+0
This patch removes an obsolete define for some length for an temporary buffer which is not being used anymore. The use of this define is not necessary anymore since commit 4798cbbfbd00 ("fs: dlm: rework receive handling"). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-09-01fs: dlm: avoid comms shutdown delay in release_lockspaceAlexander Aring1-0/+1
When dlm_release_lockspace does active shutdown on connections to other nodes, the active shutdown will wait for any exisitng passive shutdowns to be resolved. But, the sequence of operations during dlm_release_lockspace can prevent the normal resolution of passive shutdowns (processed normally by way of lockspace recovery.) This disruption of passive shutdown handling can cause the active shutdown to wait for a full timeout period, delaying the completion of dlm_release_lockspace. To fix this, make dlm_release_lockspace resolve existing passive shutdowns (by calling dlm_clear_members earlier), before it does active shutdowns. The active shutdowns will not find any passive shutdowns to wait for, and will not be delayed. Reported-by: Chris Mackowski <cmackows@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-08-19fs: dlm: fix return -EINTR on recovery stoppedAlexander Aring3-3/+9
This patch will return -EINTR instead of 1 if recovery is stopped. In case of ping_members() the return value will be checked if the error is -EINTR for signaling another recovery was triggered and the whole recovery process will come to a clean end to process the next one. Returning 1 will abort the recovery process and can leave the recovery in a broken state. It was reported with the following kernel log message attached and a gfs2 mount stopped working: "dlm: bobvirt1: dlm_recover_members error 1" whereas 1 was returned because of a conversion of "dlm_recovery_stopped()" to an errno was missing which this patch will introduce. While on it all other possible missing errno conversions at other places were added as they are done as in other places. It might be worth to check the error case at this recovery level, because some of the functionality also returns -ENOBUFS and check why recovery ends in a broken state. However this will fix the issue if another recovery was triggered at some points of recovery handling. Reported-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2021-08-19fs: dlm: implement delayed ack handlingAlexander Aring3-8/+50
This patch changes that we don't ack each message. Lowcomms will take care about to send an ack back after a bulk of messages was processed. Currently it's only when the whole receive buffer was processed, there might better positions to send an ack back but only the lowcomms implementation know when there are more data to receive. This patch has also disadvantages that we might retransmit more on errors, however this is a very rare case. Tested with make_panic on gfs2 with three nodes by running: trace-cmd record -p function -l 'dlm_send_ack' sleep 100 and trace-cmd report | wc -l Before patch: - 20548 - 21376 - 21398 After patch: - 18338 - 20679 - 19949 Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>