diff options
author | Kent Overstreet <kent.overstreet@gmail.com> | 2022-08-22 20:23:47 +0300 |
---|---|---|
committer | Kent Overstreet <kent.overstreet@linux.dev> | 2023-10-23 00:09:41 +0300 |
commit | 33bd5d068603f9e81e0b73dbe50e9b88b2e56d0d (patch) | |
tree | 6fff6e218b381e0fa2cd4580da3a2e919d18ccd8 /fs/bcachefs/debug.c | |
parent | 62448afee714354a26db8a0f3c644f58628f0792 (diff) | |
download | linux-33bd5d068603f9e81e0b73dbe50e9b88b2e56d0d.tar.xz |
bcachefs: Deadlock cycle detector
We've outgrown our own deadlock avoidance strategy.
The btree iterator API provides an interface where the user doesn't need
to concern themselves with lock ordering - different btree iterators can
be traversed in any order. Without special care, this will lead to
deadlocks.
Our previous strategy was to define a lock ordering internally, and
whenever we attempt to take a lock and trylock() fails, we'd check if
the current btree transaction is holding any locks that cause a lock
ordering violation. If so, we'd issue a transaction restart, and then
bch2_trans_begin() would re-traverse all previously used iterators, but
in the correct order.
That approach had some issues, though.
- Sometimes we'd issue transaction restarts unnecessarily, when no
deadlock would have actually occured. Lock ordering restarts have
become our primary cause of transaction restarts, on some workloads
totally 20% of actual transaction commits.
- To avoid deadlock or livelock, we'd often have to take intent locks
when we only wanted a read lock: with the lock ordering approach, it
is actually illegal to hold _any_ read lock while blocking on an intent
lock, and this has been causing us unnecessary lock contention.
- It was getting fragile - the various lock ordering rules are not
trivial, and we'd been seeing occasional livelock issues related to
this machinery.
So, since bcachefs is already a relational database masquerading as a
filesystem, we're stealing the next traditional database technique and
switching to a cycle detector for avoiding deadlocks.
When we block taking a btree lock, after adding ourself to the waitlist
but before sleeping, we do a DFS of btree transactions waiting on other
btree transactions, starting with the current transaction and walking
our held locks, and transactions blocking on our held locks.
If we find a cycle, we emit a transaction restart. Occasionally (e.g.
the btree split path) we can not allow the lock() operation to fail, so
if necessary we'll tell another transaction that it has to fail.
Result: trans_restart_would_deadlock events are reduced by a factor of
10 to 100, and we'll be able to delete a whole bunch of grotty, fragile
code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Diffstat (limited to 'fs/bcachefs/debug.c')
-rw-r--r-- | fs/bcachefs/debug.c | 6 |
1 files changed, 3 insertions, 3 deletions
diff --git a/fs/bcachefs/debug.c b/fs/bcachefs/debug.c index 4fe20d36212e..6944dfef5bcb 100644 --- a/fs/bcachefs/debug.c +++ b/fs/bcachefs/debug.c @@ -534,7 +534,7 @@ static ssize_t bch2_btree_transactions_read(struct file *file, char __user *buf, mutex_lock(&c->btree_trans_lock); list_for_each_entry(trans, &c->btree_trans_list, list) { - if (trans->task->pid <= i->iter) + if (trans->locking_wait.task->pid <= i->iter) continue; ret = flush_buf(i); @@ -546,11 +546,11 @@ static ssize_t bch2_btree_transactions_read(struct file *file, char __user *buf, prt_printf(&i->buf, "backtrace:"); prt_newline(&i->buf); printbuf_indent_add(&i->buf, 2); - prt_backtrace(&i->buf, trans->task); + prt_backtrace(&i->buf, trans->locking_wait.task); printbuf_indent_sub(&i->buf, 2); prt_newline(&i->buf); - i->iter = trans->task->pid; + i->iter = trans->locking_wait.task->pid; } mutex_unlock(&c->btree_trans_lock); |