summaryrefslogtreecommitdiff
path: root/include
diff options
context:
space:
mode:
authorNeilBrown <neilb@suse.de>2020-06-02 07:48:18 +0300
committerLinus Torvalds <torvalds@linux-foundation.org>2020-06-02 20:59:08 +0300
commita37b0715ddf3007734c4e2424c14bc7efcdd1190 (patch)
treee3b7e98d2a29d4ba32a91940422ae14f622619f1 /include
parent28659cc8cc8766707b1b50906df38bd94dcf349b (diff)
downloadlinux-a37b0715ddf3007734c4e2424c14bc7efcdd1190.tar.xz
mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a daemon needs to write to one bdi (the final bdi) in order to free up writes queued to another bdi (the client bdi). The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty pages, so that it can still dirty pages after other processses have been throttled. The purpose of this is to avoid deadlock that happen when the PF_LESS_THROTTLE process must write for any dirty pages to be freed, but it is being thottled and cannot write. This approach was designed when all threads were blocked equally, independently on which device they were writing to, or how fast it was. Since that time the writeback algorithm has changed substantially with different threads getting different allowances based on non-trivial heuristics. This means the simple "add 25%" heuristic is no longer reliable. The important issue is not that the daemon needs a *larger* dirty page allowance, but that it needs a *private* dirty page allowance, so that dirty pages for the "client" bdi that it is helping to clear (the bdi for an NFS filesystem or loop block device etc) do not affect the throttling of the daemon writing to the "final" bdi. This patch changes the heuristic so that the task is not throttled when the bdi it is writing to has a dirty page count below below (or equal to) the free-run threshold for that bdi. This ensures it will always be able to have some pages in flight, and so will not deadlock. In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might still be throttled by global threshold, but that is acceptable as it is only the deadlock state that is interesting for this flag. This approach of "only throttle when target bdi is busy" is consistent with the other use of PF_LESS_THROTTLE in current_may_throttle(), were it causes attention to be focussed only on the target bdi. So this patch - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE, - removes the 25% bonus that that flag gives, and - If PF_LOCAL_THROTTLE is set, don't delay at all unless the global and the local free-run thresholds are exceeded. Note that previously realtime threads were treated the same as PF_LESS_THROTTLE threads. This patch does *not* change the behvaiour for real-time threads, so it is now different from the behaviour of nfsd and loop tasks. I don't know what is wanted for realtime. [akpm@linux-foundation.org: coding style fixes] Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Chuck Lever <chuck.lever@oracle.com> [nfsd] Cc: Christoph Hellwig <hch@lst.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include')
-rw-r--r--include/linux/sched.h3
1 files changed, 2 insertions, 1 deletions
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4418f5cb8324..12ef0c753284 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1481,7 +1481,8 @@ extern struct pid *cad_pid;
#define PF_KSWAPD 0x00020000 /* I am kswapd */
#define PF_MEMALLOC_NOFS 0x00040000 /* All allocation requests will inherit GFP_NOFS */
#define PF_MEMALLOC_NOIO 0x00080000 /* All allocation requests will inherit GFP_NOIO */
-#define PF_LESS_THROTTLE 0x00100000 /* Throttle me less: I clean memory */
+#define PF_LOCAL_THROTTLE 0x00100000 /* Throttle writes only against the bdi I write to,
+ * I am cleaning dirty pages from some other bdi. */
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */
#define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */