summaryrefslogtreecommitdiff
path: root/Documentation/mm/overcommit-accounting.rst
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2022-08-06 02:32:45 +0300
committerLinus Torvalds <torvalds@linux-foundation.org>2022-08-06 02:32:45 +0300
commit6614a3c3164a5df2b54abb0b3559f51041cf705b (patch)
tree1c25c23d9efed988705287fc2ccb78e0e76e311d /Documentation/mm/overcommit-accounting.rst
parent74cae210a335d159f2eb822e261adee905b6951a (diff)
parent360614c01f81f48a89d8b13f8fa69c3ae0a1f5c7 (diff)
downloadlinux-6614a3c3164a5df2b54abb0b3559f51041cf705b.tar.xz
Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...
Diffstat (limited to 'Documentation/mm/overcommit-accounting.rst')
-rw-r--r--Documentation/mm/overcommit-accounting.rst86
1 files changed, 86 insertions, 0 deletions
diff --git a/Documentation/mm/overcommit-accounting.rst b/Documentation/mm/overcommit-accounting.rst
new file mode 100644
index 000000000000..a4895d6fc1c2
--- /dev/null
+++ b/Documentation/mm/overcommit-accounting.rst
@@ -0,0 +1,86 @@
+=====================
+Overcommit Accounting
+=====================
+
+The Linux kernel supports the following overcommit handling modes
+
+0
+ Heuristic overcommit handling. Obvious overcommits of address
+ space are refused. Used for a typical system. It ensures a
+ seriously wild allocation fails while allowing overcommit to
+ reduce swap usage. root is allowed to allocate slightly more
+ memory in this mode. This is the default.
+
+1
+ Always overcommit. Appropriate for some scientific
+ applications. Classic example is code using sparse arrays and
+ just relying on the virtual memory consisting almost entirely
+ of zero pages.
+
+2
+ Don't overcommit. The total address space commit for the
+ system is not permitted to exceed swap + a configurable amount
+ (default is 50%) of physical RAM. Depending on the amount you
+ use, in most situations this means a process will not be
+ killed while accessing pages but will receive errors on memory
+ allocation as appropriate.
+
+ Useful for applications that want to guarantee their memory
+ allocations will be available in the future without having to
+ initialize every page.
+
+The overcommit policy is set via the sysctl ``vm.overcommit_memory``.
+
+The overcommit amount can be set via ``vm.overcommit_ratio`` (percentage)
+or ``vm.overcommit_kbytes`` (absolute value). These only have an effect
+when ``vm.overcommit_memory`` is set to 2.
+
+The current overcommit limit and amount committed are viewable in
+``/proc/meminfo`` as CommitLimit and Committed_AS respectively.
+
+Gotchas
+=======
+
+The C language stack growth does an implicit mremap. If you want absolute
+guarantees and run close to the edge you MUST mmap your stack for the
+largest size you think you will need. For typical stack usage this does
+not matter much but it's a corner case if you really really care
+
+In mode 2 the MAP_NORESERVE flag is ignored.
+
+
+How It Works
+============
+
+The overcommit is based on the following rules
+
+For a file backed map
+ | SHARED or READ-only - 0 cost (the file is the map not swap)
+ | PRIVATE WRITABLE - size of mapping per instance
+
+For an anonymous or ``/dev/zero`` map
+ | SHARED - size of mapping
+ | PRIVATE READ-only - 0 cost (but of little use)
+ | PRIVATE WRITABLE - size of mapping per instance
+
+Additional accounting
+ | Pages made writable copies by mmap
+ | shmfs memory drawn from the same pool
+
+Status
+======
+
+* We account mmap memory mappings
+* We account mprotect changes in commit
+* We account mremap changes in size
+* We account brk
+* We account munmap
+* We report the commit status in /proc
+* Account and check on fork
+* Review stack handling/building on exec
+* SHMfs accounting
+* Implement actual limit enforcement
+
+To Do
+=====
+* Account ptrace pages (this is hard)