From efdf7532bd3d302a96436beee153364f26a1ddae Mon Sep 17 00:00:00 2001
From: Waiman Long
Date: Tue, 5 Sep 2023 09:32:42 -0400
Subject: cgroup/cpuset: Documentation update for partition

This patch updates the cgroup-v2.rst file to include information about
the new "cpuset.cpus.exclusive" and "cpuset.cpus.exclusive.effective"
control files as well as the new remote partition type.

Signed-off-by: Waiman Long
Signed-off-by: Tejun Heo
---
 Documentation/admin-guide/cgroup-v2.rst | 123 +++++++++++++++++++++++---------
 1 file changed, 91 insertions(+), 32 deletions(-)

(limited to 'Documentation/admin-guide')

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index b26b5274eaaf..e40b8560e002 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2226,6 +2226,49 @@ Cpuset Interface Files
 
 	Its value will be affected by memory nodes hotplug events.
 
+  cpuset.cpus.exclusive
+	A read-write multiple values file which exists on non-root
+	cpuset-enabled cgroups.
+
+	It lists all the exclusive CPUs that are allowed to be used
+	to create a new cpuset partition.  Its value is not used
+	unless the cgroup becomes a valid partition root.  See the
+	"cpuset.cpus.partition" section below for a description of
+	what a cpuset partition is.
+
+	When the cgroup becomes a partition root, the actual exclusive
+	CPUs that are allocated to that partition are listed in
+	"cpuset.cpus.exclusive.effective" which may be different
+	from "cpuset.cpus.exclusive".  If "cpuset.cpus.exclusive"
+	has previously been set, "cpuset.cpus.exclusive.effective"
+	is always a subset of it.
+
+	Users can manually set it to a value that is different from
+	"cpuset.cpus".  The only constraint in setting it is that the
+	list of CPUs must be exclusive with respect to its siblings.
+
+	For a parent cgroup, any one of its exclusive CPUs can be
+	distributed to at most one of its child cgroups.  Having an
+	exclusive CPU appearing in two or more of its child cgroups is
+	not allowed (the exclusivity rule).  A value that violates the
+	exclusivity rule will be rejected with a write error.
+
+	The root cgroup is a partition root and all its available CPUs
+	are in its exclusive CPU set.
+
+  cpuset.cpus.exclusive.effective
+	A read-only multiple values file which exists on all non-root
+	cpuset-enabled cgroups.
+
+	This file shows the effective set of exclusive CPUs that
+	can be used to create a partition root.  The content of this
+	file will always be a subset of "cpuset.cpus" and its parent's
+	"cpuset.cpus.exclusive.effective" if its parent is not the root
+	cgroup.  It will also be a subset of "cpuset.cpus.exclusive"
+	if it is set.  If "cpuset.cpus.exclusive" is not set, it is
+	treated to have an implicit value of "cpuset.cpus" in the
+	formation of a local partition.
+
   cpuset.cpus.partition
 	A read-write single value file which exists on non-root
 	cpuset-enabled cgroups.  This flag is owned by the parent cgroup
@@ -2239,26 +2282,41 @@ Cpuset Interface Files
 	  "isolated"	Partition root without load balancing
 	  ==========	=====================================
 
-	The root cgroup is always a partition root and its state
-	cannot be changed.  All other non-root cgroups start out as
-	"member".
+	A cpuset partition is a collection of cpuset-enabled cgroups
+	consisting of a partition root at the top of the hierarchy and
+	its descendants, except those that are separate partition roots
+	themselves and their descendants.  A partition has exclusive
+	access to the set of exclusive CPUs allocated to it.  Other
+	cgroups outside of that partition cannot use any CPUs in
+	that set.
+
+	There are two types of partitions - local and remote.  A local
+	partition is one whose parent cgroup is also a valid partition
+	root.  A remote partition is one whose parent cgroup is not a
+	valid partition root itself.  Writing to "cpuset.cpus.exclusive"
+	is optional for the creation of a local partition as its
+	"cpuset.cpus.exclusive" file will assume an implicit value that
+	is the same as "cpuset.cpus" if it is not set.  Writing the
+	proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
+	before the target partition root is mandatory for the creation
+	of a remote partition.
+
+	Currently, a remote partition cannot be created under a local
+	partition.  None of the ancestors of a remote partition root,
+	except the root cgroup, can be a partition root.
+
+	The root cgroup is always a partition root and its state cannot
+	be changed.  All other non-root cgroups start out as "member".
 
 	When set to "root", the current cgroup is the root of a new
-	partition or scheduling domain that comprises itself and all
-	its descendants except those that are separate partition roots
-	themselves and their descendants.
+	partition or scheduling domain.  The set of exclusive CPUs is
+	determined by the value of its "cpuset.cpus.exclusive.effective".
 
-	When set to "isolated", the CPUs in that partition root will
+	When set to "isolated", the CPUs in that partition will
 	be in an isolated state without any load balancing from the
 	scheduler.  Tasks placed in such a partition with multiple
 	CPUs should be carefully distributed and bound to each of the
 	individual CPUs for optimal performance.
 
-	The value shown in "cpuset.cpus.effective" of a partition root
-	is the CPUs that the partition root can dedicate to a potential
-	new child partition root. The new child subtracts available
-	CPUs from its parent "cpuset.cpus.effective".
-
 	A partition root ("root" or "isolated") can be in one of the
 	two possible states - valid or invalid.  An invalid partition
 	root is in a degraded state where some state information may
@@ -2281,37 +2339,33 @@ Cpuset Interface Files
 	In the case of an invalid partition root, a descriptive string on
 	why the partition is invalid is included within parentheses.
 
-	For a partition root to become valid, the following conditions
+	For a local partition root to be valid, the following conditions
 	must be met.
 
-	1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
-	   are not shared by any of its siblings (exclusivity rule).
-	2) The parent cgroup is a valid partition root.
-	3) The "cpuset.cpus" is not empty and must contain at least
-	   one of the CPUs from parent's "cpuset.cpus", i.e. they overlap.
-	4) The "cpuset.cpus.effective" cannot be empty unless there is
+	1) The parent cgroup is a valid partition root.
+	2) The "cpuset.cpus.exclusive.effective" file cannot be empty,
+	   though it may contain offline CPUs.
+	3) The "cpuset.cpus.effective" cannot be empty unless there is
 	   no task associated with this partition.
 
-	External events like hotplug or changes to "cpuset.cpus" can
-	cause a valid partition root to become invalid and vice versa.
-	Note that a task cannot be moved to a cgroup with empty
-	"cpuset.cpus.effective".
+	For a remote partition root to be valid, all the above conditions
+	except the first one must be met.
 
-	For a valid partition root with the sibling cpu exclusivity
-	rule enabled, changes made to "cpuset.cpus" that violate the
-	exclusivity rule will invalidate the partition as well as its
-	sibling partitions with conflicting cpuset.cpus values.  So
-	care must be taking in changing "cpuset.cpus".
+	External events like hotplug or changes to "cpuset.cpus" or
+	"cpuset.cpus.exclusive" can cause a valid partition root to
+	become invalid and vice versa.  Note that a task cannot be
+	moved to a cgroup with empty "cpuset.cpus.effective".
 
 	A valid non-root parent partition may distribute out all its CPUs
-	to its child partitions when there is no task associated with it.
+	to its child local partitions when there is no task associated
+	with it.
 
-	Care must be taken to change a valid partition root to
-	"member" as all its child partitions, if present, will become
+	Care must be taken to change a valid partition root to "member"
+	as all its child local partitions, if present, will become
 	invalid causing disruption to tasks running in those child
 	partitions.  These inactivated partitions could be recovered if
 	their parent is switched back to a partition root with a proper
-	set of "cpuset.cpus".
+	value in "cpuset.cpus" or "cpuset.cpus.exclusive".
 
 	Poll and inotify events are triggered whenever the state of
 	"cpuset.cpus.partition" changes.  That includes changes caused
@@ -2321,6 +2375,11 @@ Cpuset Interface Files
 	to "cpuset.cpus.partition" without the need to do continuous
 	polling.
 
+	A user can pre-configure certain CPUs to an isolated state
+	with load balancing disabled at boot time with the "isolcpus"
+	kernel boot command line option.  If those CPUs are to be put
+	into a partition, they have to be used in an isolated partition.
+
 
 Device controller
 -----------------
--
cgit v1.2.3
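A note on usage: the remote-partition flow documented above can be driven
entirely through the cpuset control files.  The following is a minimal
userspace sketch, not part of the patch.  It assumes a cgroup v2 hierarchy
mounted at /sys/fs/cgroup with the cpuset controller enabled along the
path; the cgroup names "a" and "a/b" and the CPU list "2-3" are
hypothetical examples.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a value to a cgroup control file; returns 0 on success. */
static int cg_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0) {
		perror(path);
		return -1;
	}
	n = write(fd, val, strlen(val));
	if (n < 0)
		perror(path);
	close(fd);
	return n < 0 ? -1 : 0;
}

int main(void)
{
	/*
	 * A remote partition needs "cpuset.cpus.exclusive" set all the
	 * way down the hierarchy to the target cgroup, since the parent
	 * "a" is not itself a partition root.
	 */
	if (cg_write("/sys/fs/cgroup/a/cpuset.cpus.exclusive", "2-3") ||
	    cg_write("/sys/fs/cgroup/a/b/cpuset.cpus.exclusive", "2-3") ||
	    cg_write("/sys/fs/cgroup/a/b/cpuset.cpus", "2-3"))
		return 1;

	/* Turn the target cgroup into an isolated partition root. */
	if (cg_write("/sys/fs/cgroup/a/b/cpuset.cpus.partition", "isolated"))
		return 1;

	puts("remote isolated partition created");
	return 0;
}

If the validity conditions listed above hold, a subsequent read of
a/b/cpuset.cpus.partition reports "isolated"; otherwise it reports an
invalid state with the reason in parentheses.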
From 9b81d3a5be05d350ac93d99762c7ee91fe29b4cb Mon Sep 17 00:00:00 2001
From: Luiz Capitulino
Date: Wed, 27 Sep 2023 14:25:40 +0000
Subject: cgroup: add cgroup_favordynmods= command-line option
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We have a need to use favordynmods with cgroup v1, which doesn't support
changing mount flags during remount.  Enabling
CONFIG_CGROUP_FAVOR_DYNMODS at build-time is not an option because we
want to be able to selectively enable it for certain systems.

This commit addresses this by introducing the cgroup_favordynmods=
command-line option.  This option works for both cgroup v1 and v2 and
also allows for disabling favordynmods when the kernel is built with
CONFIG_CGROUP_FAVOR_DYNMODS=y.

Also, note that when cgroup_favordynmods=true, favordynmods is never
disabled in cgroup_destroy_root().

Signed-off-by: Luiz Capitulino
Reviewed-by: Michal Koutný
Signed-off-by: Tejun Heo
---
 Documentation/admin-guide/kernel-parameters.txt |  4 ++++
 kernel/cgroup/cgroup.c                          | 18 ++++++++++++++----
 2 files changed, 18 insertions(+), 4 deletions(-)

(limited to 'Documentation/admin-guide')

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0a1731a0f0ef..8b744d39d393 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -580,6 +580,10 @@
 			named mounts. Specifying both "all" and "named" disables
 			all v1 hierarchies.
 
+	cgroup_favordynmods= [KNL] Enable or Disable favordynmods.
+			Format: { "true" | "false" }
+			Defaults to the value of CONFIG_CGROUP_FAVOR_DYNMODS.
+
 	cgroup.memory=	[KNL] Pass options to the cgroup memory controller.
 			Format: <string>
 			nosocket -- Disable socket memory accounting.
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 833ac6dd15d9..059cd5651d41 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -207,6 +207,8 @@ static u16 have_exit_callback __read_mostly;
 static u16 have_release_callback __read_mostly;
 static u16 have_canfork_callback __read_mostly;
 
+static bool have_favordynmods __ro_after_init = IS_ENABLED(CONFIG_CGROUP_FAVOR_DYNMODS);
+
 /* cgroup namespace for init task */
 struct cgroup_namespace init_cgroup_ns = {
 	.ns.count	= REFCOUNT_INIT(2),
@@ -1350,7 +1352,9 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 		cgroup_root_count--;
 	}
 
-	cgroup_favor_dynmods(root, false);
+	if (!have_favordynmods)
+		cgroup_favor_dynmods(root, false);
+
 	cgroup_exit_root_id(root);
 
 	cgroup_unlock();
@@ -2245,9 +2249,9 @@ static int cgroup_init_fs_context(struct fs_context *fc)
 	fc->user_ns = get_user_ns(ctx->ns->user_ns);
 	fc->global = true;
 
-#ifdef CONFIG_CGROUP_FAVOR_DYNMODS
-	ctx->flags |= CGRP_ROOT_FAVOR_DYNMODS;
-#endif
+	if (have_favordynmods)
+		ctx->flags |= CGRP_ROOT_FAVOR_DYNMODS;
+
 	return 0;
 }
 
@@ -6766,6 +6770,12 @@ static int __init enable_cgroup_debug(char *str)
 }
 __setup("cgroup_debug", enable_cgroup_debug);
 
+static int __init cgroup_favordynmods_setup(char *str)
+{
+	return (kstrtobool(str, &have_favordynmods) == 0);
+}
+__setup("cgroup_favordynmods=", cgroup_favordynmods_setup);
+
 /**
  * css_tryget_online_from_dir - get corresponding css from a cgroup dentry
  * @dentry: directory dentry of interest
--
cgit v1.2.3
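Since the new option is consumed by the kernel at boot, userspace can only
see what was requested on the command line; the compiled-in
CONFIG_CGROUP_FAVOR_DYNMODS default that applies when the option is absent
is not visible there.  A small sketch, not part of the patch, that reports
the requested value from /proc/cmdline:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char cmdline[4096], val[16];
	const char *opt;
	FILE *f = fopen("/proc/cmdline", "r");

	if (!f) {
		perror("/proc/cmdline");
		return 1;
	}
	if (!fgets(cmdline, sizeof(cmdline), f)) {
		fclose(f);
		return 1;
	}
	fclose(f);

	/* Boot options are whitespace-separated; grab the value token. */
	opt = strstr(cmdline, "cgroup_favordynmods=");
	if (opt && sscanf(opt, "cgroup_favordynmods=%15s", val) == 1)
		printf("favordynmods requested: %s\n", val);
	else
		puts("cgroup_favordynmods= not set; the CONFIG default applies");
	return 0;
}

Whether favordynmods actually took effect on a given hierarchy can be
confirmed from the "favordynmods" mount option shown for that hierarchy
in /proc/mounts.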
From a41796b5537dd90eed0e8a6341dec97f4507f5ed Mon Sep 17 00:00:00 2001
From: Waiman Long
Date: Tue, 17 Oct 2023 13:13:41 -0400
Subject: docs/cgroup: Add the list of threaded controllers to cgroup-v2.rst

The cgroup-v2 file mentions the concept of threaded controllers which
can be used in a threaded cgroup.  However, it doesn't clearly state
which controllers are threaded, leading to some confusion about which
controllers can be used there without some experimentation.  Clear this
up by explicitly listing the controllers that can currently be used in
a threaded cgroup.

Signed-off-by: Waiman Long
Signed-off-by: Tejun Heo
---
 Documentation/admin-guide/cgroup-v2.rst | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'Documentation/admin-guide')

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index e40b8560e002..e440aee4fe94 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -364,6 +364,13 @@ constraint, a threaded controller must be able to handle competition
 between threads in a non-leaf cgroup and its child cgroups.  Each
 threaded controller defines how such competitions are handled.
 
+Currently, the following controllers are threaded and can be enabled
+in a threaded cgroup::
+
+- cpu
+- cpuset
+- perf_event
+- pids
 
 [Un]populated Notification
 --------------------------
--
cgit v1.2.3
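As a usage sketch for the list above, not part of the patch: making a
cgroup threaded and enabling two of the threaded controllers comes down
to two writes.  The names "svc" and "svc/workers" are hypothetical, the
cgroup v2 hierarchy is assumed to be mounted at /sys/fs/cgroup, and cpu
and pids must already appear in the parent's cgroup.controllers.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a value to a cgroup control file; returns 0 on success. */
static int cg_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0) {
		perror(path);
		return -1;
	}
	n = write(fd, val, strlen(val));
	if (n < 0)
		perror(path);
	close(fd);
	return n < 0 ? -1 : 0;
}

int main(void)
{
	/* Enable two threaded controllers for the children of "svc". */
	if (cg_write("/sys/fs/cgroup/svc/cgroup.subtree_control", "+cpu +pids"))
		return 1;

	/*
	 * Mark the child as threaded.  "svc" becomes a threaded domain
	 * and threads may then be spread across its threaded subtree.
	 */
	if (cg_write("/sys/fs/cgroup/svc/workers/cgroup.type", "threaded"))
		return 1;

	puts("threaded subtree ready");
	return 0;
}

Individual threads of a process can then be distributed within the
threaded subtree by writing their TIDs to the cgroup.threads file of the
target cgroups.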