From 3a5be9b8f43346a24f31c0017cb2566a6b2c72c5 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 3 Feb 2020 11:57:08 +0100 Subject: intel_idle: Introduce 'use_acpi' module parameter For diagnostics, it is generally useful to be able to make intel_idle take the system's ACPI tables into consideration even if that is not required for the processor model in there, so introduce a new module parameter, 'use_acpi', to make that happen and update the documentation to cover it. While at it, fix the 'no_acpi' module parameter name in the documentation. Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/pm/intel_idle.rst | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst index afbf778035f8..8998598746a4 100644 --- a/Documentation/admin-guide/pm/intel_idle.rst +++ b/Documentation/admin-guide/pm/intel_idle.rst @@ -60,6 +60,9 @@ of the system. The former are always used if the processor model at hand is recognized by ``intel_idle`` and the latter are used if that is required for the given processor model (which is the case for all server processor models recognized by ``intel_idle``) or if the processor model is not recognized. +[There is a module parameter that can be used to make the driver use the ACPI +tables with any processor model recognized by it; see +`below `_.] If the ACPI tables are going to be used for building the list of available idle states, ``intel_idle`` first looks for a ``_CST`` object under one of the ACPI @@ -165,7 +168,7 @@ and ``idle=nomwait``. If any of them is present in the kernel command line, the ``MWAIT`` instruction is not allowed to be used, so the initialization of ``intel_idle`` will fail.
-Apart from that there are two module parameters recognized by ``intel_idle`` +Apart from that there are three module parameters recognized by ``intel_idle`` itself that can be set via the kernel command line (they cannot be updated via sysfs, so that is the only way to change their values). @@ -186,9 +189,11 @@ QoS) feature can be used to prevent ``CPUIdle`` from touching those idle states even if they have been enumerated (see :ref:`cpu-pm-qos` in :doc:`cpuidle`). Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail. -The ``noacpi`` module parameter (which is recognized by ``intel_idle`` if the -kernel has been configured with ACPI support), can be set to make the driver -ignore the system's ACPI tables entirely (it is unset by default). +The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle`` +if the kernel has been configured with ACPI support) can be set to make the +driver ignore the system's ACPI tables entirely or use them for all of the +recognized processor models, respectively (they both are unset by default and +``use_acpi`` has no effect if ``no_acpi`` is set). .. _intel-idle-core-and-package-idle-states: -- cgit v1.2.3 From 4dcb78ee579cdf90e30c5a0223f6f160ea37182d Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 3 Feb 2020 11:57:18 +0100 Subject: intel_idle: Introduce 'states_off' module parameter In certain system configurations it may not be desirable to use some C-states assumed to be available by intel_idle and the driver needs to be prevented from using them even before the cpuidle sysfs interface becomes accessible to user space. 
Currently, the only way to achieve that is by setting the 'max_cstate' module parameter to a value lower than the index of the shallowest of the C-states in question, but that may be overly intrusive, because it effectively makes all of the idle states deeper than the 'max_cstate' one go away (and the C-state to avoid may be in the middle of the range normally regarded as available). To allow that limitation to be overcome, introduce a new module parameter called 'states_off' to represent a list of idle states to be disabled by default in the form of a bitmask and update the documentation to cover it. Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/pm/intel_idle.rst | 19 ++++++++++++++++++- drivers/idle/intel_idle.c | 23 ++++++++++++++++++++--- 2 files changed, 38 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst index 8998598746a4..89309e1b0e48 100644 --- a/Documentation/admin-guide/pm/intel_idle.rst +++ b/Documentation/admin-guide/pm/intel_idle.rst @@ -168,7 +168,7 @@ and ``idle=nomwait``. If any of them is present in the kernel command line, the ``MWAIT`` instruction is not allowed to be used, so the initialization of ``intel_idle`` will fail. -Apart from that there are three module parameters recognized by ``intel_idle`` +Apart from that there are four module parameters recognized by ``intel_idle`` itself that can be set via the kernel command line (they cannot be updated via sysfs, so that is the only way to change their values). @@ -195,6 +195,23 @@ driver ignore the system's ACPI tables entirely or use them for all of the recognized processor models, respectively (they both are unset by default and ``use_acpi`` has no effect if ``no_acpi`` is set). +The value of the ``states_off`` module parameter (0 by default) represents a +list of idle states to be disabled by default in the form of a bitmask.
+ +Namely, the positions of the bits that are set in the ``states_off`` value are +the indices of idle states to be disabled by default (as reflected by the names +of the corresponding idle state directories in ``sysfs``, :file:`state0`, +:file:`state1` ... :file:`state` ..., where ```` is the index of the given +idle state; see :ref:`idle-states-representation` in :doc:`cpuidle`). + +For example, if ``states_off`` is equal to 3, the driver will disable idle +states 0 and 1 by default, and if it is equal to 8, idle state 3 will be +disabled by default and so on (bit positions beyond the maximum idle state index +are ignored). + +The idle states disabled this way can be enabled (on a per-CPU basis) from user +space via ``sysfs``. + .. _intel-idle-core-and-package-idle-states: diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 6fbd94f85fa5..d55606608ac8 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -63,6 +63,7 @@ static struct cpuidle_driver intel_idle_driver = { }; /* intel_idle.max_cstate=0 disables driver */ static int max_cstate = CPUIDLE_STATE_MAX - 1; +static unsigned int disabled_states_mask; static unsigned int mwait_substates; @@ -1234,6 +1235,9 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv) if (cx->type > ACPI_STATE_C2) state->flags |= CPUIDLE_FLAG_TLB_FLUSHED; + if (disabled_states_mask & BIT(cstate)) + state->flags |= CPUIDLE_FLAG_OFF; + state->enter = intel_idle; state->enter_s2idle = intel_idle_s2idle; } @@ -1466,9 +1470,10 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv) /* Structure copy. 
*/ drv->states[drv->state_count] = cpuidle_state_table[cstate]; - if ((icpu->use_acpi || force_use_acpi) && - intel_idle_off_by_default(mwait_hint) && - !(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_ALWAYS_ENABLE)) + if ((disabled_states_mask & BIT(drv->state_count)) || + ((icpu->use_acpi || force_use_acpi) && + intel_idle_off_by_default(mwait_hint) && + !(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_ALWAYS_ENABLE))) drv->states[drv->state_count].flags |= CPUIDLE_FLAG_OFF; drv->state_count++; @@ -1487,6 +1492,10 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv) static void __init intel_idle_cpuidle_driver_init(struct cpuidle_driver *drv) { cpuidle_poll_state_init(drv); + + if (disabled_states_mask & BIT(0)) + drv->states[0].flags |= CPUIDLE_FLAG_OFF; + drv->state_count = 1; if (icpu) @@ -1667,3 +1676,11 @@ device_initcall(intel_idle_init); * is the easiest way (currently) to continue doing that. */ module_param(max_cstate, int, 0444); +/* + * The positions of the bits that are set in this number are the indices of the + * idle states to be disabled by default (as reflected by the names of the + * corresponding idle state directories in sysfs, "state0", "state1" ... + * "state" ..., where is the index of the given state). + */ +module_param_named(states_off, disabled_states_mask, uint, 0444); +MODULE_PARM_DESC(states_off, "Mask of disabled idle states"); -- cgit v1.2.3 From c21502efdaedfdf9fc71334883a164341881bc22 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 31 Jan 2020 11:05:17 +0100 Subject: Documentation: admin-guide: PM: Update sleep states documentation There is some information in Documentation/power/interface.rst that is still missing from Documentation/admin-guide/pm/sleep-states.rst and really should be present in there, so update the latter by adding that information to it and delete the former (as it becomes redundant after that and it is somewhat outdated). 
While at it, clean up some assorted pieces of sleep-states.rst a bit. Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/pm/sleep-states.rst | 76 ++++++++++++++++++++------ Documentation/power/interface.rst | 79 --------------------------- 2 files changed, 59 insertions(+), 96 deletions(-) delete mode 100644 Documentation/power/interface.rst diff --git a/Documentation/admin-guide/pm/sleep-states.rst b/Documentation/admin-guide/pm/sleep-states.rst index cd3a28cb81f4..ee55a460c639 100644 --- a/Documentation/admin-guide/pm/sleep-states.rst +++ b/Documentation/admin-guide/pm/sleep-states.rst @@ -153,8 +153,11 @@ for the given CPU architecture includes the low-level code for system resume. Basic ``sysfs`` Interfaces for System Suspend and Hibernation ============================================================= -The following files located in the :file:`/sys/power/` directory can be used by -user space for sleep states control. +The power management subsystem provides userspace with a unified ``sysfs`` +interface for system sleep regardless of the underlying system architecture or +platform. That interface is located in the :file:`/sys/power/` directory +(assuming that ``sysfs`` is mounted at :file:`/sys`) and it consists of the +following attributes (files): ``state`` This file contains a list of strings representing sleep states supported @@ -162,9 +165,9 @@ user space for sleep states control. to start a transition of the system into the sleep state represented by that string. - In particular, the strings "disk", "freeze" and "standby" represent the + In particular, the "disk", "freeze" and "standby" strings represent the :ref:`hibernation `, :ref:`suspend-to-idle ` and - :ref:`standby ` sleep states, respectively. The string "mem" + :ref:`standby ` sleep states, respectively. The "mem" string is interpreted in accordance with the contents of the ``mem_sleep`` file described below.
@@ -177,7 +180,7 @@ user space for sleep states control. associated with the "mem" string in the ``state`` file described above. The strings that may be present in this file are "s2idle", "shallow" - and "deep". The string "s2idle" always represents :ref:`suspend-to-idle + and "deep". The "s2idle" string always represents :ref:`suspend-to-idle ` and, by convention, "shallow" and "deep" represent :ref:`standby ` and :ref:`suspend-to-RAM `, respectively. @@ -185,15 +188,17 @@ user space for sleep states control. Writing one of the listed strings into this file causes the system suspend variant represented by it to be associated with the "mem" string in the ``state`` file. The string representing the suspend variant - currently associated with the "mem" string in the ``state`` file - is listed in square brackets. + currently associated with the "mem" string in the ``state`` file is + shown in square brackets. If the kernel does not support system suspend, this file is not present. ``disk`` - This file contains a list of strings representing different operations - that can be carried out after the hibernation image has been saved. The - possible options are as follows: + This file controls the operating mode of hibernation (Suspend-to-Disk). + Specifically, it tells the kernel what to do after creating a + hibernation image. + + Reading from it returns a list of supported options encoded as: ``platform`` Put the system into a special low-power state (e.g. ACPI S4) to @@ -201,6 +206,11 @@ user space for sleep states control. platform firmware to take a simplified initialization path after wakeup. + It is only available if the platform provides a special + mechanism to put the system to sleep after creating a + hibernation image (platforms with ACPI do that as a rule, for + example). + ``shutdown`` Power off the system. @@ -214,22 +224,53 @@ user space for sleep states control. the hibernation image and continue. 
Otherwise, use the image to restore the previous state of the system. + It is available if system suspend is supported. + ``test_resume`` Diagnostic operation. Load the image as though the system had just woken up from hibernation and the currently running kernel instance was a restore kernel and follow up with full system resume. - Writing one of the listed strings into this file causes the option + Writing one of the strings listed above into this file causes the option represented by it to be selected. - The currently selected option is shown in square brackets which means + The currently selected option is shown in square brackets, which means that the operation represented by it will be carried out after creating - and saving the image next time hibernation is triggered by writing - ``disk`` to :file:`/sys/power/state`. + and saving the image when hibernation is triggered by writing ``disk`` + to :file:`/sys/power/state`. If the kernel does not support hibernation, this file is not present. +``image_size`` + This file controls the size of hibernation images. + + It can be written a string representing a non-negative integer that will + be used as a best-effort upper limit of the image size, in bytes. The + hibernation core will do its best to ensure that the image size will not + exceed that number, but if that turns out to be impossible to achieve, a + hibernation image will still be created and its size will be as small as + possible. In particular, writing '0' to this file causes the size of + hibernation images to be minimum. + + Reading from it returns the current image size limit, which is set to + around 2/5 of the available RAM size by default. + +``pm_trace`` + This file controls the "PM trace" mechanism saving the last suspend + or resume event point in the RTC memory across reboots. It helps to + debug hard lockups or reboots due to device driver failures that occur + during system suspend or resume (which is more common) more effectively. 
+ + If it contains "1", the fingerprint of each suspend/resume event point + in turn will be stored in the RTC memory (overwriting the actual RTC + information), so it will survive a system crash if one occurs right + after storing it and it can be used later to identify the driver that + caused the crash to happen. + + It contains "0" by default, which may be changed to "1" by writing a + string representing a nonzero integer into it. + According to the above, there are two ways to make the system go into the :ref:`suspend-to-idle ` state. The first one is to write "freeze" directly to :file:`/sys/power/state`. The second one is to write "s2idle" to @@ -244,6 +285,7 @@ system go into the :ref:`suspend-to-RAM ` state (write "deep" into The default suspend variant (ie. the one to be used without writing anything into :file:`/sys/power/mem_sleep`) is either "deep" (on the majority of systems supporting :ref:`suspend-to-RAM `) or "s2idle", but it can be overridden -by the value of the "mem_sleep_default" parameter in the kernel command line. -On some ACPI-based systems, depending on the information in the ACPI tables, the -default may be "s2idle" even if :ref:`suspend-to-RAM ` is supported. +by the value of the ``mem_sleep_default`` parameter in the kernel command line. +On some systems with ACPI, depending on the information in the ACPI tables, the +default may be "s2idle" even if :ref:`suspend-to-RAM ` is supported in +principle. diff --git a/Documentation/power/interface.rst b/Documentation/power/interface.rst deleted file mode 100644 index 8d270ed27228..000000000000 --- a/Documentation/power/interface.rst +++ /dev/null @@ -1,79 +0,0 @@ -=========================================== -Power Management Interface for System Sleep -=========================================== - -Copyright (c) 2016 Intel Corp., Rafael J. 
Wysocki - -The power management subsystem provides userspace with a unified sysfs interface -for system sleep regardless of the underlying system architecture or platform. -The interface is located in the /sys/power/ directory (assuming that sysfs is -mounted at /sys). - -/sys/power/state is the system sleep state control file. - -Reading from it returns a list of supported sleep states, encoded as: - -- 'freeze' (Suspend-to-Idle) -- 'standby' (Power-On Suspend) -- 'mem' (Suspend-to-RAM) -- 'disk' (Suspend-to-Disk) - -Suspend-to-Idle is always supported. Suspend-to-Disk is always supported -too as long the kernel has been configured to support hibernation at all -(ie. CONFIG_HIBERNATION is set in the kernel configuration file). Support -for Suspend-to-RAM and Power-On Suspend depends on the capabilities of the -platform. - -If one of the strings listed in /sys/power/state is written to it, the system -will attempt to transition into the corresponding sleep state. Refer to -Documentation/admin-guide/pm/sleep-states.rst for a description of each of -those states. - -/sys/power/disk controls the operating mode of hibernation (Suspend-to-Disk). -Specifically, it tells the kernel what to do after creating a hibernation image. - -Reading from it returns a list of supported options encoded as: - -- 'platform' (put the system into sleep using a platform-provided method) -- 'shutdown' (shut the system down) -- 'reboot' (reboot the system) -- 'suspend' (trigger a Suspend-to-RAM transition) -- 'test_resume' (resume-after-hibernation test mode) - -The currently selected option is printed in square brackets. - -The 'platform' option is only available if the platform provides a special -mechanism to put the system to sleep after creating a hibernation image (ACPI -does that, for example). The 'suspend' option is available if Suspend-to-RAM -is supported. Refer to Documentation/power/basic-pm-debugging.rst for the -description of the 'test_resume' option. 
- -To select an option, write the string representing it to /sys/power/disk. - -/sys/power/image_size controls the size of hibernation images. - -It can be written a string representing a non-negative integer that will be -used as a best-effort upper limit of the image size, in bytes. The hibernation -core will do its best to ensure that the image size will not exceed that number. -However, if that turns out to be impossible to achieve, a hibernation image will -still be created and its size will be as small as possible. In particular, -writing '0' to this file will enforce hibernation images to be as small as -possible. - -Reading from this file returns the current image size limit, which is set to -around 2/5 of available RAM by default. - -/sys/power/pm_trace controls the PM trace mechanism saving the last suspend -or resume event point in the RTC across reboots. - -It helps to debug hard lockups or reboots due to device driver failures that -occur during system suspend or resume (which is more common) more effectively. - -If /sys/power/pm_trace contains '1', the fingerprint of each suspend/resume -event point in turn will be stored in the RTC memory (overwriting the actual -RTC information), so it will survive a system crash if one occurs right after -storing it and it can be used later to identify the driver that caused the crash -to happen (see Documentation/power/s2ram.rst for more information). - -Initially it contains '0' which may be changed to '1' by writing a string -representing a nonzero integer into it. -- cgit v1.2.3 From f06572ef476d368a239f0238ecf7b00b9cdbf5bf Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 5 Feb 2020 02:08:31 +0100 Subject: cpuidle: Documentation: Clean up PM QoS description Clean up the language in one paragraph in the PM QoS description in Documentation/admin-guide/pm/cpuidle.rst. Signed-off-by: Rafael J. 
Wysocki --- Documentation/admin-guide/pm/cpuidle.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst index 311cd7cc2b75..6a06dc473dd6 100644 --- a/Documentation/admin-guide/pm/cpuidle.rst +++ b/Documentation/admin-guide/pm/cpuidle.rst @@ -632,16 +632,16 @@ class priority list and destroyed. If that happens, the priority list mechanism will be used, again, to determine the new effective value for the whole list and that value will become the new real constraint. -In turn, for each CPU there is only one resume latency PM QoS request -associated with the :file:`power/pm_qos_resume_latency_us` file under +In turn, for each CPU there is one resume latency PM QoS request associated with +the :file:`power/pm_qos_resume_latency_us` file under :file:`/sys/devices/system/cpu/cpu/` in ``sysfs`` and writing to it causes this single PM QoS request to be updated regardless of which user space process does that. In other words, this PM QoS request is shared by the entire user space, so access to the file associated with it needs to be arbitrated to avoid confusion. [Arguably, the only legitimate use of this mechanism in practice is to pin a process to the CPU in question and let it use the -``sysfs`` interface to control the resume latency constraint for it.] It -still only is a request, however. It is a member of a priority list used to +``sysfs`` interface to control the resume latency constraint for it.] It is +still only a request, however. It is an entry in a priority list used to determine the effective value to be set as the resume latency constraint for the CPU in question every time the list of requests is updated this way or another (there may be other requests coming from kernel code in that list). -- cgit v1.2.3
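The bit arithmetic behind the ``states_off`` module parameter from the intel_idle patch in this series can be checked in user space with a small C sketch. It is not driver code: ``BIT()`` is redefined locally to mirror the kernel macro, and ``states_off_mask()``, ``state_off_by_default()`` and ``print_disabled_states()`` are hypothetical helpers. The sketch reproduces the documented examples (idle states 0 and 1 give 3, idle state 3 gives 8) and performs the same ``disabled_states_mask & BIT(index)`` test that the patch adds to the driver.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* User-space stand-in for the kernel's BIT() macro (assumption for this sketch). */
#define BIT(n) (1U << (n))

/* Build a states_off value that disables each idle state index in @indices. */
unsigned int states_off_mask(const unsigned int *indices, size_t count)
{
	unsigned int mask = 0;

	for (size_t i = 0; i < count; i++) {
		/* The parameter is an unsigned int, so only this many bits exist. */
		assert(indices[i] < sizeof(unsigned int) * CHAR_BIT);
		mask |= BIT(indices[i]);
	}
	return mask;
}

/* Same test as the driver's "disabled_states_mask & BIT(index)" check. */
int state_off_by_default(unsigned int mask, unsigned int index)
{
	/* Guard the shift so an out-of-range index is simply "not disabled". */
	return index < sizeof(unsigned int) * CHAR_BIT && (mask & BIT(index)) != 0;
}

/*
 * List the idle states 0 .. state_count - 1 that @mask disables. Bits at
 * positions >= state_count are never examined, so they have no effect.
 */
void print_disabled_states(unsigned int mask, unsigned int state_count)
{
	for (unsigned int i = 0; i < state_count; i++)
		if (state_off_by_default(mask, i))
			printf("state%u is disabled by default\n", i);
}
```

With ``intel_idle.states_off=8`` on the kernel command line, ``print_disabled_states(8, 4)`` would print only ``state3 is disabled by default``, while on a system exposing just three idle states the same value would disable nothing, consistent with bit positions beyond the maximum idle state index being ignored.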
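In the sleep-states documentation patch, both ``mem_sleep`` and ``disk`` under :file:`/sys/power/` show the currently selected option in square brackets (``state`` does not). A read-only C sketch of extracting that option is shown below. ``selected_option()`` and ``read_selected_option()`` are hypothetical names, and the 256-byte read buffer is an assumption that comfortably fits the option lists described there. The sketch never writes to these files, because writing to them changes how the system suspends or hibernates.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the option shown in square brackets in the contents of
 * /sys/power/mem_sleep or /sys/power/disk into @out, for example
 * "s2idle [deep]" -> "deep". Returns 0 on success or -1 if there is no
 * bracketed option or @out is too small.
 */
int selected_option(const char *contents, char *out, size_t out_len)
{
	const char *lbracket, *rbracket;
	size_t len;

	assert(contents != NULL && out != NULL);
	lbracket = strchr(contents, '[');
	rbracket = lbracket ? strchr(lbracket + 1, ']') : NULL;
	if (!rbracket)
		return -1;

	len = (size_t)(rbracket - lbracket - 1);
	if (len + 1 > out_len)
		return -1;
	memcpy(out, lbracket + 1, len);
	out[len] = '\0';
	return 0;
}

/*
 * Read @path (e.g. "/sys/power/mem_sleep") and extract the selected option.
 * Fails with -1 if the file is missing, which is the case when the kernel
 * does not support system suspend (mem_sleep) or hibernation (disk).
 */
int read_selected_option(const char *path, char *out, size_t out_len)
{
	char buf[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return selected_option(buf, out, out_len);
}
```

On a system whose :file:`/sys/power/mem_sleep` reads ``s2idle [deep]``, ``read_selected_option("/sys/power/mem_sleep", buf, sizeof(buf))`` would leave ``deep`` in ``buf``, meaning that writing "mem" to :file:`/sys/power/state` would start a suspend-to-RAM transition.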
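The cpuidle documentation patch notes that the per-CPU :file:`power/pm_qos_resume_latency_us` request is best driven by a process pinned to the CPU in question. A minimal C sketch of that pattern follows, assuming glibc (for ``sched_setaffinity()`` and ``cpu_set_t``) and enough privileges to write to ``sysfs``. The function names are hypothetical, and the value written is simply the caller's latency limit in microseconds. Because the request is shared by all of user space, the sketch makes no attempt to arbitrate between processes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Format the per-CPU resume latency file path for @cpu into @buf. */
int resume_latency_path(int cpu, char *buf, size_t len)
{
	int n;

	assert(cpu >= 0);
	n = snprintf(buf, len,
		     "/sys/devices/system/cpu/cpu%d/power/pm_qos_resume_latency_us",
		     cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Pin the calling thread to @cpu and then update the single, shared resume
 * latency request of that CPU. Returns 0 on success and -1 on any failure.
 */
int pin_and_set_resume_latency(int cpu, unsigned int latency_us)
{
	char path[128];
	cpu_set_t set;
	FILE *f;
	int ret = 0;

	if (resume_latency_path(cpu, path, sizeof(path)) < 0)
		return -1;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set) < 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%u\n", latency_us) < 0)
		ret = -1;
	/* stdio buffers the write, so a rejected value surfaces at fclose(). */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

Pinning first reflects the reasoning in the patch: the process that owns the constraint runs on the CPU the constraint applies to, so the shared request describes its own latency needs rather than those of an unrelated process.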