From 08810a4119aaebf6318f209ec5dd9828e969cba4 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 25 Oct 2017 14:12:29 +0200 Subject: PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags The motivation for this change is to provide a way to work around a problem with the direct-complete mechanism used for avoiding system suspend/resume handling for devices in runtime suspend. The problem is that some middle layer code (the PCI bus type and the ACPI PM domain in particular) returns positive values from its system suspend ->prepare callbacks regardless of whether the driver's ->prepare returns a positive value or 0, which effectively prevents drivers from being able to control the direct-complete feature. Some drivers need that control, however, and the PCI bus type has grown its own flag to deal with this issue, but since it is not limited to PCI, it is better to address it by adding driver flags at the core level. To that end, add a driver_flags field to struct dev_pm_info for flags that can be set by device drivers at the probe time to inform the PM core and/or bus types, PM domains and so on on the capabilities and/or preferences of device drivers. Also add two static inline helpers for setting that field and testing it against a given set of flags and make the driver core clear it automatically on driver remove and probe failures. Define and document two PM driver flags related to the direct- complete feature: NEVER_SKIP and SMART_PREPARE that can be used, respectively, to indicate to the PM core that the direct-complete mechanism should never be used for the device and to inform the middle layer code (bus types, PM domains etc) that it can only request the PM core to use the direct-complete mechanism for the device (by returning a positive value from its ->prepare callback) if it also has been requested by the driver. While at it, make the core check pm_runtime_suspended() when setting power.direct_complete so that it doesn't need to be checked by ->prepare callbacks. Signed-off-by: Rafael J. Wysocki Acked-by: Greg Kroah-Hartman Acked-by: Bjorn Helgaas Reviewed-by: Ulf Hansson --- Documentation/driver-api/pm/devices.rst | 14 ++++++++++++++ Documentation/power/pci.txt | 19 +++++++++++++++++++ 2 files changed, 33 insertions(+) (limited to 'Documentation') diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst index 4a18ef9997c0..8add5b302a89 100644 --- a/Documentation/driver-api/pm/devices.rst +++ b/Documentation/driver-api/pm/devices.rst @@ -354,6 +354,20 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``. is because all such devices are initially set to runtime-suspended with runtime PM disabled. + This feature also can be controlled by device drivers by using the + ``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power + management flags. [Typically, they are set at the time the driver is + probed against the device in question by passing them to the + :c:func:`dev_pm_set_driver_flags` helper function.] If the first of + these flags is set, the PM core will not apply the direct-complete + procedure described above to the given device and, consequenty, to any + of its ancestors. The second flag, when set, informs the middle layer + code (bus types, device types, PM domains, classes) that it should take + the return value of the ``->prepare`` callback provided by the driver + into account and it may only return a positive value from its own + ``->prepare`` callback if the driver's one also has returned a positive + value. + 2. The ``->suspend`` methods should quiesce the device to stop it from performing I/O. They also may save the device registers and put it into the appropriate low-power state, depending on the bus type the device is diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index a1b7f7158930..ab4e7d0540c1 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt @@ -961,6 +961,25 @@ dev_pm_ops to indicate that one suspend routine is to be pointed to by the .suspend(), .freeze(), and .poweroff() members and one resume routine is to be pointed to by the .resume(), .thaw(), and .restore() members. +3.1.19. Driver Flags for Power Management + +The PM core allows device drivers to set flags that influence the handling of +power management for the devices by the core itself and by middle layer code +including the PCI bus type. The flags should be set once at the driver probe +time with the help of the dev_pm_set_driver_flags() function and they should not +be updated directly afterwards. + +The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete +mechanism allowing device suspend/resume callbacks to be skipped if the device +is in runtime suspend when the system suspend starts. That also affects all of +the ancestors of the device, so this flag should only be used if absolutely +necessary. + +The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a +positive value from pci_pm_prepare() if the ->prepare callback provided by the +driver of the device returns a positive value. That allows the driver to opt +out from using the direct-complete mechanism dynamically. + 3.2. Device Runtime Power Management ------------------------------------ In addition to providing device power management callbacks PCI device drivers -- cgit v1.2.3 From 0eab11c9ae3b3cc5dd76f20b81d0247647a6e96f Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 26 Oct 2017 12:12:08 +0200 Subject: PM / core: Add SMART_SUSPEND driver flag Define and document a SMART_SUSPEND flag to instruct bus types and PM domains that the system suspend callbacks provided by the driver can cope with runtime-suspended devices, so from the driver's perspective it should be safe to leave devices in runtime suspend during system suspend. Setting that flag may also cause middle-layer code (bus types, PM domains etc.) to skip invocations of the ->suspend_late and ->suspend_noirq callbacks provided by the driver if the device is in runtime suspend at the beginning of the "late" phase of the system-wide suspend transition, in which case the driver's system-wide resume callbacks may be invoked back-to-back with its ->runtime_suspend callback, so the driver has to be able to cope with that too. Signed-off-by: Rafael J. Wysocki Acked-by: Greg Kroah-Hartman Reviewed-by: Ulf Hansson --- Documentation/driver-api/pm/devices.rst | 20 ++++++++++++++++++++ drivers/base/power/main.c | 3 +++ include/linux/pm.h | 8 ++++++++ 3 files changed, 31 insertions(+) (limited to 'Documentation') diff --git a/Documentation/driver-api/pm/devices.rst b/Documentation/driver-api/pm/devices.rst index 8add5b302a89..574dadd06dec 100644 --- a/Documentation/driver-api/pm/devices.rst +++ b/Documentation/driver-api/pm/devices.rst @@ -766,6 +766,26 @@ the state of devices (possibly except for resuming them from runtime suspend) from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before* invoking device drivers' ``->suspend`` callbacks (or equivalent). +Some bus types and PM domains have a policy to resume all devices from runtime +suspend upfront in their ``->suspend`` callbacks, but that may not be really +necessary if the driver of the device can cope with runtime-suspended devices. +The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in +:c:member:`power.driver_flags` at the probe time, by passing it to the +:c:func:`dev_pm_set_driver_flags` helper. That also may cause middle-layer code +(bus types, PM domains etc.) to skip the ``->suspend_late`` and +``->suspend_noirq`` callbacks provided by the driver if the device remains in +runtime suspend at the beginning of the ``suspend_late`` phase of system-wide +suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM +has been disabled for it, under the assumption that its state should not change +after that point until the system-wide transition is over. If that happens, the +driver's system-wide resume callbacks, if present, may still be invoked during +the subsequent system-wide resume transition and the device's runtime power +management status may be set to "active" before enabling runtime PM for it, +so the driver must be prepared to cope with the invocation of its system-wide +resume callbacks back-to-back with its ``->runtime_suspend`` one (without the +intervening ``->runtime_resume`` and so on) and the final state of the device +must reflect the "active" status for runtime PM in that case. + During system-wide resume from a sleep state it's easiest to put devices into the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`. Refer to that document for more information regarding this particular issue as diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index c0135cd95ada..8d9024017645 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1652,6 +1652,9 @@ static int device_prepare(struct device *dev, pm_message_t state) if (dev->power.syscore) return 0; + WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) && + !pm_runtime_enabled(dev)); + /* * If a device's parent goes into runtime suspend at the wrong time, * it won't be possible to resume the device. To prevent this we diff --git a/include/linux/pm.h b/include/linux/pm.h index f10bad831bfa..43b5418e05bb 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -558,6 +558,7 @@ struct pm_subsys_data { * * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device. * SMART_PREPARE: Check the return value of the driver's ->prepare callback. + * SMART_SUSPEND: No need to resume the device from runtime suspend. * * Setting SMART_PREPARE instructs bus types and PM domains which may want * system suspend/resume callbacks to be skipped for the device to return 0 from @@ -565,9 +566,16 @@ struct pm_subsys_data { * other words, the system suspend/resume callbacks can only be skipped for the * device if its driver doesn't object against that). This flag has no effect * if NEVER_SKIP is set. + * + * Setting SMART_SUSPEND instructs bus types and PM domains which may want to + * runtime resume the device upfront during system suspend that doing so is not + * necessary from the driver's perspective. It also may cause them to skip + * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by + * the driver if they decide to leave the device in runtime suspend. */ #define DPM_FLAG_NEVER_SKIP BIT(0) #define DPM_FLAG_SMART_PREPARE BIT(1) +#define DPM_FLAG_SMART_SUSPEND BIT(2) struct dev_pm_info { pm_message_t power_state; -- cgit v1.2.3 From c4b65157aeefad29b2351a00a010e8c40ce7fd0e Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 26 Oct 2017 12:12:22 +0200 Subject: PCI / PM: Take SMART_SUSPEND driver flag into account Make the PCI bus type take DPM_FLAG_SMART_SUSPEND into account in its system-wide PM callbacks and make sure that all code that should not run in parallel with pci_pm_runtime_resume() is executed in the "late" phases of system suspend, freeze and poweroff transitions. [Note that the pm_runtime_suspended() check in pci_dev_keep_suspended() is an optimization, because if is not passed, all of the subsequent checks may be skipped and some of them are much more overhead in general.] Also use the observation that if the device is in runtime suspend at the beginning of the "late" phase of a system-wide suspend-like transition, its state cannot change going forward (runtime PM is disabled for it at that time) until the transition is over and the subsequent system-wide PM callbacks should be skipped for it (as they generally assume the device to not be suspended), so add checks for that in pci_pm_suspend_late/noirq(), pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq(). Moreover, if pci_pm_resume_noirq() or pci_pm_restore_noirq() is called during the subsequent system-wide resume transition and if the device was left in runtime suspend previously, its runtime PM status needs to be changed to "active" as it is going to be put into the full-power state, so add checks for that too to these functions. In turn, if pci_pm_thaw_noirq() runs after the device has been left in runtime suspend, the subsequent "thaw" callbacks need to be skipped for it (as they may not work correctly with a suspended device), so set the power.direct_complete flag for the device then to make the PM core skip those callbacks. In addition to the above add a core helper for checking if DPM_FLAG_SMART_SUSPEND is set and the device runtime PM status is "suspended" at the same time, which is done quite often in the new code (and will be done elsewhere going forward too). Signed-off-by: Rafael J. Wysocki Acked-by: Greg Kroah-Hartman Acked-by: Bjorn Helgaas --- Documentation/power/pci.txt | 14 ++++++ drivers/base/power/main.c | 6 +++ drivers/pci/pci-driver.c | 103 ++++++++++++++++++++++++++++++++++++-------- include/linux/pm.h | 2 + 4 files changed, 108 insertions(+), 17 deletions(-) (limited to 'Documentation') diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index ab4e7d0540c1..304162ea377e 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt @@ -980,6 +980,20 @@ positive value from pci_pm_prepare() if the ->prepare callback provided by the driver of the device returns a positive value. That allows the driver to opt out from using the direct-complete mechanism dynamically. +The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's +perspective the device can be safely left in runtime suspend during system +suspend. That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff() +to skip resuming the device from runtime suspend unless there are PCI-specific +reasons for doing that. Also, it causes pci_pm_suspend_late/noirq(), +pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq() to return early +if the device remains in runtime suspend in the beginning of the "late" phase +of the system-wide transition under way. Moreover, if the device is in +runtime suspend in pci_pm_resume_noirq() or pci_pm_restore_noirq(), its runtime +power management status will be changed to "active" (as it is going to be put +into D0 going forward), but if it is in runtime suspend in pci_pm_thaw_noirq(), +the function will set the power.direct_complete flag for it (to make the PM core +skip the subsequent "thaw" callbacks for it) and return. + 3.2. Device Runtime Power Management ------------------------------------ In addition to providing device power management callbacks PCI device drivers diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 8d9024017645..6c6f1c74c24c 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1861,3 +1861,9 @@ void device_pm_check_callbacks(struct device *dev) !dev->driver->suspend && !dev->driver->resume)); spin_unlock_irq(&dev->power.lock); } + +bool dev_pm_smart_suspend_and_suspended(struct device *dev) +{ + return dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) && + pm_runtime_status_suspended(dev); +} diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index c1aeeb10539e..d19bd54d337e 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -734,18 +734,25 @@ static int pci_pm_suspend(struct device *dev) if (!pm) { pci_pm_default_suspend(pci_dev); - goto Fixup; + return 0; } /* - * PCI devices suspended at run time need to be resumed at this point, - * because in general it is necessary to reconfigure them for system - * suspend. Namely, if the device is supposed to wake up the system - * from the sleep state, we may need to reconfigure it for this purpose. - * In turn, if the device is not supposed to wake up the system from the - * sleep state, we'll have to prevent it from signaling wake-up. + * PCI devices suspended at run time may need to be resumed at this + * point, because in general it may be necessary to reconfigure them for + * system suspend. Namely, if the device is expected to wake up the + * system from the sleep state, it may have to be reconfigured for this + * purpose, or if the device is not expected to wake up the system from + * the sleep state, it should be prevented from signaling wakeup events + * going forward. + * + * Also if the driver of the device does not indicate that its system + * suspend callbacks can cope with runtime-suspended devices, it is + * better to resume the device from runtime suspend here. */ - pm_runtime_resume(dev); + if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) || + !pci_dev_keep_suspended(pci_dev)) + pm_runtime_resume(dev); pci_dev->state_saved = false; if (pm->suspend) { @@ -765,17 +772,27 @@ static int pci_pm_suspend(struct device *dev) } } - Fixup: - pci_fixup_device(pci_fixup_suspend, pci_dev); - return 0; } +static int pci_pm_suspend_late(struct device *dev) +{ + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev)); + + return pm_generic_suspend_late(dev); +} + static int pci_pm_suspend_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_suspend_late(dev, PMSG_SUSPEND); @@ -834,6 +851,14 @@ static int pci_pm_resume_noirq(struct device *dev) struct device_driver *drv = dev->driver; int error = 0; + /* + * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend + * during system suspend, so update their runtime PM status to "active" + * as they are going to be put into D0 shortly. + */ + if (dev_pm_smart_suspend_and_suspended(dev)) + pm_runtime_set_active(dev); + pci_pm_default_resume_early(pci_dev); if (pci_has_legacy_pm_support(pci_dev)) @@ -876,6 +901,7 @@ static int pci_pm_resume(struct device *dev) #else /* !CONFIG_SUSPEND */ #define pci_pm_suspend NULL +#define pci_pm_suspend_late NULL #define pci_pm_suspend_noirq NULL #define pci_pm_resume NULL #define pci_pm_resume_noirq NULL @@ -910,7 +936,8 @@ static int pci_pm_freeze(struct device *dev) * devices should not be touched during freeze/thaw transitions, * however. */ - pm_runtime_resume(dev); + if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND)) + pm_runtime_resume(dev); pci_dev->state_saved = false; if (pm->freeze) { @@ -925,11 +952,22 @@ static int pci_pm_freeze(struct device *dev) return 0; } +static int pci_pm_freeze_late(struct device *dev) +{ + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + return pm_generic_freeze_late(dev);; +} + static int pci_pm_freeze_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); struct device_driver *drv = dev->driver; + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_suspend_late(dev, PMSG_FREEZE); @@ -959,6 +997,16 @@ static int pci_pm_thaw_noirq(struct device *dev) struct device_driver *drv = dev->driver; int error = 0; + /* + * If the device is in runtime suspend, the code below may not work + * correctly with it, so skip that code and make the PM core skip all of + * the subsequent "thaw" callbacks for the device. + */ + if (dev_pm_smart_suspend_and_suspended(dev)) { + dev->power.direct_complete = true; + return 0; + } + if (pcibios_pm_ops.thaw_noirq) { error = pcibios_pm_ops.thaw_noirq(dev); if (error) @@ -1008,11 +1056,13 @@ static int pci_pm_poweroff(struct device *dev) if (!pm) { pci_pm_default_suspend(pci_dev); - goto Fixup; + return 0; } /* The reason to do that is the same as in pci_pm_suspend(). */ - pm_runtime_resume(dev); + if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) || + !pci_dev_keep_suspended(pci_dev)) + pm_runtime_resume(dev); pci_dev->state_saved = false; if (pm->poweroff) { @@ -1024,17 +1074,27 @@ static int pci_pm_poweroff(struct device *dev) return error; } - Fixup: - pci_fixup_device(pci_fixup_suspend, pci_dev); - return 0; } +static int pci_pm_poweroff_late(struct device *dev) +{ + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + + pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev)); + + return pm_generic_poweroff_late(dev); +} + static int pci_pm_poweroff_noirq(struct device *dev) { struct pci_dev *pci_dev = to_pci_dev(dev); struct device_driver *drv = dev->driver; + if (dev_pm_smart_suspend_and_suspended(dev)) + return 0; + if (pci_has_legacy_pm_support(to_pci_dev(dev))) return pci_legacy_suspend_late(dev, PMSG_HIBERNATE); @@ -1076,6 +1136,10 @@ static int pci_pm_restore_noirq(struct device *dev) struct device_driver *drv = dev->driver; int error = 0; + /* This is analogous to the pci_pm_resume_noirq() case. */ + if (dev_pm_smart_suspend_and_suspended(dev)) + pm_runtime_set_active(dev); + if (pcibios_pm_ops.restore_noirq) { error = pcibios_pm_ops.restore_noirq(dev); if (error) @@ -1124,10 +1188,12 @@ static int pci_pm_restore(struct device *dev) #else /* !CONFIG_HIBERNATE_CALLBACKS */ #define pci_pm_freeze NULL +#define pci_pm_freeze_late NULL #define pci_pm_freeze_noirq NULL #define pci_pm_thaw NULL #define pci_pm_thaw_noirq NULL #define pci_pm_poweroff NULL +#define pci_pm_poweroff_late NULL #define pci_pm_poweroff_noirq NULL #define pci_pm_restore NULL #define pci_pm_restore_noirq NULL @@ -1243,10 +1309,13 @@ static const struct dev_pm_ops pci_dev_pm_ops = { .prepare = pci_pm_prepare, .complete = pci_pm_complete, .suspend = pci_pm_suspend, + .suspend_late = pci_pm_suspend_late, .resume = pci_pm_resume, .freeze = pci_pm_freeze, + .freeze_late = pci_pm_freeze_late, .thaw = pci_pm_thaw, .poweroff = pci_pm_poweroff, + .poweroff_late = pci_pm_poweroff_late, .restore = pci_pm_restore, .suspend_noirq = pci_pm_suspend_noirq, .resume_noirq = pci_pm_resume_noirq, diff --git a/include/linux/pm.h b/include/linux/pm.h index 43b5418e05bb..65d39115f06d 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -765,6 +765,8 @@ extern int pm_generic_poweroff_late(struct device *dev); extern int pm_generic_poweroff(struct device *dev); extern void pm_generic_complete(struct device *dev); +extern bool dev_pm_smart_suspend_and_suspended(struct device *dev); + #else /* !CONFIG_PM_SLEEP */ #define device_pm_lock() do {} while (0) -- cgit v1.2.3