From 5b2ad5acaf5aa8a1ff441665967cf728a46ac967 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Mon, 5 Dec 2022 19:43:43 -0600 Subject: dt-bindings: opp: opp-v2-kryo-cpu: Add missing 'cache-unified' property in example The examples' cache nodes are incomplete as 'cache-unified' is a required cache property for unified caches which an L2 cache certainly is. Signed-off-by: Rob Herring Signed-off-by: Viresh Kumar --- Documentation/devicetree/bindings/opp/opp-v2-kryo-cpu.yaml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/devicetree/bindings/opp/opp-v2-kryo-cpu.yaml b/Documentation/devicetree/bindings/opp/opp-v2-kryo-cpu.yaml index 60cf3cbde4c5..b4947b326773 100644 --- a/Documentation/devicetree/bindings/opp/opp-v2-kryo-cpu.yaml +++ b/Documentation/devicetree/bindings/opp/opp-v2-kryo-cpu.yaml @@ -106,6 +106,7 @@ examples: L2_0: l2-cache { compatible = "cache"; cache-level = <2>; + cache-unified; }; }; @@ -140,6 +141,7 @@ examples: L2_1: l2-cache { compatible = "cache"; cache-level = <2>; + cache-unified; }; }; -- cgit v1.2.3 From cea7be909414d941a4616e6794f4a5282eb6e652 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 4 Jan 2023 16:38:02 -0800 Subject: drivers/opp: Remove "select SRCU" Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements. Signed-off-by: Paul E. McKenney Cc: Viresh Kumar Cc: Nishanth Menon Cc: Stephen Boyd Cc: Signed-off-by: Viresh Kumar --- drivers/opp/Kconfig | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/opp/Kconfig b/drivers/opp/Kconfig index e8ce47b32735..d7c649a1a981 100644 --- a/drivers/opp/Kconfig +++ b/drivers/opp/Kconfig @@ -1,7 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only config PM_OPP bool - select SRCU help SOCs have a standard set of tuples consisting of frequency and voltage pairs that the device will support per voltage domain. This -- cgit v1.2.3 From 27f850880192c19a26d89dab19e2bddcf276032f Mon Sep 17 00:00:00 2001 From: Kajetan Puchalski Date: Thu, 5 Jan 2023 14:51:58 +0000 Subject: cpuidle: teo: Optionally skip polling states in teo_find_shallower_state() Add a no_poll flag to teo_find_shallower_state() that will let the function optionally not consider polling states. This allows the caller to guard against the function inadvertently resulting in TEO putting the CPU in a polling state when that behaviour is undesirable. Signed-off-by: Kajetan Puchalski Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/governors/teo.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index d9262db79cae..e2864474a98d 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -258,15 +258,17 @@ static s64 teo_middle_of_bin(int idx, struct cpuidle_driver *drv) * @dev: Target CPU. * @state_idx: Index of the capping idle state. * @duration_ns: Idle duration value to match. + * @no_poll: Don't consider polling states. */ static int teo_find_shallower_state(struct cpuidle_driver *drv, struct cpuidle_device *dev, int state_idx, - s64 duration_ns) + s64 duration_ns, bool no_poll) { int i; for (i = state_idx - 1; i >= 0; i--) { - if (dev->states_usage[i].disable) + if (dev->states_usage[i].disable || + (no_poll && drv->states[i].flags & CPUIDLE_FLAG_POLLING)) continue; state_idx = i; @@ -469,7 +471,7 @@ end: */ if (idx > idx0 && drv->states[idx].target_residency_ns > delta_tick) - idx = teo_find_shallower_state(drv, dev, idx, delta_tick); + idx = teo_find_shallower_state(drv, dev, idx, delta_tick, false); } return idx; -- cgit v1.2.3 From 9ce0f7c4bc64d820b02a1c53f7e8dba9539f942b Mon Sep 17 00:00:00 2001 From: Kajetan Puchalski Date: Thu, 5 Jan 2023 14:51:59 +0000 Subject: cpuidle: teo: Introduce util-awareness MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Modern interactive systems, such as recent Android phones, tend to have power efficient shallow idle states. Selecting deeper idle states on a device while a latency-sensitive workload is running can adversely impact performance due to increased latency. Additionally, if the CPU wakes up from a deeper sleep before its target residency as is often the case, it results in a waste of energy on top of that. At the moment, none of the available idle governors take any scheduling information into account. They also tend to overestimate the idle duration quite often, which causes them to select excessively deep idle states, thus leading to increased wakeup latency and lower performance with no power saving. For 'menu' while web browsing on Android for instance, those types of wakeups ('too deep') account for over 24% of all wakeups. At the same time, on some platforms idle state 0 can be power efficient enough to warrant wanting to prefer it over idle state 1. This is because the power usage of the two states can be so close that sufficient amounts of too deep state 1 sleeps can completely offset the state 1 power saving to the point where it would've been more power efficient to just use state 0 instead. This is, of course, for systems where state 0 is not a polling state, such as arm-based devices. Sleeps that happened in state 0 while they could have used state 1 ('too shallow') only save less power than they otherwise could have. Too deep sleeps, on the other hand, harm performance and nullify the potential power saving from using state 1 in the first place. While taking this into account, it is clear that on balance it is preferable for an idle governor to have more too shallow sleeps instead of more too deep sleeps on those kinds of platforms. This patch specifically tunes TEO to prefer shallower idle states in order to reduce wakeup latency and achieve better performance. To this end, before selecting the next idle state it uses the avg_util signal of a CPU's runqueue in order to determine to what extent the CPU is being utilized. This util value is then compared to a threshold defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5% in the current implementation). If the util is above the threshold, the index of the idle state selected by TEO metrics will be reduced by 1, thus selecting a shallower state. If the util is below the threshold, the governor defaults to the TEO metrics mechanism to try to select the deepest available idle state based on the closest timer event and its own correctness. The main goal of this is to reduce latency and increase performance for some workloads. Under some workloads it will result in an increase in power usage (Geekbench 5) while for other workloads it will also result in a decrease in power usage compared to TEO (PCMark Web, Jankbench, Speedometer). It can provide drastically decreased latency and performance benefits in certain types of workloads that are sensitive to latency. Example test results: 1. GB5 (better score, latency & more power usage) | metric | menu | teo | teo-util-aware | | ------------------------------------- | -------------- | ----------------- | ----------------- | | gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) | | gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) | | gmean too deep % | 14.99% | 9.65% | 4.02% | | gmean too shallow % | 2.5% | 5.96% | 14.59% | | gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) | 2. Jankbench (better score, latency & less power usage) | metric | menu | teo | teo-util-aware | | ------------------------------------- | -------------- | ----------------- | ----------------- | | gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) | | gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) | | gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) | | gmean too deep % | 26.00% | 11.00% | 2.54% | | gmean too shallow % | 4.74% | 11.89% | 21.93% | | gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) | | gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) | Signed-off-by: Kajetan Puchalski [ rjw: Comment edits and white space adjustments ] Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/governors/teo.c | 94 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 93 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index e2864474a98d..987fc5f3997d 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -2,8 +2,13 @@ /* * Timer events oriented CPU idle governor * + * TEO governor: * Copyright (C) 2018 - 2021 Intel Corporation * Author: Rafael J. Wysocki + * + * Util-awareness mechanism: + * Copyright (C) 2022 Arm Ltd. + * Author: Kajetan Puchalski */ /** @@ -99,14 +104,55 @@ * select the given idle state instead of the candidate one. * * 3. By default, select the candidate state. + * + * Util-awareness mechanism: + * + * The idea behind the util-awareness extension is that there are two distinct + * scenarios for the CPU which should result in two different approaches to idle + * state selection - utilized and not utilized. + * + * In this case, 'utilized' means that the average runqueue util of the CPU is + * above a certain threshold. + * + * When the CPU is utilized while going into idle, more likely than not it will + * be woken up to do more work soon and so a shallower idle state should be + * selected to minimise latency and maximise performance. When the CPU is not + * being utilized, the usual metrics-based approach to selecting the deepest + * available idle state should be preferred to take advantage of the power + * saving. + * + * In order to achieve this, the governor uses a utilization threshold. + * The threshold is computed per-CPU as a percentage of the CPU's capacity + * by bit shifting the capacity value. Based on testing, the shift of 6 (~1.56%) + * seems to be getting the best results. + * + * Before selecting the next idle state, the governor compares the current CPU + * util to the precomputed util threshold. If it's below, it defaults to the + * TEO metrics mechanism. If it's above, the closest shallower idle state will + * be selected instead, as long as is not a polling state. */ #include #include #include +#include #include +#include #include +/* + * The number of bits to shift the CPU's capacity by in order to determine + * the utilized threshold. + * + * 6 was chosen based on testing as the number that achieved the best balance + * of power and performance on average. + * + * The resulting threshold is high enough to not be triggered by background + * noise and low enough to react quickly when activity starts to ramp up. + */ +#define UTIL_THRESHOLD_SHIFT 6 + + /* * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value * is used for decreasing metrics on a regular basis. @@ -137,9 +183,11 @@ struct teo_bin { * @time_span_ns: Time between idle state selection and post-wakeup update. * @sleep_length_ns: Time till the closest timer event (at the selection time). * @state_bins: Idle state data bins for this CPU. - * @total: Grand total of the "intercepts" and "hits" mertics for all bins. + * @total: Grand total of the "intercepts" and "hits" metrics for all bins. * @next_recent_idx: Index of the next @recent_idx entry to update. * @recent_idx: Indices of bins corresponding to recent "intercepts". + * @util_threshold: Threshold above which the CPU is considered utilized + * @utilized: Whether the last sleep on the CPU happened while utilized */ struct teo_cpu { s64 time_span_ns; @@ -148,10 +196,29 @@ struct teo_cpu { unsigned int total; int next_recent_idx; int recent_idx[NR_RECENT]; + unsigned long util_threshold; + bool utilized; }; static DEFINE_PER_CPU(struct teo_cpu, teo_cpus); +/** + * teo_cpu_is_utilized - Check if the CPU's util is above the threshold + * @cpu: Target CPU + * @cpu_data: Governor CPU data for the target CPU + */ +#ifdef CONFIG_SMP +static bool teo_cpu_is_utilized(int cpu, struct teo_cpu *cpu_data) +{ + return sched_cpu_util(cpu) > cpu_data->util_threshold; +} +#else +static bool teo_cpu_is_utilized(int cpu, struct teo_cpu *cpu_data) +{ + return false; +} +#endif + /** * teo_update - Update CPU metrics after wakeup. * @drv: cpuidle driver containing state data. @@ -323,6 +390,22 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, goto end; } + cpu_data->utilized = teo_cpu_is_utilized(dev->cpu, cpu_data); + /* + * If the CPU is being utilized over the threshold and there are only 2 + * states to choose from, the metrics need not be considered, so choose + * the shallowest non-polling state and exit. + */ + if (drv->state_count < 3 && cpu_data->utilized) { + for (i = 0; i < drv->state_count; ++i) { + if (!dev->states_usage[i].disable && + !(drv->states[i].flags & CPUIDLE_FLAG_POLLING)) { + idx = i; + goto end; + } + } + } + /* * Find the deepest idle state whose target residency does not exceed * the current sleep length and the deepest idle state not deeper than @@ -454,6 +537,13 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (idx > constraint_idx) idx = constraint_idx; + /* + * If the CPU is being utilized over the threshold, choose a shallower + * non-polling state to improve latency + */ + if (cpu_data->utilized) + idx = teo_find_shallower_state(drv, dev, idx, duration_ns, true); + end: /* * Don't stop the tick if the selected state is a polling one or if the @@ -510,9 +600,11 @@ static int teo_enable_device(struct cpuidle_driver *drv, struct cpuidle_device *dev) { struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); + unsigned long max_capacity = arch_scale_cpu_capacity(dev->cpu); int i; memset(cpu_data, 0, sizeof(*cpu_data)); + cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT; for (i = 0; i < NR_RECENT; i++) cpu_data->recent_idx[i] = -1; -- cgit v1.2.3 From 4edc13ae891ab368a3026cde1dbf263331bc6e77 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Fri, 9 Dec 2022 17:25:23 +0800 Subject: cpuidle-haltpoll: select haltpoll governor The haltpoll cpuidle driver should select the haltpoll governor, so as to ensure that they work together. Signed-off-by: Li RongQing [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig index ff71dd662880..cac5997dca50 100644 --- a/drivers/cpuidle/Kconfig +++ b/drivers/cpuidle/Kconfig @@ -74,6 +74,7 @@ endmenu config HALTPOLL_CPUIDLE tristate "Halt poll cpuidle driver" depends on X86 && KVM_GUEST + select CPU_IDLE_GOV_HALTPOLL default y help This option enables halt poll cpuidle driver, which allows to poll -- cgit v1.2.3 From 450316dc4f41e857c928dfbcc495c3810d4b1928 Mon Sep 17 00:00:00 2001 From: Richard Fitzgerald Date: Tue, 13 Dec 2022 15:54:48 +0000 Subject: PM: runtime: Document that force_suspend() is incompatible with SMART_SUSPEND pm_runtime_force_suspend() cannot be used with DPM_FLAG_SMART_SUSPEND, so note this in the kerneldoc. If DPM_FLAG_SMART_SUSPEND is set and the PM core cannot skip system resume it will call pm_runtime_active() on the driver. This can lead to an inconsistent state where: pm_runtime_force_suspend() called ->runtime_suspend but device_resume_noirq() called pm_runtime_set_active() This leaves the driver actually suspended but marked as active. Signed-off-by: Richard Fitzgerald Signed-off-by: Rafael J. Wysocki --- drivers/base/power/runtime.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c index 50e726b6c2cf..b29be7d4d7d0 100644 --- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c @@ -1864,6 +1864,10 @@ static bool pm_runtime_need_not_resume(struct device *dev) * sure the device is put into low power state and it should only be used during * system-wide PM transitions to sleep states. It assumes that the analogous * pm_runtime_force_resume() will be used to resume the device. + * + * Do not use with DPM_FLAG_SMART_SUSPEND as this can lead to an inconsistent + * state where this function has called the ->runtime_suspend callback but the + * PM core marks the driver as runtime active. */ int pm_runtime_force_suspend(struct device *dev) { -- cgit v1.2.3 From 6b37dfcb39f6b7d2e5070181d878a4c373b24bb0 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Mon, 2 Jan 2023 19:28:40 -0800 Subject: PM: hibernate: swap: don't use /** for non-kernel-doc comments kernel-doc complains about multiple occurrences of "/**" being used for something that is not a kernel-doc comment, so change all of these to just use "/*" comment style. The warning message for all of these is: FILE:LINE: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst kernel/power/swap.c:585: warning: ... Structure used for CRC32. kernel/power/swap.c:600: warning: ... * CRC32 update function that runs in its own thread. kernel/power/swap.c:627: warning: ... * Structure used for LZO data compression. kernel/power/swap.c:644: warning: ... * Compression function that runs in its own thread. kernel/power/swap.c:952: warning: ... * The following functions allow us to read data using a swap map kernel/power/swap.c:1111: warning: ... * Structure used for LZO data decompression. kernel/power/swap.c:1127: warning: ... * Decompression function that runs in its own thread. Also correct one spello/typo. Signed-off-by: Randy Dunlap Signed-off-by: Rafael J. Wysocki --- kernel/power/swap.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/kernel/power/swap.c b/kernel/power/swap.c index 277434b6c0bf..36a1df48280c 100644 --- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -581,7 +581,7 @@ static int save_image(struct swap_map_handle *handle, return ret; } -/** +/* * Structure used for CRC32. */ struct crc_data { @@ -596,7 +596,7 @@ struct crc_data { unsigned char *unc[LZO_THREADS]; /* uncompressed data */ }; -/** +/* * CRC32 update function that runs in its own thread. */ static int crc32_threadfn(void *data) @@ -623,7 +623,7 @@ static int crc32_threadfn(void *data) } return 0; } -/** +/* * Structure used for LZO data compression. */ struct cmp_data { @@ -640,7 +640,7 @@ struct cmp_data { unsigned char wrk[LZO1X_1_MEM_COMPRESS]; /* compression workspace */ }; -/** +/* * Compression function that runs in its own thread. */ static int lzo_compress_threadfn(void *data) @@ -948,9 +948,9 @@ out_finish: return error; } -/** +/* * The following functions allow us to read data using a swap map - * in a file-alike way + * in a file-like way. */ static void release_swap_reader(struct swap_map_handle *handle) @@ -1107,7 +1107,7 @@ static int load_image(struct swap_map_handle *handle, return ret; } -/** +/* * Structure used for LZO data decompression. */ struct dec_data { @@ -1123,7 +1123,7 @@ struct dec_data { unsigned char cmp[LZO_CMP_SIZE]; /* compressed buffer */ }; -/** +/* * Decompression function that runs in its own thread. */ static int lzo_decompress_threadfn(void *data) -- cgit v1.2.3 From 1b6599f741a4525ca761ecde46e5885ff1e6ba58 Mon Sep 17 00:00:00 2001 From: Yang Yingliang Date: Tue, 3 Jan 2023 20:57:26 +0800 Subject: powercap: fix possible name leak in powercap_register_zone() In the error path after calling dev_set_name(), the device name is leaked. To fix this, calling dev_set_name() before device_register(), and call put_device() if it returns error. All the resources is released in powercap_release(), so it can return from powercap_register_zone() directly. Fixes: 75d2364ea0ca ("PowerCap: Add class driver") Signed-off-by: Yang Yingliang Signed-off-by: Rafael J. Wysocki --- drivers/powercap/powercap_sys.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c index 1f968353d479..e180dee0f83d 100644 --- a/drivers/powercap/powercap_sys.c +++ b/drivers/powercap/powercap_sys.c @@ -530,9 +530,6 @@ struct powercap_zone *powercap_register_zone( power_zone->name = kstrdup(name, GFP_KERNEL); if (!power_zone->name) goto err_name_alloc; - dev_set_name(&power_zone->dev, "%s:%x", - dev_name(power_zone->dev.parent), - power_zone->id); power_zone->constraints = kcalloc(nr_constraints, sizeof(*power_zone->constraints), GFP_KERNEL); @@ -555,9 +552,16 @@ struct powercap_zone *powercap_register_zone( power_zone->dev_attr_groups[0] = &power_zone->dev_zone_attr_group; power_zone->dev_attr_groups[1] = NULL; power_zone->dev.groups = power_zone->dev_attr_groups; + dev_set_name(&power_zone->dev, "%s:%x", + dev_name(power_zone->dev.parent), + power_zone->id); result = device_register(&power_zone->dev); - if (result) - goto err_dev_ret; + if (result) { + put_device(&power_zone->dev); + mutex_unlock(&control_type->lock); + + return ERR_PTR(result); + } control_type->nr_zones++; mutex_unlock(&control_type->lock); -- cgit v1.2.3 From bdaad038cc3c620a769f2156e7c9aab8605411c2 Mon Sep 17 00:00:00 2001 From: Zhang Rui Date: Wed, 4 Jan 2023 22:36:01 +0800 Subject: powercap: intel_rapl: add support for Meteor Lake Add Meteor Lake to the list of supported processor models in the Intel RAPL power capping driver. Signed-off-by: Zhang Rui Signed-off-by: Rafael J. Wysocki --- drivers/powercap/intel_rapl_common.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 26d00b1853b4..ca6ff27b4384 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -1113,6 +1113,8 @@ static const struct x86_cpu_id rapl_ids[] __initconst = { X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, &rapl_defaults_core), X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, &rapl_defaults_core), X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_S, &rapl_defaults_core), + X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE, &rapl_defaults_core), + X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE_L, &rapl_defaults_core), X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &rapl_defaults_spr_server), X86_MATCH_INTEL_FAM6_MODEL(LAKEFIELD, &rapl_defaults_core), -- cgit v1.2.3 From 7adc6885259edd4ef5c9a7a62fd4270cf38fdbfb Mon Sep 17 00:00:00 2001 From: Zhang Rui Date: Wed, 4 Jan 2023 22:36:02 +0800 Subject: powercap: intel_rapl: add support for Emerald Rapids Add Emerald Rapids to the list of supported processor models in the Intel RAPL power capping driver. Signed-off-by: Zhang Rui Signed-off-by: Rafael J. Wysocki --- drivers/powercap/intel_rapl_common.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index ca6ff27b4384..9a9192fc8391 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -1116,6 +1116,7 @@ static const struct x86_cpu_id rapl_ids[] __initconst = { X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE, &rapl_defaults_core), X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE_L, &rapl_defaults_core), X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &rapl_defaults_spr_server), + X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, &rapl_defaults_spr_server), X86_MATCH_INTEL_FAM6_MODEL(LAKEFIELD, &rapl_defaults_core), X86_MATCH_INTEL_FAM6_MODEL(ATOM_SILVERMONT, &rapl_defaults_byt), -- cgit v1.2.3 From 716ff71ae234fd3ef1286aac8d8a19a0ed2d509d Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Fri, 6 Jan 2023 12:03:42 +0800 Subject: cpuidle-haltpoll: Replace default_idle() with arch_cpu_idle() When a KVM guest has MWAIT, mwait_idle() is used as the default idle function. However, the cpuidle-haltpoll driver calls default_idle() from default_enter_idle() directly and that one uses HLT instead of MWAIT, which may affect performance adversely, because MWAIT is preferred to HLT as explained by the changelog of commit aebef63cf7ff ("x86: Remove vendor checks from prefer_mwait_c1_over_halt"). Make default_enter_idle() call arch_cpu_idle(), which can use MWAIT, instead of default_idle() to address this issue. Suggested-by: Thomas Gleixner Suggested-by: Rafael J. Wysocki Signed-off-by: Li RongQing [ rjw: Changelog rewrite ] Signed-off-by: Rafael J. Wysocki --- arch/x86/kernel/process.c | 1 + drivers/cpuidle/cpuidle-haltpoll.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 40d156a31676..00a831d56c50 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -721,6 +721,7 @@ void arch_cpu_idle(void) { x86_idle(); } +EXPORT_SYMBOL_GPL(arch_cpu_idle); /* * We use this if we don't have any better idle routine.. diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c index 3a39a7f48b77..e66df22f9695 100644 --- a/drivers/cpuidle/cpuidle-haltpoll.c +++ b/drivers/cpuidle/cpuidle-haltpoll.c @@ -32,7 +32,7 @@ static int default_enter_idle(struct cpuidle_device *dev, local_irq_enable(); return index; } - default_idle(); + arch_cpu_idle(); return index; } -- cgit v1.2.3 From 9a55ab6f02c98bfca1c9c9d73507c1744406d2ba Mon Sep 17 00:00:00 2001 From: Keguang Zhang Date: Thu, 12 Jan 2023 21:53:42 +0800 Subject: cpufreq: loongson1: Delete obsolete driver The generic DT based cpufreq driver works for Loongson-1, so delete the old custom driver. Signed-off-by: Keguang Zhang Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/Kconfig | 9 -- drivers/cpufreq/Makefile | 1 - drivers/cpufreq/loongson1-cpufreq.c | 222 ------------------------------------ 3 files changed, 232 deletions(-) delete mode 100644 drivers/cpufreq/loongson1-cpufreq.c diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index 2a84fc63371e..448b8ffb4ebd 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig @@ -270,15 +270,6 @@ config LOONGSON2_CPUFREQ Loongson2F and its successors support this feature. - If in doubt, say N. - -config LOONGSON1_CPUFREQ - tristate "Loongson1 CPUFreq Driver" - depends on LOONGSON1_LS1B - help - This option adds a CPUFreq driver for loongson1 processors which - support software configurable cpu frequency. - If in doubt, say N. endif diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile index 32a7029e25ed..4a806cc5265b 100644 --- a/drivers/cpufreq/Makefile +++ b/drivers/cpufreq/Makefile @@ -111,7 +111,6 @@ obj-$(CONFIG_POWERNV_CPUFREQ) += powernv-cpufreq.o obj-$(CONFIG_BMIPS_CPUFREQ) += bmips-cpufreq.o obj-$(CONFIG_IA64_ACPI_CPUFREQ) += ia64-acpi-cpufreq.o obj-$(CONFIG_LOONGSON2_CPUFREQ) += loongson2_cpufreq.o -obj-$(CONFIG_LOONGSON1_CPUFREQ) += loongson1-cpufreq.o obj-$(CONFIG_SH_CPU_FREQ) += sh-cpufreq.o obj-$(CONFIG_SPARC_US2E_CPUFREQ) += sparc-us2e-cpufreq.o obj-$(CONFIG_SPARC_US3_CPUFREQ) += sparc-us3-cpufreq.o diff --git a/drivers/cpufreq/loongson1-cpufreq.c b/drivers/cpufreq/loongson1-cpufreq.c deleted file mode 100644 index fb72d709db56..000000000000 --- a/drivers/cpufreq/loongson1-cpufreq.c +++ /dev/null @@ -1,222 +0,0 @@ -/* - * CPU Frequency Scaling for Loongson 1 SoC - * - * Copyright (C) 2014-2016 Zhang, Keguang - * - * This file is licensed under the terms of the GNU General Public - * License version 2. This program is licensed "as is" without any - * warranty of any kind, whether express or implied. - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -struct ls1x_cpufreq { - struct device *dev; - struct clk *clk; /* CPU clk */ - struct clk *mux_clk; /* MUX of CPU clk */ - struct clk *pll_clk; /* PLL clk */ - struct clk *osc_clk; /* OSC clk */ - unsigned int max_freq; - unsigned int min_freq; -}; - -static struct ls1x_cpufreq *cpufreq; - -static int ls1x_cpufreq_notifier(struct notifier_block *nb, - unsigned long val, void *data) -{ - if (val == CPUFREQ_POSTCHANGE) - current_cpu_data.udelay_val = loops_per_jiffy; - - return NOTIFY_OK; -} - -static struct notifier_block ls1x_cpufreq_notifier_block = { - .notifier_call = ls1x_cpufreq_notifier -}; - -static int ls1x_cpufreq_target(struct cpufreq_policy *policy, - unsigned int index) -{ - struct device *cpu_dev = get_cpu_device(policy->cpu); - unsigned int old_freq, new_freq; - - old_freq = policy->cur; - new_freq = policy->freq_table[index].frequency; - - /* - * The procedure of reconfiguring CPU clk is as below. - * - * - Reparent CPU clk to OSC clk - * - Reset CPU clock (very important) - * - Reconfigure CPU DIV - * - Reparent CPU clk back to CPU DIV clk - */ - - clk_set_parent(policy->clk, cpufreq->osc_clk); - __raw_writel(__raw_readl(LS1X_CLK_PLL_DIV) | RST_CPU_EN | RST_CPU, - LS1X_CLK_PLL_DIV); - __raw_writel(__raw_readl(LS1X_CLK_PLL_DIV) & ~(RST_CPU_EN | RST_CPU), - LS1X_CLK_PLL_DIV); - clk_set_rate(cpufreq->mux_clk, new_freq * 1000); - clk_set_parent(policy->clk, cpufreq->mux_clk); - dev_dbg(cpu_dev, "%u KHz --> %u KHz\n", old_freq, new_freq); - - return 0; -} - -static int ls1x_cpufreq_init(struct cpufreq_policy *policy) -{ - struct device *cpu_dev = get_cpu_device(policy->cpu); - struct cpufreq_frequency_table *freq_tbl; - unsigned int pll_freq, freq; - int steps, i; - - pll_freq = clk_get_rate(cpufreq->pll_clk) / 1000; - - steps = 1 << DIV_CPU_WIDTH; - freq_tbl = kcalloc(steps, sizeof(*freq_tbl), GFP_KERNEL); - if (!freq_tbl) - return -ENOMEM; - - for (i = 0; i < (steps - 1); i++) { - freq = pll_freq / (i + 1); - if ((freq < cpufreq->min_freq) || (freq > cpufreq->max_freq)) - freq_tbl[i].frequency = CPUFREQ_ENTRY_INVALID; - else - freq_tbl[i].frequency = freq; - dev_dbg(cpu_dev, - "cpufreq table: index %d: frequency %d\n", i, - freq_tbl[i].frequency); - } - freq_tbl[i].frequency = CPUFREQ_TABLE_END; - - policy->clk = cpufreq->clk; - cpufreq_generic_init(policy, freq_tbl, 0); - - return 0; -} - -static int ls1x_cpufreq_exit(struct cpufreq_policy *policy) -{ - kfree(policy->freq_table); - return 0; -} - -static struct cpufreq_driver ls1x_cpufreq_driver = { - .name = "cpufreq-ls1x", - .flags = CPUFREQ_NEED_INITIAL_FREQ_CHECK, - .verify = cpufreq_generic_frequency_table_verify, - .target_index = ls1x_cpufreq_target, - .get = cpufreq_generic_get, - .init = ls1x_cpufreq_init, - .exit = ls1x_cpufreq_exit, - .attr = cpufreq_generic_attr, -}; - -static int ls1x_cpufreq_remove(struct platform_device *pdev) -{ - cpufreq_unregister_notifier(&ls1x_cpufreq_notifier_block, - CPUFREQ_TRANSITION_NOTIFIER); - cpufreq_unregister_driver(&ls1x_cpufreq_driver); - - return 0; -} - -static int ls1x_cpufreq_probe(struct platform_device *pdev) -{ - struct plat_ls1x_cpufreq *pdata = dev_get_platdata(&pdev->dev); - struct clk *clk; - int ret; - - if (!pdata || !pdata->clk_name || !pdata->osc_clk_name) { - dev_err(&pdev->dev, "platform data missing\n"); - return -EINVAL; - } - - cpufreq = - devm_kzalloc(&pdev->dev, sizeof(struct ls1x_cpufreq), GFP_KERNEL); - if (!cpufreq) - return -ENOMEM; - - cpufreq->dev = &pdev->dev; - - clk = devm_clk_get(&pdev->dev, pdata->clk_name); - if (IS_ERR(clk)) { - dev_err(&pdev->dev, "unable to get %s clock\n", - pdata->clk_name); - return PTR_ERR(clk); - } - cpufreq->clk = clk; - - clk = clk_get_parent(clk); - if (IS_ERR(clk)) { - dev_err(&pdev->dev, "unable to get parent of %s clock\n", - __clk_get_name(cpufreq->clk)); - return PTR_ERR(clk); - } - cpufreq->mux_clk = clk; - - clk = clk_get_parent(clk); - if (IS_ERR(clk)) { - dev_err(&pdev->dev, "unable to get parent of %s clock\n", - __clk_get_name(cpufreq->mux_clk)); - return PTR_ERR(clk); - } - cpufreq->pll_clk = clk; - - clk = devm_clk_get(&pdev->dev, pdata->osc_clk_name); - if (IS_ERR(clk)) { - dev_err(&pdev->dev, "unable to get %s clock\n", - pdata->osc_clk_name); - return PTR_ERR(clk); - } - cpufreq->osc_clk = clk; - - cpufreq->max_freq = pdata->max_freq; - cpufreq->min_freq = pdata->min_freq; - - ret = cpufreq_register_driver(&ls1x_cpufreq_driver); - if (ret) { - dev_err(&pdev->dev, - "failed to register CPUFreq driver: %d\n", ret); - return ret; - } - - ret = cpufreq_register_notifier(&ls1x_cpufreq_notifier_block, - CPUFREQ_TRANSITION_NOTIFIER); - - if (ret) { - dev_err(&pdev->dev, - "failed to register CPUFreq notifier: %d\n",ret); - cpufreq_unregister_driver(&ls1x_cpufreq_driver); - } - - return ret; -} - -static struct platform_driver ls1x_cpufreq_platdrv = { - .probe = ls1x_cpufreq_probe, - .remove = ls1x_cpufreq_remove, - .driver = { - .name = "ls1x-cpufreq", - }, -}; - -module_platform_driver(ls1x_cpufreq_platdrv); - -MODULE_ALIAS("platform:ls1x-cpufreq"); -MODULE_AUTHOR("Kelvin Cheung "); -MODULE_DESCRIPTION("Loongson1 CPUFreq driver"); -MODULE_LICENSE("GPL"); -- cgit v1.2.3 From 38a29e5834eba1e71bc4aab82b09ac065af62b80 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 12 Jan 2023 16:11:15 -0800 Subject: drivers/cpufreq: Remove "select SRCU" Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements. Signed-off-by: Paul E. McKenney Acked-by: Viresh Kumar Reviewed-by: John Ogness Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/Kconfig | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index 448b8ffb4ebd..76aa1336e2be 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig @@ -3,7 +3,6 @@ menu "CPU Frequency scaling" config CPU_FREQ bool "CPU Frequency scaling" - select SRCU help CPU Frequency scaling allows you to change the clock speed of CPUs on the fly. This is a nice method to save power, because -- cgit v1.2.3 From 52e0452b413d885d5ab7e3ae85287d67f5a286b2 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 12 Jan 2023 16:11:28 -0800 Subject: PM: sleep: Remove "select SRCU" Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements. Signed-off-by: Paul E. McKenney Reviewed-by: John Ogness Signed-off-by: Rafael J. Wysocki --- kernel/power/Kconfig | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig index 60a1d3051cc7..4b31629c5be4 100644 --- a/kernel/power/Kconfig +++ b/kernel/power/Kconfig @@ -118,7 +118,6 @@ config PM_SLEEP def_bool y depends on SUSPEND || HIBERNATE_CALLBACKS select PM - select SRCU config PM_SLEEP_SMP def_bool y -- cgit v1.2.3 From c7cd6f04c0dfb6d44337f92b4c32126d20339873 Mon Sep 17 00:00:00 2001 From: Srinivas Pandruvada Date: Tue, 17 Jan 2023 10:22:40 -0800 Subject: powercap: idle_inject: Support 100% idle injection The users of the idle injection framework allow 100% idle injection. For example: thermal/cpuidle_cooling.c driver. When the ratio is set to 100%, the runtime_duration becomes zero. However, idle_inject_set_duration() in the idle injection framework silently ignores run_duration_us == 0 without any error (it is a void function). The caller will then assume that everything is fine and 100% idle is effective, but in reality the idle duration will not change. There are two options: - The caller may change their max state to 99% instead of 100% and document that 100% is not supported by the idle inject framework. - Add 100% idle support to the idle inject framework. Since there are other protections via RT throttling, this framework can allow 100% idle. The RT throttling will be activated at 95% idle by default. The caller disabling RT throttling and injecting 100% idle, should be aware that CPU can't be used at all. The idle inject timer is started for (run_duration_us + idle_duration_us) duration. Hence replace (run_duration_us && idle_duration_us) with (run_duration_us + idle_duration_us) in the function idle_inject_set_duration(). Also check for !(run_duration_us + idle_duration_us) to return -EINVAL in idle_inject_start(). Signed-off-by: Srinivas Pandruvada Acked-by: Daniel Lezcano [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki --- drivers/powercap/idle_inject.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/powercap/idle_inject.c b/drivers/powercap/idle_inject.c index fe86a09e3b67..c03b5402c03b 100644 --- a/drivers/powercap/idle_inject.c +++ b/drivers/powercap/idle_inject.c @@ -155,10 +155,12 @@ void idle_inject_set_duration(struct idle_inject_device *ii_dev, unsigned int run_duration_us, unsigned int idle_duration_us) { - if (run_duration_us && idle_duration_us) { + if (run_duration_us + idle_duration_us) { WRITE_ONCE(ii_dev->run_duration_us, run_duration_us); WRITE_ONCE(ii_dev->idle_duration_us, idle_duration_us); } + if (!run_duration_us) + pr_debug("CPU is forced to 100 percent idle\n"); } /** @@ -201,7 +203,7 @@ int idle_inject_start(struct idle_inject_device *ii_dev) unsigned int idle_duration_us = READ_ONCE(ii_dev->idle_duration_us); unsigned int run_duration_us = READ_ONCE(ii_dev->run_duration_us); - if (!idle_duration_us || !run_duration_us) + if (!(idle_duration_us + run_duration_us)) return -EINVAL; pr_debug("Starting injecting idle cycles on CPUs '%*pbl'\n", -- cgit v1.2.3 From 74528edfbc664f9d2c927c4e5a44f1285598ed0f Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Fri, 20 Jan 2023 11:15:28 +0200 Subject: intel_idle: add Emerald Rapids Xeon support Emerald Rapids (EMR) is the next Intel Xeon processor after Sapphire Rapids (SPR). EMR C-states are the same as SPR C-states, and we expect that EMR C-state characteristics (latency and target residency) will be the same as in SPR. Therefore, add EMR support by using SPR C-states table. Signed-off-by: Artem Bityutskiy Signed-off-by: Rafael J. Wysocki --- drivers/idle/intel_idle.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index cfeb24d40d37..bb3d10099ba4 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -1430,6 +1430,7 @@ static const struct x86_cpu_id intel_idle_ids[] __initconst = { X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &idle_cpu_adl_l), X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_N, &idle_cpu_adl_n), X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &idle_cpu_spr), + X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, &idle_cpu_spr), X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &idle_cpu_knl), X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &idle_cpu_knl), X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, &idle_cpu_bxt), @@ -1862,6 +1863,7 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv) skx_idle_state_table_update(); break; case INTEL_FAM6_SAPPHIRERAPIDS_X: + case INTEL_FAM6_EMERALDRAPIDS_X: spr_idle_state_table_update(); break; case INTEL_FAM6_ALDERLAKE: -- cgit v1.2.3 From 71bc571c64c13734c69c854e97fb97ac7bf7b54a Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 26 Jan 2023 22:39:51 -0800 Subject: Documentation: power: correct spelling Correct spelling problems for Documentation/power/ as reported by codespell. Signed-off-by: Randy Dunlap Signed-off-by: Rafael J. Wysocki --- Documentation/power/suspend-and-interrupts.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/power/suspend-and-interrupts.rst b/Documentation/power/suspend-and-interrupts.rst index 4cda6617709a..dfbace2f4600 100644 --- a/Documentation/power/suspend-and-interrupts.rst +++ b/Documentation/power/suspend-and-interrupts.rst @@ -67,7 +67,7 @@ That may involve turning on a special signal handling logic within the platform during system sleep so as to trigger a system wakeup when needed. For example, the platform may include a dedicated interrupt controller used specifically for handling system wakeup events. Then, if a given interrupt line is supposed to -wake up the system from sleep sates, the corresponding input of that interrupt +wake up the system from sleep states, the corresponding input of that interrupt controller needs to be enabled to receive signals from the line in question. After wakeup, it generally is better to disable that input to prevent the dedicated controller from triggering interrupts unnecessarily. -- cgit v1.2.3 From 68d8ad3bd9c397f2bf009368cb13e48cb91ea018 Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Mon, 16 Jan 2023 10:38:42 +0100 Subject: dt-bindings: opp: v2-qcom-level: Let qcom,opp-fuse-level be a 2-long array In some instances (particularly with CPRh) we might want to specifiy more than one qcom,opp-fuse-level, as the same OPP subnodes may be used by different "CPR threads". We need to make sure that n = num_threads entries is legal and so far nobody seems to use more than two, so let's allow that. Acked-by: Rob Herring Signed-off-by: Konrad Dybcio Signed-off-by: Viresh Kumar --- Documentation/devicetree/bindings/opp/opp-v2-qcom-level.yaml | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/opp/opp-v2-qcom-level.yaml b/Documentation/devicetree/bindings/opp/opp-v2-qcom-level.yaml index b9ce2e099ce9..a30ef93213c0 100644 --- a/Documentation/devicetree/bindings/opp/opp-v2-qcom-level.yaml +++ b/Documentation/devicetree/bindings/opp/opp-v2-qcom-level.yaml @@ -30,7 +30,9 @@ patternProperties: this OPP node. Sometimes several corners/levels shares a certain fuse corner/level. A fuse corner/level contains e.g. ref uV, min uV, and max uV. - $ref: /schemas/types.yaml#/definitions/uint32 + $ref: /schemas/types.yaml#/definitions/uint32-array + minItems: 1 + maxItems: 2 required: - opp-level -- cgit v1.2.3 From cf3e0251868c56b10ccb3e0b3628fcfb0c9c4ec5 Mon Sep 17 00:00:00 2001 From: Ross Zwisler Date: Mon, 30 Jan 2023 11:19:11 -0700 Subject: PM: tools: use canonical ftrace path The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing A few scripts in tools/power still refer to this older debugfs path, so let's update them to avoid confusion. Signed-off-by: Ross Zwisler Signed-off-by: Rafael J. Wysocki --- tools/power/pm-graph/sleepgraph.py | 4 ++-- tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py | 4 ++-- tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py | 10 +++++----- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/tools/power/pm-graph/sleepgraph.py b/tools/power/pm-graph/sleepgraph.py index c60c90f35d18..82c09cd25cc2 100755 --- a/tools/power/pm-graph/sleepgraph.py +++ b/tools/power/pm-graph/sleepgraph.py @@ -120,9 +120,9 @@ class SystemValues: cgexp = False testdir = '' outdir = '' - tpath = '/sys/kernel/debug/tracing/' + tpath = '/sys/kernel/tracing/' fpdtpath = '/sys/firmware/acpi/tables/FPDT' - epath = '/sys/kernel/debug/tracing/events/power/' + epath = '/sys/kernel/tracing/events/power/' pmdpath = '/sys/power/pm_debug_messages' s0ixpath = '/sys/module/intel_pmc_core/parameters/warn_on_s0ix_failures' s0ixres = '/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us' diff --git a/tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py b/tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py index 2dea4032ac56..904df0ea0a1e 100755 --- a/tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py +++ b/tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py @@ -248,7 +248,7 @@ def signal_handler(signal, frame): ipt.free_trace_buffer() sys.exit(0) -trace_file = "/sys/kernel/debug/tracing/events/amd_cpu/enable" +trace_file = "/sys/kernel/tracing/events/amd_cpu/enable" signal.signal(signal.SIGINT, signal_handler) interval = "" @@ -319,7 +319,7 @@ print(cur_version) cleanup_data_files() if interval: - file_name = "/sys/kernel/debug/tracing/trace" + file_name = "/sys/kernel/tracing/trace" ipt.clear_trace_file() ipt.set_trace_buffer_size(memory) ipt.enable_trace(trace_file) diff --git a/tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py b/tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py index b46e9eb8f5aa..ec3323100e1a 100755 --- a/tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py +++ b/tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py @@ -373,7 +373,7 @@ def clear_trace_file(): """ Clear trace file """ try: - f_handle = open('/sys/kernel/debug/tracing/trace', 'w') + f_handle = open('/sys/kernel/tracing/trace', 'w') f_handle.close() except: print('IO error clearing trace file ') @@ -401,7 +401,7 @@ def set_trace_buffer_size(memory): """ Set trace buffer size """ try: - with open('/sys/kernel/debug/tracing/buffer_size_kb', 'w') as fp: + with open('/sys/kernel/tracing/buffer_size_kb', 'w') as fp: fp.write(memory) except: print('IO error setting trace buffer size ') @@ -411,7 +411,7 @@ def free_trace_buffer(): """ Free the trace buffer memory """ try: - open('/sys/kernel/debug/tracing/buffer_size_kb' + open('/sys/kernel/tracing/buffer_size_kb' , 'w').write("1") except: print('IO error freeing trace buffer ') @@ -495,7 +495,7 @@ def signal_handler(signal, frame): sys.exit(0) if __name__ == "__main__": - trace_file = "/sys/kernel/debug/tracing/events/power/pstate_sample/enable" + trace_file = "/sys/kernel/tracing/events/power/pstate_sample/enable" signal.signal(signal.SIGINT, signal_handler) interval = "" @@ -569,7 +569,7 @@ if __name__ == "__main__": cleanup_data_files() if interval: - filename = "/sys/kernel/debug/tracing/trace" + filename = "/sys/kernel/tracing/trace" clear_trace_file() set_trace_buffer_size(memory) enable_trace(trace_file) -- cgit v1.2.3 From 7bc1fcd399018245575974508c26e882da0bd915 Mon Sep 17 00:00:00 2001 From: Perry Yuan Date: Tue, 31 Jan 2023 17:00:06 +0800 Subject: ACPI: CPPC: Add AMD pstate energy performance preference cppc control Add support for setting and querying EPP preferences to the generic CPPC driver. This enables downstream drivers such as amd-pstate to discover and use these values. Downstream drivers that want to use the new symbols cppc_get_epp_caps and cppc_set_epp_perf for querying and setting EPP preferences will need to call cppc_set_epp_perf to enable the EPP function firstly. Acked-by: Huang Rui Reviewed-by: Mario Limonciello Reviewed-by: Wyes Karny Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Rafael J. Wysocki --- drivers/acpi/cppc_acpi.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++ include/acpi/cppc_acpi.h | 12 +++++++++ 2 files changed, 79 insertions(+) diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c index 0f17b1c32718..02d83c807271 100644 --- a/drivers/acpi/cppc_acpi.c +++ b/drivers/acpi/cppc_acpi.c @@ -1153,6 +1153,19 @@ int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf) return cppc_get_perf(cpunum, NOMINAL_PERF, nominal_perf); } +/** + * cppc_get_epp_perf - Get the epp register value. + * @cpunum: CPU from which to get epp preference value. + * @epp_perf: Return address. + * + * Return: 0 for success, -EIO otherwise. + */ +int cppc_get_epp_perf(int cpunum, u64 *epp_perf) +{ + return cppc_get_perf(cpunum, ENERGY_PERF, epp_perf); +} +EXPORT_SYMBOL_GPL(cppc_get_epp_perf); + /** * cppc_get_perf_caps - Get a CPU's performance capabilities. * @cpunum: CPU from which to get capabilities info. @@ -1365,6 +1378,60 @@ out_err: } EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs); +/* + * Set Energy Performance Preference Register value through + * Performance Controls Interface + */ +int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable) +{ + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu); + struct cpc_register_resource *epp_set_reg; + struct cpc_register_resource *auto_sel_reg; + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu); + struct cppc_pcc_data *pcc_ss_data = NULL; + int ret; + + if (!cpc_desc) { + pr_debug("No CPC descriptor for CPU:%d\n", cpu); + return -ENODEV; + } + + auto_sel_reg = &cpc_desc->cpc_regs[AUTO_SEL_ENABLE]; + epp_set_reg = &cpc_desc->cpc_regs[ENERGY_PERF]; + + if (CPC_IN_PCC(epp_set_reg) || CPC_IN_PCC(auto_sel_reg)) { + if (pcc_ss_id < 0) { + pr_debug("Invalid pcc_ss_id for CPU:%d\n", cpu); + return -ENODEV; + } + + if (CPC_SUPPORTED(auto_sel_reg)) { + ret = cpc_write(cpu, auto_sel_reg, enable); + if (ret) + return ret; + } + + if (CPC_SUPPORTED(epp_set_reg)) { + ret = cpc_write(cpu, epp_set_reg, perf_ctrls->energy_perf); + if (ret) + return ret; + } + + pcc_ss_data = pcc_data[pcc_ss_id]; + + down_write(&pcc_ss_data->pcc_lock); + /* after writing CPC, transfer the ownership of PCC to platform */ + ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE); + up_write(&pcc_ss_data->pcc_lock); + } else { + ret = -ENOTSUPP; + pr_debug("_CPC in PCC is not supported\n"); + } + + return ret; +} +EXPORT_SYMBOL_GPL(cppc_set_epp_perf); + /** * cppc_set_enable - Set to enable CPPC on the processor by writing the * Continuous Performance Control package EnableRegister field. diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h index c5614444031f..6b487a5bd638 100644 --- a/include/acpi/cppc_acpi.h +++ b/include/acpi/cppc_acpi.h @@ -108,12 +108,14 @@ struct cppc_perf_caps { u32 lowest_nonlinear_perf; u32 lowest_freq; u32 nominal_freq; + u32 energy_perf; }; struct cppc_perf_ctrls { u32 max_perf; u32 min_perf; u32 desired_perf; + u32 energy_perf; }; struct cppc_perf_fb_ctrs { @@ -149,6 +151,8 @@ extern bool cpc_ffh_supported(void); extern bool cpc_supported_by_cpu(void); extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val); extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val); +extern int cppc_get_epp_perf(int cpunum, u64 *epp_perf); +extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable); #else /* !CONFIG_ACPI_CPPC_LIB */ static inline int cppc_get_desired_perf(int cpunum, u64 *desired_perf) { @@ -202,6 +206,14 @@ static inline int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val) { return -ENOTSUPP; } +static inline int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable) +{ + return -ENOTSUPP; +} +static inline int cppc_get_epp_perf(int cpunum, u64 *epp_perf) +{ + return -ENOTSUPP; +} #endif /* !CONFIG_ACPI_CPPC_LIB */ #endif /* _CPPC_ACPI_H*/ -- cgit v1.2.3 From e22abc6bb97cee240200d037a16b73951df16f9a Mon Sep 17 00:00:00 2001 From: Perry Yuan Date: Tue, 31 Jan 2023 17:00:07 +0800 Subject: Documentation: amd-pstate: add EPP profiles introduction The amd-pstate driver supports a feature called energy performance preference (EPP). Add information to the documentation to explain how users can interact with the sysfs files for this feature. 1) See all EPP profiles $ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_available_preferences default performance balance_performance balance_power power 2) Check current EPP profile $ sudo cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference performance 3) Set new EPP profile $ sudo bash -c "echo power > /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference" Acked-by: Huang Rui Reviewed-by: Mario Limonciello Reviewed-by: Wyes Karny Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/pm/amd-pstate.rst | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index 5376d53faaa8..98a2bb44f80c 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -262,6 +262,25 @@ lowest non-linear performance in `AMD CPPC Performance Capability `_.) This attribute is read-only. +``energy_performance_available_preferences`` + +A list of all the supported EPP preferences that could be used for +``energy_performance_preference`` on this system. +These profiles represent different hints that are provided +to the low-level firmware about the user's desired energy vs efficiency +tradeoff. ``default`` represents the epp value is set by platform +firmware. This attribute is read-only. + +``energy_performance_preference`` + +The current energy performance preference can be read from this attribute. +and user can change current preference according to energy or performance needs +Please get all support profiles list from +``energy_performance_available_preferences`` attribute, all the profiles are +integer values defined between 0 to 255 when EPP feature is enabled by platform +firmware, if EPP feature is disabled, driver will ignore the written value +This attribute is read-write. + Other performance and frequency values can be read back from ``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`. -- cgit v1.2.3 From 36c5014e5460963ad7766487c0e22a7ff28681fc Mon Sep 17 00:00:00 2001 From: Wyes Karny Date: Tue, 31 Jan 2023 17:00:08 +0800 Subject: cpufreq: amd-pstate: optimize driver working mode selection in amd_pstate_param() The amd-pstate driver may support multiple working modes. Introduce a variable to keep track of which mode is currently enabled. Here we use cppc_state var to indicate which mode is enabled. This change will help to simplify the the amd_pstate_param() to choose which mode used for the following driver registration. Acked-by: Huang Rui Reviewed-by: Mario Limonciello Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Wyes Karny Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/amd-pstate.c | 39 +++++++++++++++++++++++++++++---------- include/linux/amd-pstate.h | 17 +++++++++++++++++ 2 files changed, 46 insertions(+), 10 deletions(-) diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index c17bd845f5fc..65c16edbbb20 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -60,7 +60,18 @@ * module parameter to be able to enable it manually for debugging. */ static struct cpufreq_driver amd_pstate_driver; -static int cppc_load __initdata; +static int cppc_state = AMD_PSTATE_DISABLE; + +static inline int get_mode_idx_from_str(const char *str, size_t size) +{ + int i; + + for (i=0; i < AMD_PSTATE_MAX; i++) { + if (!strncmp(str, amd_pstate_mode_string[i], size)) + return i; + } + return -EINVAL; +} static inline int pstate_enable(bool enable) { @@ -626,10 +637,10 @@ static int __init amd_pstate_init(void) /* * by default the pstate driver is disabled to load * enable the amd_pstate passive mode driver explicitly - * with amd_pstate=passive in kernel command line + * with amd_pstate=passive or other modes in kernel command line */ - if (!cppc_load) { - pr_debug("driver load is disabled, boot with amd_pstate=passive to enable this\n"); + if (cppc_state == AMD_PSTATE_DISABLE) { + pr_debug("driver load is disabled, boot with specific mode to enable this\n"); return -ENODEV; } @@ -671,16 +682,24 @@ device_initcall(amd_pstate_init); static int __init amd_pstate_param(char *str) { + size_t size; + int mode_idx; + if (!str) return -EINVAL; - if (!strcmp(str, "disable")) { - cppc_load = 0; - pr_info("driver is explicitly disabled\n"); - } else if (!strcmp(str, "passive")) - cppc_load = 1; + size = strlen(str); + mode_idx = get_mode_idx_from_str(str, size); - return 0; + if (mode_idx >= AMD_PSTATE_DISABLE && mode_idx < AMD_PSTATE_MAX) { + cppc_state = mode_idx; + if (cppc_state == AMD_PSTATE_DISABLE) + pr_info("driver is explicitly disabled\n"); + + return 0; + } + + return -EINVAL; } early_param("amd_pstate", amd_pstate_param); diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h index 1c4b8659f171..dae2ce0f6735 100644 --- a/include/linux/amd-pstate.h +++ b/include/linux/amd-pstate.h @@ -74,4 +74,21 @@ struct amd_cpudata { bool boost_supported; }; +/* + * enum amd_pstate_mode - driver working mode of amd pstate + */ +enum amd_pstate_mode { + AMD_PSTATE_DISABLE = 0, + AMD_PSTATE_PASSIVE, + AMD_PSTATE_ACTIVE, + AMD_PSTATE_MAX, +}; + +static const char * const amd_pstate_mode_string[] = { + [AMD_PSTATE_DISABLE] = "disable", + [AMD_PSTATE_PASSIVE] = "passive", + [AMD_PSTATE_ACTIVE] = "active", + NULL, +}; + #endif /* _LINUX_AMD_PSTATE_H */ -- cgit v1.2.3 From ffa5096a7c338641f70fb06d4778e8cf400181a8 Mon Sep 17 00:00:00 2001 From: Perry Yuan Date: Tue, 31 Jan 2023 17:00:09 +0800 Subject: cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors Add EPP driver support for AMD SoCs which support a dedicated MSR for CPPC. EPP is used by the DPM controller to configure the frequency that a core operates at during short periods of activity. The SoC EPP targets are configured on a scale from 0 to 255 where 0 represents maximum performance and 255 represents maximum efficiency. The amd-pstate driver exports profile string names to userspace that are tied to specific EPP values. The balance_performance string (0x80) provides the best balance for efficiency versus power on most systems, but users can choose other strings to meet their needs as well. $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_available_preferences default performance balance_performance balance_power power $ cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference balance_performance To enable the driver,it needs to add `amd_pstate=active` to kernel command line and kernel will load the active mode epp driver Acked-by: Huang Rui Reviewed-by: Mario Limonciello Reviewed-by: Wyes Karny Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/amd-pstate.c | 420 ++++++++++++++++++++++++++++++++++++++++++- include/linux/amd-pstate.h | 16 +- 2 files changed, 429 insertions(+), 7 deletions(-) diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 65c16edbbb20..bca86b5b8b12 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -59,9 +59,52 @@ * we disable it by default to go acpi-cpufreq on these processors and add a * module parameter to be able to enable it manually for debugging. */ +static struct cpufreq_driver *current_pstate_driver; static struct cpufreq_driver amd_pstate_driver; +static struct cpufreq_driver amd_pstate_epp_driver; static int cppc_state = AMD_PSTATE_DISABLE; +/* + * AMD Energy Preference Performance (EPP) + * The EPP is used in the CCLK DPM controller to drive + * the frequency that a core is going to operate during + * short periods of activity. EPP values will be utilized for + * different OS profiles (balanced, performance, power savings) + * display strings corresponding to EPP index in the + * energy_perf_strings[] + * index String + *------------------------------------- + * 0 default + * 1 performance + * 2 balance_performance + * 3 balance_power + * 4 power + */ +enum energy_perf_value_index { + EPP_INDEX_DEFAULT = 0, + EPP_INDEX_PERFORMANCE, + EPP_INDEX_BALANCE_PERFORMANCE, + EPP_INDEX_BALANCE_POWERSAVE, + EPP_INDEX_POWERSAVE, +}; + +static const char * const energy_perf_strings[] = { + [EPP_INDEX_DEFAULT] = "default", + [EPP_INDEX_PERFORMANCE] = "performance", + [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance", + [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power", + [EPP_INDEX_POWERSAVE] = "power", + NULL +}; + +static unsigned int epp_values[] = { + [EPP_INDEX_DEFAULT] = 0, + [EPP_INDEX_PERFORMANCE] = AMD_CPPC_EPP_PERFORMANCE, + [EPP_INDEX_BALANCE_PERFORMANCE] = AMD_CPPC_EPP_BALANCE_PERFORMANCE, + [EPP_INDEX_BALANCE_POWERSAVE] = AMD_CPPC_EPP_BALANCE_POWERSAVE, + [EPP_INDEX_POWERSAVE] = AMD_CPPC_EPP_POWERSAVE, + }; + static inline int get_mode_idx_from_str(const char *str, size_t size) { int i; @@ -73,6 +116,114 @@ static inline int get_mode_idx_from_str(const char *str, size_t size) return -EINVAL; } +static DEFINE_MUTEX(amd_pstate_limits_lock); +static DEFINE_MUTEX(amd_pstate_driver_lock); + +static s16 amd_pstate_get_epp(struct amd_cpudata *cpudata, u64 cppc_req_cached) +{ + u64 epp; + int ret; + + if (boot_cpu_has(X86_FEATURE_CPPC)) { + if (!cppc_req_cached) { + epp = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, + &cppc_req_cached); + if (epp) + return epp; + } + epp = (cppc_req_cached >> 24) & 0xFF; + } else { + ret = cppc_get_epp_perf(cpudata->cpu, &epp); + if (ret < 0) { + pr_debug("Could not retrieve energy perf value (%d)\n", ret); + return -EIO; + } + } + + return (s16)(epp & 0xff); +} + +static int amd_pstate_get_energy_pref_index(struct amd_cpudata *cpudata) +{ + s16 epp; + int index = -EINVAL; + + epp = amd_pstate_get_epp(cpudata, 0); + if (epp < 0) + return epp; + + switch (epp) { + case AMD_CPPC_EPP_PERFORMANCE: + index = EPP_INDEX_PERFORMANCE; + break; + case AMD_CPPC_EPP_BALANCE_PERFORMANCE: + index = EPP_INDEX_BALANCE_PERFORMANCE; + break; + case AMD_CPPC_EPP_BALANCE_POWERSAVE: + index = EPP_INDEX_BALANCE_POWERSAVE; + break; + case AMD_CPPC_EPP_POWERSAVE: + index = EPP_INDEX_POWERSAVE; + break; + default: + break; + } + + return index; +} + +static int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32 epp) +{ + int ret; + struct cppc_perf_ctrls perf_ctrls; + + if (boot_cpu_has(X86_FEATURE_CPPC)) { + u64 value = READ_ONCE(cpudata->cppc_req_cached); + + value &= ~GENMASK_ULL(31, 24); + value |= (u64)epp << 24; + WRITE_ONCE(cpudata->cppc_req_cached, value); + + ret = wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value); + if (!ret) + cpudata->epp_cached = epp; + } else { + perf_ctrls.energy_perf = epp; + ret = cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1); + if (ret) { + pr_debug("failed to set energy perf value (%d)\n", ret); + return ret; + } + cpudata->epp_cached = epp; + } + + return ret; +} + +static int amd_pstate_set_energy_pref_index(struct amd_cpudata *cpudata, + int pref_index) +{ + int epp = -EINVAL; + int ret; + + if (!pref_index) { + pr_debug("EPP pref_index is invalid\n"); + return -EINVAL; + } + + if (epp == -EINVAL) + epp = epp_values[pref_index]; + + if (epp > 0 && cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) { + pr_debug("EPP cannot be set under performance policy\n"); + return -EBUSY; + } + + ret = amd_pstate_set_epp(cpudata, epp); + + return ret; +} + static inline int pstate_enable(bool enable) { return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable); @@ -81,11 +232,21 @@ static inline int pstate_enable(bool enable) static int cppc_enable(bool enable) { int cpu, ret = 0; + struct cppc_perf_ctrls perf_ctrls; for_each_present_cpu(cpu) { ret = cppc_set_enable(cpu, enable); if (ret) return ret; + + /* Enable autonomous mode for EPP */ + if (cppc_state == AMD_PSTATE_ACTIVE) { + /* Set desired perf as zero to allow EPP firmware control */ + perf_ctrls.desired_perf = 0; + ret = cppc_set_perf(cpu, &perf_ctrls); + if (ret) + return ret; + } } return ret; @@ -429,7 +590,7 @@ static void amd_pstate_boost_init(struct amd_cpudata *cpudata) return; cpudata->boost_supported = true; - amd_pstate_driver.boost_enabled = true; + current_pstate_driver->boost_enabled = true; } static void amd_perf_ctl_reset(unsigned int cpu) @@ -603,10 +764,61 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy, return sprintf(&buf[0], "%u\n", perf); } +static ssize_t show_energy_performance_available_preferences( + struct cpufreq_policy *policy, char *buf) +{ + int i = 0; + int offset = 0; + + while (energy_perf_strings[i] != NULL) + offset += sysfs_emit_at(buf, offset, "%s ", energy_perf_strings[i++]); + + sysfs_emit_at(buf, offset, "\n"); + + return offset; +} + +static ssize_t store_energy_performance_preference( + struct cpufreq_policy *policy, const char *buf, size_t count) +{ + struct amd_cpudata *cpudata = policy->driver_data; + char str_preference[21]; + ssize_t ret; + + ret = sscanf(buf, "%20s", str_preference); + if (ret != 1) + return -EINVAL; + + ret = match_string(energy_perf_strings, -1, str_preference); + if (ret < 0) + return -EINVAL; + + mutex_lock(&amd_pstate_limits_lock); + ret = amd_pstate_set_energy_pref_index(cpudata, ret); + mutex_unlock(&amd_pstate_limits_lock); + + return ret ?: count; +} + +static ssize_t show_energy_performance_preference( + struct cpufreq_policy *policy, char *buf) +{ + struct amd_cpudata *cpudata = policy->driver_data; + int preference; + + preference = amd_pstate_get_energy_pref_index(cpudata); + if (preference < 0) + return preference; + + return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]); +} + cpufreq_freq_attr_ro(amd_pstate_max_freq); cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq); cpufreq_freq_attr_ro(amd_pstate_highest_perf); +cpufreq_freq_attr_rw(energy_performance_preference); +cpufreq_freq_attr_ro(energy_performance_available_preferences); static struct freq_attr *amd_pstate_attr[] = { &amd_pstate_max_freq, @@ -615,6 +827,186 @@ static struct freq_attr *amd_pstate_attr[] = { NULL, }; +static struct freq_attr *amd_pstate_epp_attr[] = { + &amd_pstate_max_freq, + &amd_pstate_lowest_nonlinear_freq, + &amd_pstate_highest_perf, + &energy_performance_preference, + &energy_performance_available_preferences, + NULL, +}; + +static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) +{ + int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret; + struct amd_cpudata *cpudata; + struct device *dev; + int rc; + u64 value; + + /* + * Resetting PERF_CTL_MSR will put the CPU in P0 frequency, + * which is ideal for initialization process. + */ + amd_perf_ctl_reset(policy->cpu); + dev = get_cpu_device(policy->cpu); + if (!dev) + goto free_cpudata1; + + cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL); + if (!cpudata) + return -ENOMEM; + + cpudata->cpu = policy->cpu; + cpudata->epp_policy = 0; + + rc = amd_pstate_init_perf(cpudata); + if (rc) + goto free_cpudata1; + + min_freq = amd_get_min_freq(cpudata); + max_freq = amd_get_max_freq(cpudata); + nominal_freq = amd_get_nominal_freq(cpudata); + lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata); + if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) { + dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n", + min_freq, max_freq); + ret = -EINVAL; + goto free_cpudata1; + } + + policy->cpuinfo.min_freq = min_freq; + policy->cpuinfo.max_freq = max_freq; + /* It will be updated by governor */ + policy->cur = policy->cpuinfo.min_freq; + + /* Initial processor data capability frequencies */ + cpudata->max_freq = max_freq; + cpudata->min_freq = min_freq; + cpudata->nominal_freq = nominal_freq; + cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq; + + policy->driver_data = cpudata; + + cpudata->epp_cached = amd_pstate_get_epp(cpudata, 0); + + policy->min = policy->cpuinfo.min_freq; + policy->max = policy->cpuinfo.max_freq; + + /* + * Set the policy to powersave to provide a valid fallback value in case + * the default cpufreq governor is neither powersave nor performance. + */ + policy->policy = CPUFREQ_POLICY_POWERSAVE; + + if (boot_cpu_has(X86_FEATURE_CPPC)) { + policy->fast_switch_possible = true; + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, &value); + if (ret) + return ret; + WRITE_ONCE(cpudata->cppc_req_cached, value); + + ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &value); + if (ret) + return ret; + WRITE_ONCE(cpudata->cppc_cap1_cached, value); + } + amd_pstate_boost_init(cpudata); + + return 0; + +free_cpudata1: + kfree(cpudata); + return ret; +} + +static int amd_pstate_epp_cpu_exit(struct cpufreq_policy *policy) +{ + pr_debug("CPU %d exiting\n", policy->cpu); + policy->fast_switch_possible = false; + return 0; +} + +static void amd_pstate_epp_init(unsigned int cpu) +{ + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); + struct amd_cpudata *cpudata = policy->driver_data; + u32 max_perf, min_perf; + u64 value; + s16 epp; + + max_perf = READ_ONCE(cpudata->highest_perf); + min_perf = READ_ONCE(cpudata->lowest_perf); + + value = READ_ONCE(cpudata->cppc_req_cached); + + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) + min_perf = max_perf; + + /* Initial min/max values for CPPC Performance Controls Register */ + value &= ~AMD_CPPC_MIN_PERF(~0L); + value |= AMD_CPPC_MIN_PERF(min_perf); + + value &= ~AMD_CPPC_MAX_PERF(~0L); + value |= AMD_CPPC_MAX_PERF(max_perf); + + /* CPPC EPP feature require to set zero to the desire perf bit */ + value &= ~AMD_CPPC_DES_PERF(~0L); + value |= AMD_CPPC_DES_PERF(0); + + if (cpudata->epp_policy == cpudata->policy) + goto skip_epp; + + cpudata->epp_policy = cpudata->policy; + + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) { + epp = amd_pstate_get_epp(cpudata, value); + if (epp < 0) + goto skip_epp; + /* force the epp value to be zero for performance policy */ + epp = 0; + } else { + /* Get BIOS pre-defined epp value */ + epp = amd_pstate_get_epp(cpudata, value); + if (epp) + goto skip_epp; + } + /* Set initial EPP value */ + if (boot_cpu_has(X86_FEATURE_CPPC)) { + value &= ~GENMASK_ULL(31, 24); + value |= (u64)epp << 24; + } + +skip_epp: + WRITE_ONCE(cpudata->cppc_req_cached, value); + amd_pstate_set_epp(cpudata, epp); + cpufreq_cpu_put(policy); +} + +static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy) +{ + struct amd_cpudata *cpudata = policy->driver_data; + + if (!policy->cpuinfo.max_freq) + return -ENODEV; + + pr_debug("set_policy: cpuinfo.max %u policy->max %u\n", + policy->cpuinfo.max_freq, policy->max); + + cpudata->policy = policy->policy; + + amd_pstate_epp_init(policy->cpu); + + return 0; +} + +static int amd_pstate_epp_verify_policy(struct cpufreq_policy_data *policy) +{ + cpufreq_verify_within_cpu_limits(policy); + pr_debug("policy_max =%d, policy_min=%d\n", policy->max, policy->min); + return 0; +} + static struct cpufreq_driver amd_pstate_driver = { .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS, .verify = amd_pstate_verify, @@ -628,6 +1020,16 @@ static struct cpufreq_driver amd_pstate_driver = { .attr = amd_pstate_attr, }; +static struct cpufreq_driver amd_pstate_epp_driver = { + .flags = CPUFREQ_CONST_LOOPS, + .verify = amd_pstate_epp_verify_policy, + .setpolicy = amd_pstate_epp_set_policy, + .init = amd_pstate_epp_cpu_init, + .exit = amd_pstate_epp_cpu_exit, + .name = "amd_pstate_epp", + .attr = amd_pstate_epp_attr, +}; + static int __init amd_pstate_init(void) { int ret; @@ -656,7 +1058,8 @@ static int __init amd_pstate_init(void) /* capability check */ if (boot_cpu_has(X86_FEATURE_CPPC)) { pr_debug("AMD CPPC MSR based functionality is supported\n"); - amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf; + if (cppc_state == AMD_PSTATE_PASSIVE) + current_pstate_driver->adjust_perf = amd_pstate_adjust_perf; } else { pr_debug("AMD CPPC shared memory based functionality is supported\n"); static_call_update(amd_pstate_enable, cppc_enable); @@ -667,14 +1070,13 @@ static int __init amd_pstate_init(void) /* enable amd pstate feature */ ret = amd_pstate_enable(true); if (ret) { - pr_err("failed to enable amd-pstate with return %d\n", ret); + pr_err("failed to enable with return %d\n", ret); return ret; } - ret = cpufreq_register_driver(&amd_pstate_driver); + ret = cpufreq_register_driver(current_pstate_driver); if (ret) - pr_err("failed to register amd_pstate_driver with return %d\n", - ret); + pr_err("failed to register with return %d\n", ret); return ret; } @@ -696,6 +1098,12 @@ static int __init amd_pstate_param(char *str) if (cppc_state == AMD_PSTATE_DISABLE) pr_info("driver is explicitly disabled\n"); + if (cppc_state == AMD_PSTATE_ACTIVE) + current_pstate_driver = &amd_pstate_epp_driver; + + if (cppc_state == AMD_PSTATE_PASSIVE) + current_pstate_driver = &amd_pstate_driver; + return 0; } diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h index dae2ce0f6735..72ea7cf85ca3 100644 --- a/include/linux/amd-pstate.h +++ b/include/linux/amd-pstate.h @@ -12,6 +12,11 @@ #include +#define AMD_CPPC_EPP_PERFORMANCE 0x00 +#define AMD_CPPC_EPP_BALANCE_PERFORMANCE 0x80 +#define AMD_CPPC_EPP_BALANCE_POWERSAVE 0xBF +#define AMD_CPPC_EPP_POWERSAVE 0xFF + /********************************************************************* * AMD P-state INTERFACE * *********************************************************************/ @@ -47,6 +52,10 @@ struct amd_aperf_mperf { * @prev: Last Aperf/Mperf/tsc count value read from register * @freq: current cpu frequency value * @boost_supported: check whether the Processor or SBIOS supports boost mode + * @epp_policy: Last saved policy used to set energy-performance preference + * @epp_cached: Cached CPPC energy-performance preference value + * @policy: Cpufreq policy value + * @cppc_cap1_cached Cached MSR_AMD_CPPC_CAP1 register value * * The amd_cpudata is key private data for each CPU thread in AMD P-State, and * represents all the attributes and goals that AMD P-State requests at runtime. @@ -72,6 +81,12 @@ struct amd_cpudata { u64 freq; bool boost_supported; + + /* EPP feature related attributes*/ + s16 epp_policy; + s16 epp_cached; + u32 policy; + u64 cppc_cap1_cached; }; /* @@ -90,5 +105,4 @@ static const char * const amd_pstate_mode_string[] = { [AMD_PSTATE_ACTIVE] = "active", NULL, }; - #endif /* _LINUX_AMD_PSTATE_H */ -- cgit v1.2.3 From d4da12f8033a123353eccf993cb95ee5bff21e7c Mon Sep 17 00:00:00 2001 From: Perry Yuan Date: Tue, 31 Jan 2023 17:00:10 +0800 Subject: cpufreq: amd-pstate: implement amd pstate cpu online and offline callback Adds online and offline driver callback support to allow cpu cores go offline and help to restore the previous working states when core goes back online later for EPP driver mode. Acked-by: Huang Rui Reviewed-by: Mario Limonciello Reviewed-by: Wyes Karny Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/amd-pstate.c | 82 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/amd-pstate.h | 1 + 2 files changed, 83 insertions(+) diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index bca86b5b8b12..26f6ac83d87e 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -1000,6 +1000,86 @@ static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy) return 0; } +static void amd_pstate_epp_reenable(struct amd_cpudata *cpudata) +{ + struct cppc_perf_ctrls perf_ctrls; + u64 value, max_perf; + int ret; + + ret = amd_pstate_enable(true); + if (ret) + pr_err("failed to enable amd pstate during resume, return %d\n", ret); + + value = READ_ONCE(cpudata->cppc_req_cached); + max_perf = READ_ONCE(cpudata->highest_perf); + + if (boot_cpu_has(X86_FEATURE_CPPC)) { + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value); + } else { + perf_ctrls.max_perf = max_perf; + perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached); + cppc_set_perf(cpudata->cpu, &perf_ctrls); + } +} + +static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy) +{ + struct amd_cpudata *cpudata = policy->driver_data; + + pr_debug("AMD CPU Core %d going online\n", cpudata->cpu); + + if (cppc_state == AMD_PSTATE_ACTIVE) { + amd_pstate_epp_reenable(cpudata); + cpudata->suspended = false; + } + + return 0; +} + +static void amd_pstate_epp_offline(struct cpufreq_policy *policy) +{ + struct amd_cpudata *cpudata = policy->driver_data; + struct cppc_perf_ctrls perf_ctrls; + int min_perf; + u64 value; + + min_perf = READ_ONCE(cpudata->lowest_perf); + value = READ_ONCE(cpudata->cppc_req_cached); + + mutex_lock(&amd_pstate_limits_lock); + if (boot_cpu_has(X86_FEATURE_CPPC)) { + cpudata->epp_policy = CPUFREQ_POLICY_UNKNOWN; + + /* Set max perf same as min perf */ + value &= ~AMD_CPPC_MAX_PERF(~0L); + value |= AMD_CPPC_MAX_PERF(min_perf); + value &= ~AMD_CPPC_MIN_PERF(~0L); + value |= AMD_CPPC_MIN_PERF(min_perf); + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value); + } else { + perf_ctrls.desired_perf = 0; + perf_ctrls.max_perf = min_perf; + perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE); + cppc_set_perf(cpudata->cpu, &perf_ctrls); + } + mutex_unlock(&amd_pstate_limits_lock); +} + +static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy) +{ + struct amd_cpudata *cpudata = policy->driver_data; + + pr_debug("AMD CPU Core %d going offline\n", cpudata->cpu); + + if (cpudata->suspended) + return 0; + + if (cppc_state == AMD_PSTATE_ACTIVE) + amd_pstate_epp_offline(policy); + + return 0; +} + static int amd_pstate_epp_verify_policy(struct cpufreq_policy_data *policy) { cpufreq_verify_within_cpu_limits(policy); @@ -1026,6 +1106,8 @@ static struct cpufreq_driver amd_pstate_epp_driver = { .setpolicy = amd_pstate_epp_set_policy, .init = amd_pstate_epp_cpu_init, .exit = amd_pstate_epp_cpu_exit, + .offline = amd_pstate_epp_cpu_offline, + .online = amd_pstate_epp_cpu_online, .name = "amd_pstate_epp", .attr = amd_pstate_epp_attr, }; diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h index 72ea7cf85ca3..f5f22418e64b 100644 --- a/include/linux/amd-pstate.h +++ b/include/linux/amd-pstate.h @@ -87,6 +87,7 @@ struct amd_cpudata { s16 epp_cached; u32 policy; u64 cppc_cap1_cached; + bool suspended; }; /* -- cgit v1.2.3 From 50ddd2f7826927e6dc111a43b3a183f53c260fa4 Mon Sep 17 00:00:00 2001 From: Perry Yuan Date: Tue, 31 Jan 2023 17:00:11 +0800 Subject: cpufreq: amd-pstate: implement suspend and resume callbacks add suspend and resume support for the AMD processors by amd_pstate_epp driver instance. When the CPPC is suspended, EPP driver will set EPP profile to 'power' profile and set max/min perf to lowest perf value. When resume happens, it will restore the MSR registers with previous cached value. Acked-by: Huang Rui Reviewed-by: Mario Limonciello Reviewed-by: Wyes Karny Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/amd-pstate.c | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 26f6ac83d87e..4e3770e0d4d3 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -1087,6 +1087,44 @@ static int amd_pstate_epp_verify_policy(struct cpufreq_policy_data *policy) return 0; } +static int amd_pstate_epp_suspend(struct cpufreq_policy *policy) +{ + struct amd_cpudata *cpudata = policy->driver_data; + int ret; + + /* avoid suspending when EPP is not enabled */ + if (cppc_state != AMD_PSTATE_ACTIVE) + return 0; + + /* set this flag to avoid setting core offline*/ + cpudata->suspended = true; + + /* disable CPPC in lowlevel firmware */ + ret = amd_pstate_enable(false); + if (ret) + pr_err("failed to suspend, return %d\n", ret); + + return 0; +} + +static int amd_pstate_epp_resume(struct cpufreq_policy *policy) +{ + struct amd_cpudata *cpudata = policy->driver_data; + + if (cpudata->suspended) { + mutex_lock(&amd_pstate_limits_lock); + + /* enable amd pstate from suspend state*/ + amd_pstate_epp_reenable(cpudata); + + mutex_unlock(&amd_pstate_limits_lock); + + cpudata->suspended = false; + } + + return 0; +} + static struct cpufreq_driver amd_pstate_driver = { .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS, .verify = amd_pstate_verify, @@ -1108,6 +1146,8 @@ static struct cpufreq_driver amd_pstate_epp_driver = { .exit = amd_pstate_epp_cpu_exit, .offline = amd_pstate_epp_cpu_offline, .online = amd_pstate_epp_cpu_online, + .suspend = amd_pstate_epp_suspend, + .resume = amd_pstate_epp_resume, .name = "amd_pstate_epp", .attr = amd_pstate_epp_attr, }; -- cgit v1.2.3 From abd61c08ef349af08df0bf587d33f5bde5996a89 Mon Sep 17 00:00:00 2001 From: Perry Yuan Date: Tue, 31 Jan 2023 17:00:12 +0800 Subject: cpufreq: amd-pstate: add driver working mode switch support While amd-pstate driver was loaded with specific driver mode, it will need to check which mode is enabled for the pstate driver,add this sysfs entry to show the current status $ cat /sys/devices/system/cpu/amd-pstate/status active Meanwhile, user can switch the pstate driver mode with writing mode string to sysfs entry as below. Enable passive mode: $ sudo bash -c "echo passive > /sys/devices/system/cpu/amd-pstate/status" Enable active mode (EPP driver mode): $ sudo bash -c "echo active > /sys/devices/system/cpu/amd-pstate/status" Acked-by: Huang Rui Reviewed-by: Mario Limonciello Reviewed-by: Wyes Karny Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/amd-pstate.c | 118 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 118 insertions(+) diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 4e3770e0d4d3..1ae2e0d56ed1 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -63,6 +63,7 @@ static struct cpufreq_driver *current_pstate_driver; static struct cpufreq_driver amd_pstate_driver; static struct cpufreq_driver amd_pstate_epp_driver; static int cppc_state = AMD_PSTATE_DISABLE; +struct kobject *amd_pstate_kobj; /* * AMD Energy Preference Performance (EPP) @@ -673,6 +674,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy) policy->driver_data = cpudata; amd_pstate_boost_init(cpudata); + if (!current_pstate_driver->adjust_perf) + current_pstate_driver->adjust_perf = amd_pstate_adjust_perf; return 0; @@ -813,12 +816,99 @@ static ssize_t show_energy_performance_preference( return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]); } +static ssize_t amd_pstate_show_status(char *buf) +{ + if (!current_pstate_driver) + return sysfs_emit(buf, "disable\n"); + + return sysfs_emit(buf, "%s\n", amd_pstate_mode_string[cppc_state]); +} + +static void amd_pstate_driver_cleanup(void) +{ + current_pstate_driver = NULL; +} + +static int amd_pstate_update_status(const char *buf, size_t size) +{ + int ret; + int mode_idx; + + if (size > 7 || size < 6) + return -EINVAL; + mode_idx = get_mode_idx_from_str(buf, size); + + switch(mode_idx) { + case AMD_PSTATE_DISABLE: + if (!current_pstate_driver) + return -EINVAL; + if (cppc_state == AMD_PSTATE_ACTIVE) + return -EBUSY; + ret = cpufreq_unregister_driver(current_pstate_driver); + amd_pstate_driver_cleanup(); + break; + case AMD_PSTATE_PASSIVE: + if (current_pstate_driver) { + if (current_pstate_driver == &amd_pstate_driver) + return 0; + cpufreq_unregister_driver(current_pstate_driver); + cppc_state = AMD_PSTATE_PASSIVE; + current_pstate_driver = &amd_pstate_driver; + } + + ret = cpufreq_register_driver(current_pstate_driver); + break; + case AMD_PSTATE_ACTIVE: + if (current_pstate_driver) { + if (current_pstate_driver == &amd_pstate_epp_driver) + return 0; + cpufreq_unregister_driver(current_pstate_driver); + current_pstate_driver = &amd_pstate_epp_driver; + cppc_state = AMD_PSTATE_ACTIVE; + } + + ret = cpufreq_register_driver(current_pstate_driver); + break; + default: + ret = -EINVAL; + break; + } + + return ret; +} + +static ssize_t show_status(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + ssize_t ret; + + mutex_lock(&amd_pstate_driver_lock); + ret = amd_pstate_show_status(buf); + mutex_unlock(&amd_pstate_driver_lock); + + return ret; +} + +static ssize_t store_status(struct kobject *a, struct kobj_attribute *b, + const char *buf, size_t count) +{ + char *p = memchr(buf, '\n', count); + int ret; + + mutex_lock(&amd_pstate_driver_lock); + ret = amd_pstate_update_status(buf, p ? p - buf : count); + mutex_unlock(&amd_pstate_driver_lock); + + return ret < 0 ? ret : count; +} + cpufreq_freq_attr_ro(amd_pstate_max_freq); cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq); cpufreq_freq_attr_ro(amd_pstate_highest_perf); cpufreq_freq_attr_rw(energy_performance_preference); cpufreq_freq_attr_ro(energy_performance_available_preferences); +define_one_global_rw(status); static struct freq_attr *amd_pstate_attr[] = { &amd_pstate_max_freq, @@ -836,6 +926,15 @@ static struct freq_attr *amd_pstate_epp_attr[] = { NULL, }; +static struct attribute *pstate_global_attributes[] = { + &status.attr, + NULL +}; + +static const struct attribute_group amd_pstate_global_attr_group = { + .attrs = pstate_global_attributes, +}; + static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) { int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret; @@ -1200,6 +1299,25 @@ static int __init amd_pstate_init(void) if (ret) pr_err("failed to register with return %d\n", ret); + amd_pstate_kobj = kobject_create_and_add("amd_pstate", &cpu_subsys.dev_root->kobj); + if (!amd_pstate_kobj) { + ret = -EINVAL; + pr_err("global sysfs registration failed.\n"); + goto kobject_free; + } + + ret = sysfs_create_group(amd_pstate_kobj, &amd_pstate_global_attr_group); + if (ret) { + pr_err("sysfs attribute export failed with error %d.\n", ret); + goto global_attr_free; + } + + return ret; + +global_attr_free: + kobject_put(amd_pstate_kobj); +kobject_free: + cpufreq_unregister_driver(current_pstate_driver); return ret; } device_initcall(amd_pstate_init); -- cgit v1.2.3 From 92e6088427c5da7ef8dc92d6ab2f0f8f6a01fab7 Mon Sep 17 00:00:00 2001 From: Perry Yuan Date: Tue, 31 Jan 2023 17:00:13 +0800 Subject: Documentation: amd-pstate: add amd pstate driver mode introduction The amd-pstate driver has two operation modes supported: * CPPC Autonomous (active) mode * CPPC non-autonomous (passive) mode. active mode and passive mode can be chosen by different kernel parameters. Acked-by: Huang Rui Reviewed-by: Mario Limonciello Reviewed-by: Wyes Karny Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/pm/amd-pstate.rst | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index 98a2bb44f80c..b6aee69f564f 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -299,8 +299,30 @@ module which supports the new AMD P-States mechanism on most of the future AMD platforms. The AMD P-States mechanism is the more performance and energy efficiency frequency management method on AMD processors. -Kernel Module Options for ``amd-pstate`` -========================================= + +AMD Pstate Driver Operation Modes +================================= + +``amd_pstate`` CPPC has two operation modes: CPPC Autonomous(active) mode and +CPPC non-autonomous(passive) mode. +active mode and passive mode can be chosen by different kernel parameters. +When in Autonomous mode, CPPC ignores requests done in the Desired Performance +Target register and takes into account only the values set to the Minimum requested +performance, Maximum requested performance, and Energy Performance Preference +registers. When Autonomous is disabled, it only considers the Desired Performance Target. + +Active Mode +------------ + +``amd_pstate=active`` + +This is the low-level firmware control mode which is implemented by ``amd_pstate_epp`` +driver with ``amd_pstate=active`` passed to the kernel in the command line. +In this mode, ``amd_pstate_epp`` driver provides a hint to the hardware if software +wants to bias toward performance (0x0) or energy efficiency (0xff) to the CPPC firmware. +then CPPC power algorithm will calculate the runtime workload and adjust the realtime +cores frequency according to the power supply and thermal, core voltage and some other +hardware conditions. Passive Mode ------------ -- cgit v1.2.3 From 5014603e409b01001bfbeae090a16733f61a7640 Mon Sep 17 00:00:00 2001 From: Perry Yuan Date: Tue, 31 Jan 2023 17:00:14 +0800 Subject: Documentation: introduce amd pstate active mode kernel command line options AMD Pstate driver support another firmware based autonomous mode with "amd_pstate=active" added to the kernel command line. In autonomous mode SMU firmware decides frequencies at runtime based on workload utilization, usage in other IPs, infrastructure limits such as power, thermals and so on. Acked-by: Huang Rui Reviewed-by: Mario Limonciello Reviewed-by: Wyes Karny Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/kernel-parameters.txt | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 6cfa6e3996cf..e3618dfdb36a 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -7020,3 +7020,10 @@ management firmware translates the requests into actual hardware states (core frequency, data fabric and memory clocks etc.) + active + Use amd_pstate_epp driver instance as the scaling driver, + driver provides a hint to the hardware if software wants + to bias toward performance (0x0) or energy efficiency (0xff) + to the CPPC firmware. then CPPC power algorithm will + calculate the runtime workload and adjust the realtime cores + frequency. -- cgit v1.2.3 From 3ec32b6d17c5b229c6f5d05849932af1f0c6f523 Mon Sep 17 00:00:00 2001 From: Perry Yuan Date: Tue, 31 Jan 2023 17:00:15 +0800 Subject: cpufreq: amd-pstate: convert sprintf with sysfs_emit() replace the sprintf with a more generic sysfs_emit function No intended potential function impact Acked-by: Huang Rui Reviewed-by: Mario Limonciello Reviewed-by: Wyes Karny Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/amd-pstate.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 1ae2e0d56ed1..168a28bed6ee 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -736,7 +736,7 @@ static ssize_t show_amd_pstate_max_freq(struct cpufreq_policy *policy, if (max_freq < 0) return max_freq; - return sprintf(&buf[0], "%u\n", max_freq); + return sysfs_emit(buf, "%u\n", max_freq); } static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *policy, @@ -749,7 +749,7 @@ static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *poli if (freq < 0) return freq; - return sprintf(&buf[0], "%u\n", freq); + return sysfs_emit(buf, "%u\n", freq); } /* @@ -764,7 +764,7 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy, perf = READ_ONCE(cpudata->highest_perf); - return sprintf(&buf[0], "%u\n", perf); + return sysfs_emit(buf, "%u\n", perf); } static ssize_t show_energy_performance_available_preferences( -- cgit v1.2.3 From b9e6a2d47b2565eb450d3ee900fba49cc9b25cbd Mon Sep 17 00:00:00 2001 From: Perry Yuan Date: Tue, 31 Jan 2023 17:00:16 +0800 Subject: Documentation: amd-pstate: introduce new global sysfs attributes The amd-pstate driver supports switching working modes at runtime. Users can view and change modes by interacting with the "status" sysfs attribute. 1) check driver mode: $ cat /sys/devices/system/cpu/amd-pstate/status 2) switch mode: `# echo "passive" | sudo tee /sys/devices/system/cpu/amd-pstate/status` or `# echo "active" | sudo tee /sys/devices/system/cpu/amd-pstate/status` Acked-by: Huang Rui Reviewed-by: Mario Limonciello Reviewed-by: Wyes Karny Tested-by: Wyes Karny Signed-off-by: Perry Yuan Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/pm/amd-pstate.rst | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index b6aee69f564f..5304adf2fc2f 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -339,6 +339,35 @@ processor must provide at least nominal performance requested and go higher if c operating conditions allow. +User Space Interface in ``sysfs`` +================================= + +Global Attributes +----------------- + +``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to +control its functionality at the system level. They are located in the +``/sys/devices/system/cpu/amd-pstate/`` directory and affect all CPUs. + +``status`` + Operation mode of the driver: "active", "passive" or "disable". + + "active" + The driver is functional and in the ``active mode`` + + "passive" + The driver is functional and in the ``passive mode`` + + "disable" + The driver is unregistered and not functional now. + + This attribute can be written to in order to change the driver's + operation mode or to unregister it. The string written to it must be + one of the possible values of it and, if successful, writing one of + these values to the sysfs file will cause the driver to switch over + to the operation mode represented by that string - or to be + unregistered in the "disable" case. + ``cpupower`` tool support for ``amd-pstate`` =============================================== -- cgit v1.2.3 From 7214015c7f975542d227db6eaec2db2ecc88f2c1 Mon Sep 17 00:00:00 2001 From: Yi-Wei Wang Date: Tue, 24 Jan 2023 11:53:23 +0000 Subject: cpufreq: tegra194: Enable CPUFREQ thermal cooling Populate the flag CPUFREQ_IS_COOLING_DEV for the Tegra194 CPUFREQ driver to register it as a cooling device. This enables CPU frequency throttling for CPUs when the passive trip points are crossed. Signed-off-by: Yi-Wei Wang Signed-off-by: Jon Hunter Signed-off-by: Viresh Kumar --- drivers/cpufreq/tegra194-cpufreq.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c index 4596c3e323aa..5890e25d7f77 100644 --- a/drivers/cpufreq/tegra194-cpufreq.c +++ b/drivers/cpufreq/tegra194-cpufreq.c @@ -411,7 +411,8 @@ static int tegra194_cpufreq_set_target(struct cpufreq_policy *policy, static struct cpufreq_driver tegra194_cpufreq_driver = { .name = "tegra194", - .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_INITIAL_FREQ_CHECK, + .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_INITIAL_FREQ_CHECK | + CPUFREQ_IS_COOLING_DEV, .verify = cpufreq_generic_frequency_table_verify, .target_index = tegra194_cpufreq_set_target, .get = tegra194_get_speed, -- cgit v1.2.3 From 09608d62ae5cd9559549637cccb4092d8ecad3a8 Mon Sep 17 00:00:00 2001 From: "Nícolas F. R. A. Prado" Date: Thu, 26 Jan 2023 10:48:56 -0500 Subject: cpufreq: mediatek-hw: Register to module device table MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Register the compatibles for this module on the module device table so it can be automatically loaded when a matching device is found on the system. Signed-off-by: Nícolas F. R. A. Prado Signed-off-by: Viresh Kumar --- drivers/cpufreq/mediatek-cpufreq-hw.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpufreq/mediatek-cpufreq-hw.c b/drivers/cpufreq/mediatek-cpufreq-hw.c index f80339779084..115b0eda38c1 100644 --- a/drivers/cpufreq/mediatek-cpufreq-hw.c +++ b/drivers/cpufreq/mediatek-cpufreq-hw.c @@ -324,6 +324,7 @@ static const struct of_device_id mtk_cpufreq_hw_match[] = { { .compatible = "mediatek,cpufreq-hw", .data = &cpufreq_mtk_offsets }, {} }; +MODULE_DEVICE_TABLE(of, mtk_cpufreq_hw_match); static struct platform_driver mtk_cpufreq_hw_driver = { .probe = mtk_cpufreq_hw_driver_probe, -- cgit v1.2.3 From fa68d9c5ff765318adf8e46ff1d89539a8546a0d Mon Sep 17 00:00:00 2001 From: Luca Weiss Date: Sun, 16 Oct 2022 11:00:30 +0200 Subject: dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing compatibles Document the cpufreq-epss compatibles currently used in the tree, plus the sc7280 which will be added in a separate commit. Signed-off-by: Luca Weiss Signed-off-by: Viresh Kumar --- Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml index 903b31129f01..b69b71d497cc 100644 --- a/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml @@ -26,8 +26,12 @@ properties: items: - enum: - qcom,qdu1000-cpufreq-epss + - qcom,sc7280-cpufreq-epss + - qcom,sc8280xp-cpufreq-epss - qcom,sm6375-cpufreq-epss - qcom,sm8250-cpufreq-epss + - qcom,sm8350-cpufreq-epss + - qcom,sm8450-cpufreq-epss - const: qcom,cpufreq-epss reg: -- cgit v1.2.3 From 8e6cb91f946a059c3714c6d7f0e43e313f389183 Mon Sep 17 00:00:00 2001 From: Abel Vesa Date: Mon, 30 Jan 2023 14:30:46 +0200 Subject: dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible Add compatible for EPSS CPUFREQ-HW on SM8550. Also document the interrupts. Signed-off-by: Abel Vesa Acked-by: Rob Herring Signed-off-by: Viresh Kumar --- Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml index b69b71d497cc..4b166fddb958 100644 --- a/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml @@ -32,6 +32,7 @@ properties: - qcom,sm8250-cpufreq-epss - qcom,sm8350-cpufreq-epss - qcom,sm8450-cpufreq-epss + - qcom,sm8550-cpufreq-epss - const: qcom,cpufreq-epss reg: -- cgit v1.2.3 From eca4c0eea53432ec4b711b2a8ad282cbad231b4f Mon Sep 17 00:00:00 2001 From: Qi Zheng Date: Wed, 8 Feb 2023 12:00:37 +0800 Subject: OPP: fix error checking in opp_migrate_dentry() Since commit ff9fb72bc077 ("debugfs: return error values, not NULL") changed return value of debugfs_rename() in error cases from %NULL to %ERR_PTR(-ERROR), we should also check error values instead of NULL. Fixes: ff9fb72bc077 ("debugfs: return error values, not NULL") Signed-off-by: Qi Zheng Signed-off-by: Viresh Kumar --- drivers/opp/debugfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/opp/debugfs.c b/drivers/opp/debugfs.c index 96a30a032c5f..2c7fb683441e 100644 --- a/drivers/opp/debugfs.c +++ b/drivers/opp/debugfs.c @@ -235,7 +235,7 @@ static void opp_migrate_dentry(struct opp_device *opp_dev, dentry = debugfs_rename(rootdir, opp_dev->dentry, rootdir, opp_table->dentry_name); - if (!dentry) { + if (IS_ERR(dentry)) { dev_err(dev, "%s: Failed to rename link from: %s to %s\n", __func__, dev_name(opp_dev->dev), dev_name(dev)); return; -- cgit v1.2.3 From dd329e1e21b54c73f58a440b6164d04d8a7fc542 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Tue, 7 Feb 2023 20:59:09 +0100 Subject: cpufreq: Make cpufreq_unregister_driver() return void MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit All but a few drivers ignore the return value of cpufreq_unregister_driver(). Those few that don't only call it after cpufreq_register_driver() succeeded, in which case the call doesn't fail. Make the function return no value and add a WARN_ON for the case that the function is called in an invalid situation (i.e. without a previous successful call to cpufreq_register_driver()). Signed-off-by: Uwe Kleine-König Acked-by: Florian Fainelli # brcmstb-avs-cpufreq.c Acked-by: Viresh Kumar Reviewed-by: AngeloGioacchino Del Regno Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/amd-pstate.c | 4 ++-- drivers/cpufreq/brcmstb-avs-cpufreq.c | 5 +---- drivers/cpufreq/cpufreq.c | 8 +++----- drivers/cpufreq/davinci-cpufreq.c | 4 +++- drivers/cpufreq/mediatek-cpufreq-hw.c | 4 +++- drivers/cpufreq/omap-cpufreq.c | 4 +++- drivers/cpufreq/qcom-cpufreq-hw.c | 4 +++- include/linux/cpufreq.h | 2 +- 8 files changed, 19 insertions(+), 16 deletions(-) diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 168a28bed6ee..70debd5a9f40 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -831,7 +831,7 @@ static void amd_pstate_driver_cleanup(void) static int amd_pstate_update_status(const char *buf, size_t size) { - int ret; + int ret = 0; int mode_idx; if (size > 7 || size < 6) @@ -844,7 +844,7 @@ static int amd_pstate_update_status(const char *buf, size_t size) return -EINVAL; if (cppc_state == AMD_PSTATE_ACTIVE) return -EBUSY; - ret = cpufreq_unregister_driver(current_pstate_driver); + cpufreq_unregister_driver(current_pstate_driver); amd_pstate_driver_cleanup(); break; case AMD_PSTATE_PASSIVE: diff --git a/drivers/cpufreq/brcmstb-avs-cpufreq.c b/drivers/cpufreq/brcmstb-avs-cpufreq.c index 4153150e20db..ffea6402189d 100644 --- a/drivers/cpufreq/brcmstb-avs-cpufreq.c +++ b/drivers/cpufreq/brcmstb-avs-cpufreq.c @@ -751,10 +751,7 @@ static int brcm_avs_cpufreq_probe(struct platform_device *pdev) static int brcm_avs_cpufreq_remove(struct platform_device *pdev) { - int ret; - - ret = cpufreq_unregister_driver(&brcm_avs_driver); - WARN_ON(ret); + cpufreq_unregister_driver(&brcm_avs_driver); brcm_avs_prepare_uninit(pdev); diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 7e56a42750ea..85a0bea2dbf1 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2904,12 +2904,12 @@ EXPORT_SYMBOL_GPL(cpufreq_register_driver); * Returns zero if successful, and -EINVAL if the cpufreq_driver is * currently not initialised. */ -int cpufreq_unregister_driver(struct cpufreq_driver *driver) +void cpufreq_unregister_driver(struct cpufreq_driver *driver) { unsigned long flags; - if (!cpufreq_driver || (driver != cpufreq_driver)) - return -EINVAL; + if (WARN_ON(!cpufreq_driver || (driver != cpufreq_driver))) + return; pr_debug("unregistering driver %s\n", driver->name); @@ -2926,8 +2926,6 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver) write_unlock_irqrestore(&cpufreq_driver_lock, flags); cpus_read_unlock(); - - return 0; } EXPORT_SYMBOL_GPL(cpufreq_unregister_driver); diff --git a/drivers/cpufreq/davinci-cpufreq.c b/drivers/cpufreq/davinci-cpufreq.c index 9e97f60f8199..2d23015e2abd 100644 --- a/drivers/cpufreq/davinci-cpufreq.c +++ b/drivers/cpufreq/davinci-cpufreq.c @@ -138,7 +138,9 @@ static int __exit davinci_cpufreq_remove(struct platform_device *pdev) if (cpufreq.asyncclk) clk_put(cpufreq.asyncclk); - return cpufreq_unregister_driver(&davinci_driver); + cpufreq_unregister_driver(&davinci_driver); + + return 0; } static struct platform_driver davinci_cpufreq_driver = { diff --git a/drivers/cpufreq/mediatek-cpufreq-hw.c b/drivers/cpufreq/mediatek-cpufreq-hw.c index f80339779084..f21a9e3df53d 100644 --- a/drivers/cpufreq/mediatek-cpufreq-hw.c +++ b/drivers/cpufreq/mediatek-cpufreq-hw.c @@ -317,7 +317,9 @@ static int mtk_cpufreq_hw_driver_probe(struct platform_device *pdev) static int mtk_cpufreq_hw_driver_remove(struct platform_device *pdev) { - return cpufreq_unregister_driver(&cpufreq_mtk_hw_driver); + cpufreq_unregister_driver(&cpufreq_mtk_hw_driver); + + return 0; } static const struct of_device_id mtk_cpufreq_hw_match[] = { diff --git a/drivers/cpufreq/omap-cpufreq.c b/drivers/cpufreq/omap-cpufreq.c index 1b50df06c6bc..81649a1969b6 100644 --- a/drivers/cpufreq/omap-cpufreq.c +++ b/drivers/cpufreq/omap-cpufreq.c @@ -184,7 +184,9 @@ static int omap_cpufreq_probe(struct platform_device *pdev) static int omap_cpufreq_remove(struct platform_device *pdev) { - return cpufreq_unregister_driver(&omap_driver); + cpufreq_unregister_driver(&omap_driver); + + return 0; } static struct platform_driver omap_cpufreq_platdrv = { diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c index 957cf6bb8c05..b6227fae3285 100644 --- a/drivers/cpufreq/qcom-cpufreq-hw.c +++ b/drivers/cpufreq/qcom-cpufreq-hw.c @@ -768,7 +768,9 @@ of_exit: static int qcom_cpufreq_hw_driver_remove(struct platform_device *pdev) { - return cpufreq_unregister_driver(&cpufreq_qcom_hw_driver); + cpufreq_unregister_driver(&cpufreq_qcom_hw_driver); + + return 0; } static struct platform_driver qcom_cpufreq_hw_driver = { diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 6a94a6eaad27..65623233ab2f 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -448,7 +448,7 @@ struct cpufreq_driver { #define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING BIT(6) int cpufreq_register_driver(struct cpufreq_driver *driver_data); -int cpufreq_unregister_driver(struct cpufreq_driver *driver_data); +void cpufreq_unregister_driver(struct cpufreq_driver *driver_data); bool cpufreq_driver_test_flags(u16 flags); const char *cpufreq_get_current_driver(void); -- cgit v1.2.3 From 7cca9a9851a5fb44808949539af6c0428e48a267 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 7 Feb 2023 17:12:51 +0100 Subject: cpufreq: amd-pstate: avoid uninitialized variable use The new epp support causes warnings about three separate but related bugs: 1) failing before allocation should just return an error: drivers/cpufreq/amd-pstate.c:951:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (!dev) ^~~~ drivers/cpufreq/amd-pstate.c:1018:9: note: uninitialized use occurs here return ret; ^~~ 2) wrong variable to store return code: drivers/cpufreq/amd-pstate.c:963:6: error: variable 'ret' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (rc) ^~ drivers/cpufreq/amd-pstate.c:1019:9: note: uninitialized use occurs here return ret; ^~~ drivers/cpufreq/amd-pstate.c:963:2: note: remove the 'if' if its condition is always false if (rc) ^~~~~~~ 3) calling amd_pstate_set_epp() in cleanup path after determining that it should not be called: drivers/cpufreq/amd-pstate.c:1055:6: error: variable 'epp' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (cpudata->epp_policy == cpudata->policy) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/cpufreq/amd-pstate.c:1080:30: note: uninitialized use occurs here amd_pstate_set_epp(cpudata, epp); ^~~ All three are trivial to fix, but most likely there are additional bugs in this function when the error handling was not really tested. Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors") Signed-off-by: Arnd Bergmann Tested-by: Wyes Karny Reviewed-by: Yuan Perry Acked-by: Huang Rui Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/amd-pstate.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 70debd5a9f40..b8862afef4e4 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -940,7 +940,6 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret; struct amd_cpudata *cpudata; struct device *dev; - int rc; u64 value; /* @@ -950,7 +949,7 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) amd_perf_ctl_reset(policy->cpu); dev = get_cpu_device(policy->cpu); if (!dev) - goto free_cpudata1; + return -ENODEV; cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL); if (!cpudata) @@ -959,8 +958,8 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) cpudata->cpu = policy->cpu; cpudata->epp_policy = 0; - rc = amd_pstate_init_perf(cpudata); - if (rc) + ret = amd_pstate_init_perf(cpudata); + if (ret) goto free_cpudata1; min_freq = amd_get_min_freq(cpudata); @@ -1076,9 +1075,9 @@ static void amd_pstate_epp_init(unsigned int cpu) value |= (u64)epp << 24; } + amd_pstate_set_epp(cpudata, epp); skip_epp: WRITE_ONCE(cpudata->cppc_req_cached, value); - amd_pstate_set_epp(cpudata, epp); cpufreq_cpu_put(policy); } -- cgit v1.2.3 From 5d8f384a9b4fc50f6a18405f1c08e5a87a77b5b3 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Wed, 8 Feb 2023 10:26:54 +0100 Subject: cpufreq: davinci: Fix clk use after free MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The remove function first frees the clks and only then calls cpufreq_unregister_driver(). If one of the cpufreq callbacks is called just before cpufreq_unregister_driver() is run, the freed clks might be used. Fixes: 6601b8030de3 ("davinci: add generic CPUFreq driver for DaVinci") Signed-off-by: Uwe Kleine-König Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/davinci-cpufreq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/davinci-cpufreq.c b/drivers/cpufreq/davinci-cpufreq.c index 2d23015e2abd..ebb3a8102681 100644 --- a/drivers/cpufreq/davinci-cpufreq.c +++ b/drivers/cpufreq/davinci-cpufreq.c @@ -133,13 +133,13 @@ static int __init davinci_cpufreq_probe(struct platform_device *pdev) static int __exit davinci_cpufreq_remove(struct platform_device *pdev) { + cpufreq_unregister_driver(&davinci_driver); + clk_put(cpufreq.armclk); if (cpufreq.asyncclk) clk_put(cpufreq.asyncclk); - cpufreq_unregister_driver(&davinci_driver); - return 0; } -- cgit v1.2.3 From 108fcad91109bd7e9374ae9d509085f5ec55799b Mon Sep 17 00:00:00 2001 From: Thomas Weißschuh Date: Tue, 7 Feb 2023 19:58:18 +0000 Subject: cpufreq: Make kobj_type structure constant MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 85a0bea2dbf1..6d8fd3b8dcb5 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -993,7 +993,7 @@ static const struct sysfs_ops sysfs_ops = { .store = store, }; -static struct kobj_type ktype_cpufreq = { +static const struct kobj_type ktype_cpufreq = { .sysfs_ops = &sysfs_ops, .default_groups = cpufreq_groups, .release = cpufreq_sysfs_release, -- cgit v1.2.3 From 0b6200e1e9f53dabdc30d0f6c51af9a5f664d32b Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Thu, 2 Feb 2023 15:15:45 +0100 Subject: PM: domains: fix memory leak with using debugfs_lookup() When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman Reviewed-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 967bcf9d415e..6097644ebdc5 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -220,13 +220,10 @@ static void genpd_debug_add(struct generic_pm_domain *genpd); static void genpd_debug_remove(struct generic_pm_domain *genpd) { - struct dentry *d; - if (!genpd_debugfs_dir) return; - d = debugfs_lookup(genpd->name, genpd_debugfs_dir); - debugfs_remove(d); + debugfs_lookup_and_remove(genpd->name, genpd_debugfs_dir); } static void genpd_update_accounting(struct generic_pm_domain *genpd) -- cgit v1.2.3 From a0e8c13ccd6a9a636d27353da62c2410c4eca337 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Thu, 2 Feb 2023 16:15:15 +0100 Subject: PM: EM: fix memory leak with using debugfs_lookup() When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman Signed-off-by: Rafael J. Wysocki --- kernel/power/energy_model.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index f82111837b8d..7b44f5b89fa1 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -87,10 +87,7 @@ static void em_debug_create_pd(struct device *dev) static void em_debug_remove_pd(struct device *dev) { - struct dentry *debug_dir; - - debug_dir = debugfs_lookup(dev_name(dev), rootdir); - debugfs_remove_recursive(debug_dir); + debugfs_lookup_and_remove(dev_name(dev), rootdir); } static int __init em_debug_init(void) -- cgit v1.2.3 From 7787943a3a8ade6594a68db28c166adbb1d3708c Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 6 Feb 2023 20:33:06 +0100 Subject: cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies Some ARMv4 processors don't support suspend, which leads to a build failure with the tegra and qualcomm cpuidle driver: WARNING: unmet direct dependencies detected for ARM_CPU_SUSPEND Depends on [n]: ARCH_SUSPEND_POSSIBLE [=n] Selected by [y]: - ARM_TEGRA_CPUIDLE [=y] && CPU_IDLE [=y] && (ARM [=y] || ARM64) && (ARCH_TEGRA [=n] || COMPILE_TEST [=y]) && !ARM64 && MMU [=y] arch/arm/kernel/sleep.o: in function `__cpu_suspend': (.text+0x68): undefined reference to `cpu_sa110_suspend_size' (.text+0x68): undefined reference to `cpu_fa526_suspend_size' Add an explicit dependency to make randconfig builds avoid this combination. Fixes: faae6c9f2e68 ("cpuidle: tegra: Enable compile testing") Fixes: a871be6b8eee ("cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver") Link: https://lore.kernel.org/all/20211013160125.772873-1-arnd@kernel.org/ Cc: All applicable Reviewed-by: Dmitry Osipenko Signed-off-by: Arnd Bergmann Acked-by: Thierry Reding Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/Kconfig.arm | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm index 747aa537389b..f0714a32921e 100644 --- a/drivers/cpuidle/Kconfig.arm +++ b/drivers/cpuidle/Kconfig.arm @@ -102,6 +102,7 @@ config ARM_MVEBU_V7_CPUIDLE config ARM_TEGRA_CPUIDLE bool "CPU Idle Driver for NVIDIA Tegra SoCs" depends on (ARCH_TEGRA || COMPILE_TEST) && !ARM64 && MMU + depends on ARCH_SUSPEND_POSSIBLE select ARCH_NEEDS_CPU_IDLE_COUPLED if SMP select ARM_CPU_SUSPEND help @@ -110,6 +111,7 @@ config ARM_TEGRA_CPUIDLE config ARM_QCOM_SPM_CPUIDLE bool "CPU Idle Driver for Qualcomm Subsystem Power Manager (SPM)" depends on (ARCH_QCOM || COMPILE_TEST) && !ARM64 && MMU + depends on ARCH_SUSPEND_POSSIBLE select ARM_CPU_SUSPEND select CPU_IDLE_MULTIPLE_DRIVERS select DT_IDLE_STATES -- cgit v1.2.3 From e898b07deb3ce801b21113979a05f4b3166c7cb3 Mon Sep 17 00:00:00 2001 From: Thomas Weißschuh Date: Tue, 7 Feb 2023 19:55:19 +0000 Subject: cpuidle: sysfs: make kobj_type structures constant MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/sysfs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c index 2b496a53cbca..48948b171749 100644 --- a/drivers/cpuidle/sysfs.c +++ b/drivers/cpuidle/sysfs.c @@ -200,7 +200,7 @@ static void cpuidle_sysfs_release(struct kobject *kobj) complete(&kdev->kobj_unregister); } -static struct kobj_type ktype_cpuidle = { +static const struct kobj_type ktype_cpuidle = { .sysfs_ops = &cpuidle_sysfs_ops, .release = cpuidle_sysfs_release, }; @@ -447,7 +447,7 @@ static void cpuidle_state_sysfs_release(struct kobject *kobj) complete(&state_obj->kobj_unregister); } -static struct kobj_type ktype_state_cpuidle = { +static const struct kobj_type ktype_state_cpuidle = { .sysfs_ops = &cpuidle_state_sysfs_ops, .default_groups = cpuidle_state_default_groups, .release = cpuidle_state_sysfs_release, @@ -594,7 +594,7 @@ static struct attribute *cpuidle_driver_default_attrs[] = { }; ATTRIBUTE_GROUPS(cpuidle_driver_default); -static struct kobj_type ktype_driver_cpuidle = { +static const struct kobj_type ktype_driver_cpuidle = { .sysfs_ops = &cpuidle_driver_sysfs_ops, .default_groups = cpuidle_driver_default_groups, .release = cpuidle_driver_sysfs_release, -- cgit v1.2.3 From 41204a607679ccca7eabff9f2871b969d6ef2ce3 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 7 Feb 2023 20:59:59 +0100 Subject: cpuidle: driver: Update microsecond values of state parameters as needed If the cpuidle driver provides the target residency and exit latency in nanoseconds, the corresponding values in microseconds need to be set to reflect the provided numbers in order for the sysfs interface to show them correctly, so make __cpuidle_driver_init() do that. Signed-off-by: Rafael J. Wysocki Tested-by: Artem Bityutskiy --- drivers/cpuidle/driver.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c index f70aa17e2a8e..d9cda7f6ccb9 100644 --- a/drivers/cpuidle/driver.c +++ b/drivers/cpuidle/driver.c @@ -183,11 +183,15 @@ static void __cpuidle_driver_init(struct cpuidle_driver *drv) s->target_residency_ns = s->target_residency * NSEC_PER_USEC; else if (s->target_residency_ns < 0) s->target_residency_ns = 0; + else + s->target_residency = div_u64(s->target_residency_ns, NSEC_PER_USEC); if (s->exit_latency > 0) s->exit_latency_ns = s->exit_latency * NSEC_PER_USEC; else if (s->exit_latency_ns < 0) s->exit_latency_ns = 0; + else + s->exit_latency = div_u64(s->exit_latency_ns, NSEC_PER_USEC); } } -- cgit v1.2.3 From cf835b005b2857c2fd763a006c1957f332e5254b Mon Sep 17 00:00:00 2001 From: Zhang Rui Date: Sat, 11 Feb 2023 11:17:10 +0800 Subject: powercap: intel_rapl: Fix handling for large time window When setting the power limit time window, software updates the 'y' bits and 'f' bits in the power limit register, and the value hardware takes follows the formula below Time window = 2 ^ y * (1 + f / 4) * Time_Unit When handling large time window input from userspace, using left shifting breaks in two cases: 1. when ilog2(value) is bigger than 31, in expression "1 << y", left shifting by more than 31 bits has undefined behavior. This breaks 'y'. For example, on an Alderlake platform, "1 << 32" returns 1. 2. when ilog2(value) equals 31, "1 << 31" returns negative value because '1' is recognized as signed int. And this breaks 'f'. Given that 'y' has 5 bits and hardware can never take a value larger than 31, fix the first problem by clamp the time window to the maximum possible value that the hardware can take. Fix the second problem by using unsigned bit left shift. Note that hardware has its own maximum time window limitation, which may be lower than the time window value retrieved from the power limit register. When this happens, hardware clamps the input to its maximum time window limitation. That is why a software clamp is preferred to handle the problem on hand. Signed-off-by: Zhang Rui [ rjw: Adjusted the comment added by this change ] Signed-off-by: Rafael J. Wysocki --- drivers/powercap/intel_rapl_common.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 9a9192fc8391..8970c7b80884 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -999,7 +999,15 @@ static u64 rapl_compute_time_window_core(struct rapl_package *rp, u64 value, do_div(value, rp->time_unit); y = ilog2(value); - f = div64_u64(4 * (value - (1 << y)), 1 << y); + + /* + * The target hardware field is 7 bits wide, so return all ones + * if the exponent is too large. + */ + if (y > 0x1f) + return 0x7f; + + f = div64_u64(4 * (value - (1ULL << y)), 1ULL << y); value = (y & 0x1f) | ((f & 0x3) << 5); } return value; -- cgit v1.2.3 From e947925f10c250ba3ad2d905006b2e632510310c Mon Sep 17 00:00:00 2001 From: Keguang Zhang Date: Fri, 10 Feb 2023 19:09:34 +0800 Subject: MIPS: loongson32: Drop obsolete cpufreq platform device The obsolete cpufreq driver was removed, drop the platform device and data accordingly. Link: https://lore.kernel.org/all/20230112135342.3927338-1-keguang.zhang@gmail.com Signed-off-by: Keguang Zhang Signed-off-by: Rafael J. Wysocki --- arch/mips/include/asm/mach-loongson32/cpufreq.h | 18 ------------------ arch/mips/include/asm/mach-loongson32/platform.h | 1 - arch/mips/loongson32/common/platform.c | 16 ---------------- arch/mips/loongson32/ls1b/board.c | 1 - 4 files changed, 36 deletions(-) delete mode 100644 arch/mips/include/asm/mach-loongson32/cpufreq.h diff --git a/arch/mips/include/asm/mach-loongson32/cpufreq.h b/arch/mips/include/asm/mach-loongson32/cpufreq.h deleted file mode 100644 index e422a32883ae..000000000000 --- a/arch/mips/include/asm/mach-loongson32/cpufreq.h +++ /dev/null @@ -1,18 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * Copyright (c) 2014 Zhang, Keguang - * - * Loongson 1 CPUFreq platform support. - */ - -#ifndef __ASM_MACH_LOONGSON32_CPUFREQ_H -#define __ASM_MACH_LOONGSON32_CPUFREQ_H - -struct plat_ls1x_cpufreq { - const char *clk_name; /* CPU clk */ - const char *osc_clk_name; /* OSC clk */ - unsigned int max_freq; /* in kHz */ - unsigned int min_freq; /* in kHz */ -}; - -#endif /* __ASM_MACH_LOONGSON32_CPUFREQ_H */ diff --git a/arch/mips/include/asm/mach-loongson32/platform.h b/arch/mips/include/asm/mach-loongson32/platform.h index eb83e2741887..86e1a6aab4e5 100644 --- a/arch/mips/include/asm/mach-loongson32/platform.h +++ b/arch/mips/include/asm/mach-loongson32/platform.h @@ -12,7 +12,6 @@ #include extern struct platform_device ls1x_uart_pdev; -extern struct platform_device ls1x_cpufreq_pdev; extern struct platform_device ls1x_eth0_pdev; extern struct platform_device ls1x_eth1_pdev; extern struct platform_device ls1x_ehci_pdev; diff --git a/arch/mips/loongson32/common/platform.c b/arch/mips/loongson32/common/platform.c index 311dc1580bbd..64d7979394e6 100644 --- a/arch/mips/loongson32/common/platform.c +++ b/arch/mips/loongson32/common/platform.c @@ -15,7 +15,6 @@ #include #include -#include #include #include @@ -62,21 +61,6 @@ void __init ls1x_serial_set_uartclk(struct platform_device *pdev) p->uartclk = clk_get_rate(clk); } -/* CPUFreq */ -static struct plat_ls1x_cpufreq ls1x_cpufreq_pdata = { - .clk_name = "cpu_clk", - .osc_clk_name = "osc_clk", - .max_freq = 266 * 1000, - .min_freq = 33 * 1000, -}; - -struct platform_device ls1x_cpufreq_pdev = { - .name = "ls1x-cpufreq", - .dev = { - .platform_data = &ls1x_cpufreq_pdata, - }, -}; - /* Synopsys Ethernet GMAC */ static struct stmmac_mdio_bus_data ls1x_mdio_bus_data = { .phy_mask = 0, diff --git a/arch/mips/loongson32/ls1b/board.c b/arch/mips/loongson32/ls1b/board.c index 727e06718dab..fed8d432ef20 100644 --- a/arch/mips/loongson32/ls1b/board.c +++ b/arch/mips/loongson32/ls1b/board.c @@ -35,7 +35,6 @@ static const struct gpio_led_platform_data ls1x_led_pdata __initconst = { static struct platform_device *ls1b_platform_devices[] __initdata = { &ls1x_uart_pdev, - &ls1x_cpufreq_pdev, &ls1x_eth0_pdev, &ls1x_eth1_pdev, &ls1x_ehci_pdev, -- cgit v1.2.3 From f9901f64536c39f699e6ed228c5e64e7e7ce8708 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 25 Jan 2023 12:34:18 +0100 Subject: cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT The runtime Power Management of CPU topology is not compatible with PREEMPT_RT: 1. Core cpuidle path disables IRQs. 2. Core cpuidle calls cpuidle-psci. 3. cpuidle-psci in __psci_enter_domain_idle_state() calls pm_runtime_put_sync_suspend() and pm_runtime_get_sync() which use spinlocks (which are sleeping on PREEMPT_RT). Deep sleep modes are not a priority of Realtime kernels because the latencies might become unpredictable. On the other hand the PSCI CPU idle power domain is a parent of other devices and power domain controllers, thus it cannot be simply skipped (e.g. on Qualcomm SM8250). Disable the idle callbacks in cpuidle-psci and mark the domain as always on. This is a trade-off between making PREEMPT_RT working and still having a proper power domain hierarchy in the system. Signed-off-by: Krzysztof Kozlowski Reviewed-by: Ulf Hansson Tested-by: Adrien Thierry Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/Kconfig.arm | 8 ++++++++ drivers/cpuidle/cpuidle-psci-domain.c | 7 +++++-- drivers/cpuidle/cpuidle-psci.c | 3 +++ 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm index f0714a32921e..a1ee475d180d 100644 --- a/drivers/cpuidle/Kconfig.arm +++ b/drivers/cpuidle/Kconfig.arm @@ -24,6 +24,14 @@ config ARM_PSCI_CPUIDLE It provides an idle driver that is capable of detecting and managing idle states through the PSCI firmware interface. + The driver has limitations when used with PREEMPT_RT: + - If the idle states are described with the non-hierarchical layout, + all idle states are still available. + + - If the idle states are described with the hierarchical layout, + only the idle states defined per CPU are available, but not the ones + being shared among a group of CPUs (aka cluster idle states). + config ARM_PSCI_CPUIDLE_DOMAIN bool "PSCI CPU idle Domain" depends on ARM_PSCI_CPUIDLE diff --git a/drivers/cpuidle/cpuidle-psci-domain.c b/drivers/cpuidle/cpuidle-psci-domain.c index c80cf9ddabd8..6ad2954948a5 100644 --- a/drivers/cpuidle/cpuidle-psci-domain.c +++ b/drivers/cpuidle/cpuidle-psci-domain.c @@ -64,8 +64,11 @@ static int psci_pd_init(struct device_node *np, bool use_osi) pd->flags |= GENPD_FLAG_IRQ_SAFE | GENPD_FLAG_CPU_DOMAIN; - /* Allow power off when OSI has been successfully enabled. */ - if (use_osi) + /* + * Allow power off when OSI has been successfully enabled. + * PREEMPT_RT is not yet ready to enter domain idle states. + */ + if (use_osi && !IS_ENABLED(CONFIG_PREEMPT_RT)) pd->power_off = psci_pd_power_off; else pd->flags |= GENPD_FLAG_ALWAYS_ON; diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c index 57bc3e3ae391..68467a325dcb 100644 --- a/drivers/cpuidle/cpuidle-psci.c +++ b/drivers/cpuidle/cpuidle-psci.c @@ -231,6 +231,9 @@ static int psci_dt_cpu_init_topology(struct cpuidle_driver *drv, if (!psci_has_osi_support()) return 0; + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + return 0; + data->dev = psci_dt_attach_cpu(cpu); if (IS_ERR_OR_NULL(data->dev)) return PTR_ERR_OR_ZERO(data->dev); -- cgit v1.2.3 From 41a337b40e983db4f0e1602308109f2b93687a06 Mon Sep 17 00:00:00 2001 From: Richard Fitzgerald Date: Mon, 13 Feb 2023 16:50:05 +0000 Subject: PM: Add EXPORT macros for exporting PM functions Add a pair of macros for exporting functions only if CONFIG_PM is enabled. The naming follows the style of the standard EXPORT_SYMBOL_*() macros that they replace. Sometimes a module wants to export PM functions directly to other drivers, not a complete struct dev_pm_ops. A typical example is where a core library exports the generic (shared) implementation and calling code wraps one or more of these in custom code. Signed-off-by: Richard Fitzgerald Signed-off-by: Rafael J. Wysocki --- include/linux/pm.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/linux/pm.h b/include/linux/pm.h index 93cd34f00822..035d9649eba4 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -379,9 +379,13 @@ const struct dev_pm_ops name = { \ const struct dev_pm_ops name; \ __EXPORT_SYMBOL(name, sec, ns); \ const struct dev_pm_ops name +#define EXPORT_PM_FN_GPL(name) EXPORT_SYMBOL_GPL(name) +#define EXPORT_PM_FN_NS_GPL(name, ns) EXPORT_SYMBOL_NS_GPL(name, ns) #else #define _EXPORT_DEV_PM_OPS(name, sec, ns) \ static __maybe_unused const struct dev_pm_ops __static_##name +#define EXPORT_PM_FN_GPL(name) +#define EXPORT_PM_FN_NS_GPL(name, ns) #endif #define EXPORT_DEV_PM_OPS(name) _EXPORT_DEV_PM_OPS(name, "", "") -- cgit v1.2.3 From 26e27f4e382f126d80e230df2a76d5f1c5a1ac42 Mon Sep 17 00:00:00 2001 From: Christian Marangi Date: Wed, 8 Feb 2023 16:39:11 +0100 Subject: dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables Add additional info on what opp tables the defined devices in this schema supports (operating-points-v2-kryo-cpu and operating-points-v2-qcom-level) and reference them. Signed-off-by: Christian Marangi Reviewed-by: Krzysztof Kozlowski Signed-off-by: Viresh Kumar --- .../bindings/cpufreq/qcom-cpufreq-nvmem.yaml | 35 ++++++++++++++++------ 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/Documentation/devicetree/bindings/cpufreq/qcom-cpufreq-nvmem.yaml b/Documentation/devicetree/bindings/cpufreq/qcom-cpufreq-nvmem.yaml index 9c086eac6ca7..7c42d9439abd 100644 --- a/Documentation/devicetree/bindings/cpufreq/qcom-cpufreq-nvmem.yaml +++ b/Documentation/devicetree/bindings/cpufreq/qcom-cpufreq-nvmem.yaml @@ -55,15 +55,32 @@ properties: patternProperties: '^opp-table(-[a-z0-9]+)?$': - if: - properties: - compatible: - const: operating-points-v2-kryo-cpu - then: - patternProperties: - '^opp-?[0-9]+$': - required: - - required-opps + allOf: + - if: + properties: + compatible: + const: operating-points-v2-kryo-cpu + then: + $ref: /schemas/opp/opp-v2-kryo-cpu.yaml# + + - if: + properties: + compatible: + const: operating-points-v2-kryo-cpu + then: + patternProperties: + '^opp-?[0-9]+$': + required: + - required-opps + + - if: + properties: + compatible: + const: operating-points-v2-qcom-level + then: + $ref: /schemas/opp/opp-v2-qcom-level.yaml# + + unevaluatedProperties: false additionalProperties: true -- cgit v1.2.3 From 389de9c5a677c0d950528ee7d1871366ad2a6fd8 Mon Sep 17 00:00:00 2001 From: Christian Marangi Date: Wed, 8 Feb 2023 16:39:12 +0100 Subject: dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional The qcom-cpufreq-nvmem driver supports 2 kind of devices: - pre-cpr that doesn't have power-domains and base everything on nvmem cells and multiple named microvolt bindings. Doesn't need required-opp binding in the opp nodes as they are only used for genpd based devices. - cpr-based that require power-domain in the cpu nodes and use various source to decide the correct voltage and freq Require required-opp binding since they need to be linked to the related opp-level. When the schema was introduced, it was wrongly set to always require these binding but this is not the case for pre-cpr devices. Make the power-domain and the required-opp optional and set them required only for qcs404 based devices. Signed-off-by: Christian Marangi Reviewed-by: Krzysztof Kozlowski Signed-off-by: Viresh Kumar --- .../bindings/cpufreq/qcom-cpufreq-nvmem.yaml | 74 +++++++++++++--------- 1 file changed, 44 insertions(+), 30 deletions(-) diff --git a/Documentation/devicetree/bindings/cpufreq/qcom-cpufreq-nvmem.yaml b/Documentation/devicetree/bindings/cpufreq/qcom-cpufreq-nvmem.yaml index 7c42d9439abd..6f5e7904181f 100644 --- a/Documentation/devicetree/bindings/cpufreq/qcom-cpufreq-nvmem.yaml +++ b/Documentation/devicetree/bindings/cpufreq/qcom-cpufreq-nvmem.yaml @@ -17,6 +17,9 @@ description: | on the CPU OPP in use. The CPUFreq driver sets the CPR power domain level according to the required OPPs defined in the CPU OPP tables. + For old implementation efuses are parsed to select the correct opp table and + voltage and CPR is not supported/used. + select: properties: compatible: @@ -33,26 +36,6 @@ select: required: - compatible -properties: - cpus: - type: object - - patternProperties: - '^cpu@[0-9a-f]+$': - type: object - - properties: - power-domains: - maxItems: 1 - - power-domain-names: - items: - - const: cpr - - required: - - power-domains - - power-domain-names - patternProperties: '^opp-table(-[a-z0-9]+)?$': allOf: @@ -63,16 +46,6 @@ patternProperties: then: $ref: /schemas/opp/opp-v2-kryo-cpu.yaml# - - if: - properties: - compatible: - const: operating-points-v2-kryo-cpu - then: - patternProperties: - '^opp-?[0-9]+$': - required: - - required-opps - - if: properties: compatible: @@ -82,6 +55,47 @@ patternProperties: unevaluatedProperties: false +allOf: + - if: + properties: + compatible: + contains: + enum: + - qcom,qcs404 + + then: + properties: + cpus: + type: object + + patternProperties: + '^cpu@[0-9a-f]+$': + type: object + + properties: + power-domains: + maxItems: 1 + + power-domain-names: + items: + - const: cpr + + required: + - power-domains + - power-domain-names + + patternProperties: + '^opp-table(-[a-z0-9]+)?$': + if: + properties: + compatible: + const: operating-points-v2-kryo-cpu + then: + patternProperties: + '^opp-?[0-9]+$': + required: + - required-opps + additionalProperties: true examples: -- cgit v1.2.3 From ba38f3cbe7db2cec802c6d60e2bef57a3ff095a4 Mon Sep 17 00:00:00 2001 From: Christian Marangi Date: Wed, 8 Feb 2023 16:39:13 +0100 Subject: dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum Enlarge opp-supported-hw maximum value. In recent SoC we started matching more bit and we currently match mask of 112. The old maximum of 7 was good for old SoC that didn't had complex id, but now this is limiting and we need to enlarge it to support more variants. Document all the various mask that can be used and limit them to only reasonable values instead of using a generic maximum limit. Signed-off-by: Christian Marangi Reviewed-by: Krzysztof Kozlowski Signed-off-by: Viresh Kumar --- .../devicetree/bindings/opp/opp-v2-kryo-cpu.yaml | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/opp/opp-v2-kryo-cpu.yaml b/Documentation/devicetree/bindings/opp/opp-v2-kryo-cpu.yaml index 60cf3cbde4c5..c1bd185d8a0b 100644 --- a/Documentation/devicetree/bindings/opp/opp-v2-kryo-cpu.yaml +++ b/Documentation/devicetree/bindings/opp/opp-v2-kryo-cpu.yaml @@ -50,12 +50,22 @@ patternProperties: opp-supported-hw: description: | A single 32 bit bitmap value, representing compatible HW. - Bitmap: + Bitmap for MSM8996 format: 0: MSM8996, speedbin 0 1: MSM8996, speedbin 1 2: MSM8996, speedbin 2 - 3-31: unused - maximum: 0x7 + 3: MSM8996, speedbin 3 + 4-31: unused + + Bitmap for MSM8996SG format (speedbin shifted of 4 left): + 0-3: unused + 4: MSM8996SG, speedbin 0 + 5: MSM8996SG, speedbin 1 + 6: MSM8996SG, speedbin 2 + 7-31: unused + enum: [0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, + 0x9, 0xd, 0xe, 0xf, + 0x10, 0x20, 0x30, 0x70] clock-latency-ns: true -- cgit v1.2.3 From 6e9d12125fcac048c78f497044370c1da85b552a Mon Sep 17 00:00:00 2001 From: Wyes Karny Date: Tue, 14 Feb 2023 07:58:11 +0000 Subject: cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ `amd_pstate_set_epp` function uses `cppc_req_cached` and `epp` variable to update the MSR_AMD_CPPC_REQ register for AMD MSR systems. The recent commit 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use") changed the sequence of updating cppc_req_cached and writing the MSR_AMD_CPPC_REQ. Therefore while switching from powersave to performance governor and vice-versa in active mode MSR_AMD_CPPC_REQ is set with the previous cached value. To fix this: first update the `cppc_req_cached` variable and then call `amd_pstate_set_epp` function. - Before commit 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use"): With powersave governor: [ 1.652743] amd_pstate_epp_init: writing to cppc_req_cached = 0x1eff [ 1.652744] amd_pstate_set_epp: writing cppc_req_cached = 0x1eff [ 1.652746] amd_pstate_set_epp: writing min_perf = 30, des_perf = 0, max_perf = 255, epp = 0 Changing to performance governor: [ 300.493842] amd_pstate_epp_init: writing to cppc_req_cached = 0xffff [ 300.493846] amd_pstate_set_epp: writing cppc_req_cached = 0xffff [ 300.493847] amd_pstate_set_epp: writing min_perf = 255, des_perf = 0, max_perf = 255, epp = 0 - After commit 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use"): With powersave governor: [ 1.646037] amd_pstate_set_epp: writing cppc_req_cached = 0xffff [ 1.646038] amd_pstate_set_epp: writing min_perf = 255, des_perf = 0, max_perf = 255, epp = 0 [ 1.646042] amd_pstate_epp_init: writing to cppc_req_cached = 0x1eff Changing to performance governor: [ 687.117401] amd_pstate_set_epp: writing cppc_req_cached = 0x1eff [ 687.117405] amd_pstate_set_epp: writing min_perf = 30, des_perf = 0, max_perf = 255, epp = 0 [ 687.117419] amd_pstate_epp_init: writing to cppc_req_cached = 0xffff - After this fix: With powersave governor: [ 2.525717] amd_pstate_epp_init: writing to cppc_req_cached = 0x1eff [ 2.525720] amd_pstate_set_epp: writing cppc_req_cached = 0x1eff [ 2.525722] amd_pstate_set_epp: writing min_perf = 30, des_perf = 0, max_perf = 255, epp = 0 Changing to performance governor: [ 3440.152468] amd_pstate_epp_init: writing to cppc_req_cached = 0xffff [ 3440.152473] amd_pstate_set_epp: writing cppc_req_cached = 0xffff [ 3440.152474] amd_pstate_set_epp: writing min_perf = 255, des_perf = 0, max_perf = 255, epp = 0 Fixes: 7cca9a9851a5 ("cpufreq: amd-pstate: avoid uninitialized variable use") Signed-off-by: Wyes Karny Acked-by: Huang Rui Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/amd-pstate.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index b8862afef4e4..45c88894fd8e 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -1057,27 +1057,28 @@ static void amd_pstate_epp_init(unsigned int cpu) cpudata->epp_policy = cpudata->policy; - if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) { - epp = amd_pstate_get_epp(cpudata, value); - if (epp < 0) - goto skip_epp; - /* force the epp value to be zero for performance policy */ - epp = 0; - } else { - /* Get BIOS pre-defined epp value */ - epp = amd_pstate_get_epp(cpudata, value); - if (epp) - goto skip_epp; + /* Get BIOS pre-defined epp value */ + epp = amd_pstate_get_epp(cpudata, value); + if (epp < 0) { + /** + * This return value can only be negative for shared_memory + * systems where EPP register read/write not supported. + */ + goto skip_epp; } + + if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) + epp = 0; + /* Set initial EPP value */ if (boot_cpu_has(X86_FEATURE_CPPC)) { value &= ~GENMASK_ULL(31, 24); value |= (u64)epp << 24; } + WRITE_ONCE(cpudata->cppc_req_cached, value); amd_pstate_set_epp(cpudata, epp); skip_epp: - WRITE_ONCE(cpudata->cppc_req_cached, value); cpufreq_cpu_put(policy); } -- cgit v1.2.3 From 73dd3206811af32695e46ca71e2b0da5dac81ad9 Mon Sep 17 00:00:00 2001 From: Bagas Sanjaya Date: Wed, 15 Feb 2023 19:32:53 +0700 Subject: Documentation: amd-pstate: disambiguate user space sections kernel test robot reported htmldocs warning: Documentation/admin-guide/pm/amd-pstate.rst:343: WARNING: duplicate label admin-guide/pm/amd-pstate:user space interface in ``sysfs``, other instance in Documentation/admin-guide/pm/amd-pstate.rst The documentation contains two sections with the same "User Space Interface in ``sysfs``" title. The first one deals with per-policy sysfs and the second one is about general attributes (currently only global attributes are documented). Disambiguate title text of both sections to fix the warning. Link: https://lore.kernel.org/linux-doc/202302151041.0SWs1RHK-lkp@intel.com/ Fixes: b9e6a2d47b2565 ("Documentation: amd-pstate: introduce new global sysfs attributes") Reported-by: kernel test robot Signed-off-by: Bagas Sanjaya Reviewed-by: Wyes Karny Acked-by: Huang Rui Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/pm/amd-pstate.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index 5304adf2fc2f..d143e72cf93e 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -230,8 +230,8 @@ with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond to the request from AMD P-States. -User Space Interface in ``sysfs`` -================================== +User Space Interface in ``sysfs`` - Per-policy control +====================================================== ``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to control its functionality at the system level. They are located in the @@ -339,8 +339,8 @@ processor must provide at least nominal performance requested and go higher if c operating conditions allow. -User Space Interface in ``sysfs`` -================================= +User Space Interface in ``sysfs`` - General +=========================================== Global Attributes ----------------- -- cgit v1.2.3