summaryrefslogtreecommitdiff
path: root/include/linux/energy_model.h
AgeCommit message (Collapse)AuthorFilesLines
2023-12-23energy_model: Use a fixed reference frequencyVincent Guittot1-3/+3
The last item of a performance domain is not always the performance point that has been used to compute CPU's capacity. This can lead to different target frequency compared with other part of the system like schedutil and would result in wrong energy estimation. A new arch_scale_freq_ref() is available to return a fixed and coherent frequency reference that can be used when computing the CPU's frequency for an level of utilization. Use this function to get this reference frequency. Energy model is never used without defining arch_scale_freq_ref() but can be compiled. Define a default arch_scale_freq_ref() returning 0 in such case. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/20231211104855.558096-5-vincent.guittot@linaro.org
2023-11-23sched/cpufreq: Rework schedutil governor performance estimationVincent Guittot1-1/+0
The current method to take into account uclamp hints when estimating the target frequency can end in a situation where the selected target frequency is finally higher than uclamp hints, whereas there are no real needs. Such cases mainly happen because we are currently mixing the traditional scheduler utilization signal with the uclamp performance hints. By adding these 2 metrics, we loose an important information when it comes to select the target frequency, and we have to make some assumptions which can't fit all cases. Rework the interface between the scheduler and schedutil governor in order to propagate all information down to the cpufreq governor. effective_cpu_util() interface changes and now returns the actual utilization of the CPU with 2 optional inputs: - The minimum performance for this CPU; typically the capacity to handle the deadline task and the interrupt pressure. But also uclamp_min request when available. - The maximum targeting performance for this CPU which reflects the maximum level that we would like to not exceed. By default it will be the CPU capacity but can be reduced because of some performance hints set with uclamp. The value can be lower than actual utilization and/or min performance level. A new sugov_effective_cpu_perf() interface is also available to compute the final performance level that is targeted for the CPU, after applying some cpufreq headroom and taking into account all inputs. With these 2 functions, schedutil is now able to decide when it must go above uclamp hints. It now also has a generic way to get the min performance level. The dependency between energy model and cpufreq governor and its headroom policy doesn't exist anymore. eenv_pd_max_util() asks schedutil for the targeted performance after applying the impact of the waking task. [ mingo: Refined the changelog & C comments. ] Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20231122133904.446032-2-vincent.guittot@linaro.org
2022-07-15PM: EM: convert power field to micro-Watts precision and align driversLukasz Luba1-16/+38
The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13PM: EM: Change the order of arguments in the .active_power() callbackLukasz Luba1-3/+3
The .active_power() callback passes the device pointer when it's called. Aligned with a convetion present in other subsystems and pass the 'dev' as a first argument. It looks more cleaner. Adjust all affected drivers which implement that API callback. Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13PM: EM: Add artificial EM flagPierre Gondois1-0/+5
The Energy Model (EM) can be used on platforms which are missing real power information. Those platforms would implement .get_cost() which populates needed values for the Energy Aware Scheduler (EAS). The EAS doesn't use 'power' fields from EM, but other frameworks might use them. Thus, to avoid miss-usage of this specific type of EM, introduce a new flags which can be checked by other frameworks. Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13PM: EM: Add .get_cost() callbackLukasz Luba1-1/+23
The Energy Model (EM) supports devices which report abstract power scale, not only real Watts. The primary goal for EM is to enable the Energy Aware Scheduler (EAS) for a given platform. Some of the platforms might not be able to deliver proper power values. The only information that they might have is the relative efficiency between CPU types. Thus, it makes sense to remove some restrictions in the EM framework and introduce a mechanism which would support those platforms. What is crucial for EAS to operate is the 'cost' field in the EM. The 'cost' is calculated internally in EM framework based on knowledge from 'power' values. The 'cost' values must be strictly increasing. The existing API with its 'power' value size restrictions cannot guarantee that the 'cost' will meet this requirement. Since the platform is missing this detailed information, but has only efficiency details, introduce a new custom callback in the EM framework. The new callback would allow to provide the 'cost' values which reflect efficiency of the CPUs. This would allow to provide EAS information which has different relation than what would be forced by the EM internal formulas calculating 'cost' values. Thanks to this new callback it is possible to create a system view for EAS which has no overlapping performance states across many Performance Domains. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-03PM: EM: add macro to set .active_power() callback conditionallyLukasz Luba1-0/+2
The Energy Model is able to use new power values coming from DT. Add a new macro which is helpful in setting the .active_power() callback conditionally in setup time. The dual-macro implementation handles both kernel configurations: w/ EM and w/o EM built-in. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-10-05PM: EM: Allow skipping inefficient statesVincent Donnefort1-6/+37
The new performance domain flag EM_PERF_DOMAIN_SKIP_INEFFICIENCIES allows to not take into account inefficient states when estimating energy consumption. This intends to let the Energy Model know that CPUFreq itself will skip inefficiencies and such states don't need to be part of the estimation anymore. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05PM: EM: Extend em_perf_domain with a flag fieldVincent Donnefort1-3/+10
Merge the current "milliwatts" option into a "flag" field. This intends to prepare the extension of this structure for inefficient states support in the Energy Model. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05PM: EM: Mark inefficient statesVincent Donnefort1-0/+12
Some SoCs, such as the sd855 have OPPs within the same performance domain, whose cost is higher than others with a higher frequency. Even though those OPPs are interesting from a cooling perspective, it makes no sense to use them when the device can run at full capacity. Those OPPs handicap the performance domain, when choosing the most energy-efficient CPU and are wasting energy. They are inefficient. Hence, add support for such OPPs to the Energy Model. The table can now be read skipping inefficient performance states (and by extension, inefficient OPPs). Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-07PM: EM: fix kernel-doc commentsLukasz Luba1-4/+4
Fix the kernel-doc comments for the improved Energy Model documentation. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-06PM: EM: Increase energy calculation precisionLukasz Luba1-0/+16
The Energy Model (EM) provides useful information about device power in each performance state to other subsystems like: Energy Aware Scheduler (EAS). The energy calculation in EAS does arithmetic operation based on the EM em_cpu_energy(). Current implementation of that function uses em_perf_state::cost as a pre-computed cost coefficient equal to: cost = power * max_frequency / frequency. The 'power' is expressed in milli-Watts (or in abstract scale). There are corner cases when the EAS energy calculation for two Performance Domains (PDs) return the same value. The EAS compares these values to choose smaller one. It might happen that this values are equal due to rounding error. In such scenario, we need better resolution, e.g. 1000 times better. To provide this possibility increase the resolution in the em_perf_state::cost for 64-bit architectures. The cost of increasing resolution on 32-bit is pretty high (64-bit division) and is not justified since there are no new 32bit big.LITTLE EAS systems expected which would benefit from this higher resolution. This patch allows to avoid the rounding to milli-Watt errors, which might occur in EAS energy estimation for each PD. The rounding error is common for small tasks which have small utilization value. There are two places in the code where it makes a difference: 1. In the find_energy_efficient_cpu() where we are searching for best_delta. We might suffer there when two PDs return the same result, like in the example below. Scenario: Low utilized system e.g. ~200 sum_util for PD0 and ~220 for PD1. There are quite a few small tasks ~10-15 util. These tasks would suffer for the rounding error. These utilization values are typical when running games on Android. One of our partners has reported 5..10mA less battery drain when running with increased resolution. Some details: We have two PDs: PD0 (big) and PD1 (little) Let's compare w/o patch set ('old') and w/ patch set ('new') We are comparing energy w/ task and w/o task placed in the PDs a) 'old' w/o patch set, PD0 task_util = 13 cost = 480 sum_util_w/o_task = 215 sum_util_w_task = 228 scale_cpu = 1024 energy_w/o_task = 480 * 215 / 1024 = 100.78 => 100 energy_w_task = 480 * 228 / 1024 = 106.87 => 106 energy_diff = 106 - 100 = 6 (this is equal to 'old' PD1's energy_diff in 'c)') b) 'new' w/ patch set, PD0 task_util = 13 cost = 480 * 1000 = 480000 sum_util_w/o_task = 215 sum_util_w_task = 228 energy_w/o_task = 480000 * 215 / 1024 = 100781 energy_w_task = 480000 * 228 / 1024 = 106875 energy_diff = 106875 - 100781 = 6094 (this is not equal to 'new' PD1's energy_diff in 'd)') c) 'old' w/o patch set, PD1 task_util = 13 cost = 160 sum_util_w/o_task = 283 sum_util_w_task = 293 scale_cpu = 355 energy_w/o_task = 160 * 283 / 355 = 127.55 => 127 energy_w_task = 160 * 296 / 355 = 133.41 => 133 energy_diff = 133 - 127 = 6 (this is equal to 'old' PD0's energy_diff in 'a)') d) 'new' w/ patch set, PD1 task_util = 13 cost = 160 * 1000 = 160000 sum_util_w/o_task = 283 sum_util_w_task = 293 scale_cpu = 355 energy_w/o_task = 160000 * 283 / 355 = 127549 energy_w_task = 160000 * 296 / 355 = 133408 energy_diff = 133408 - 127549 = 5859 (this is not equal to 'new' PD0's energy_diff in 'b)') 2. Difference in the 6% energy margin filter at the end of find_energy_efficient_cpu(). With this patch the margin comparison also has better resolution, so it's possible to have better task placement thanks to that. Fixes: 27871f7a8a341ef ("PM: Introduce an Energy Model management framework") Reported-by: CCJ Yeh <CCj.Yeh@mediatek.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-17sched/cpufreq: Consider reduced CPU capacity in energy calculationLukasz Luba1-3/+13
Energy Aware Scheduling (EAS) needs to predict the decisions made by SchedUtil. The map_util_freq() exists to do that. There are corner cases where the max allowed frequency might be reduced (due to thermal). SchedUtil as a CPUFreq governor, is aware of that but EAS is not. This patch aims to address it. SchedUtil stores the maximum allowed frequency in 'sugov_policy::next_freq' field. EAS has to predict that value, which is the real used frequency. That value is made after a call to cpufreq_driver_resolve_freq() which clamps to the CPUFreq policy limits. In the existing code EAS is not able to predict that real frequency. This leads to energy estimation errors. To avoid wrong energy estimation in EAS (due to frequency miss prediction) make sure that the step which calculates Performance Domain frequency, is also aware of the allowed CPU capacity. Furthermore, modify map_util_freq() to not extend the frequency value. Instead, use map_util_perf() to extend the util value in both places: SchedUtil and EAS, but for EAS clamp it to max allowed CPU capacity. In the end, we achieve the same desirable behavior for both subsystems and alignment in regards to the real CPU frequency. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (For the schedutil part) Link: https://lore.kernel.org/r/20210614191238.23224-1-lukasz.luba@arm.com
2020-12-08PM / EM: Micro optimization in em_cpu_energyPavankumar Kondeti1-0/+3
When the sum of the utilization of CPUs in a power domain is zero, return the energy as 0 without doing any computations. Acked-by: Quentin Perret <qperret@google.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10PM: EM: update the comments related to power scaleLukasz Luba1-6/+5
The Energy Model supports power values expressed in milli-Watts or in an 'abstract scale'. Update the related comments is the code to reflect that state. Reviewed-by: Quentin Perret <qperret@google.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10PM: EM: Add a flag indicating units of power values in Energy ModelLukasz Luba1-2/+7
There are different platforms and devices which might use different scale for the power values. Kernel sub-systems might need to check if all Energy Model (EM) devices are using the same scale. Address that issue and store the information inside EM for each device. Thanks to that they can be easily compared and proper action triggered. Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Quentin Perret <qperret@google.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24PM / EM: change name of em_pd_energy to em_cpu_energyLukasz Luba1-3/+8
Energy Model framework now supports other devices than CPUs. Refactor some of the functions in order to prevent wrong usage. The old function em_pd_energy has to generic name. It must not be used without proper cpumask pointer, which is possible only for CPU devices. Thus, rename it and add proper description to warn of potential wrong usage for other devices. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Quentin Perret <qperret@google.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24PM / EM: remove em_register_perf_domainLukasz Luba1-7/+0
Remove old function em_register_perf_domain which is no longer needed. There is em_dev_register_perf_domain that covers old use cases and new as well. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Quentin Perret <qperret@google.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24PM / EM: add support for other devices than CPUs in Energy ModelLukasz Luba1-7/+22
Add support for other devices than CPUs. The registration function does not require a valid cpumask pointer and is ready to handle new devices. Some of the internal structures has been reorganized in order to keep consistent view (like removing per_cpu pd pointers). Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24PM / EM: update callback structure and add device pointerLukasz Luba1-7/+8
The Energy Model framework is going to support devices other that CPUs. In order to make this happen change the callback function and add pointer to a device as an argument. Update the related users to use new function and new callback from the Energy Model. Acked-by: Quentin Perret <qperret@google.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24PM / EM: introduce em_dev_register_perf_domain functionLukasz Luba1-2/+11
Add now function in the Energy Model framework which is going to support new devices. This function will help in transition and make it smoother. For now it still checks if the cpumask is a valid pointer, which will be removed later when the new structures and infrastructure will be ready. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Quentin Perret <qperret@google.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-06-24PM / EM: change naming convention from 'capacity' to 'performance'Lukasz Luba1-40/+46
The Energy Model uses concept of performance domain and capacity states in order to calculate power used by CPUs. Change naming convention from capacity to performance state would enable wider usage in future, e.g. upcoming support for other devices other than CPUs. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Quentin Perret <qperret@google.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-18energy_model.h: Replace zero-length array with flexible-array memberGustavo A. R. Silva1-1/+1
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-11-07PM / EM: Declare EM data types unconditionallyQuentin Perret1-2/+1
The structs representing capacity states and performance domains of an Energy Model are currently only defined for CONFIG_ENERGY_MODEL=y. That makes it hard for code outside PM_EM to manipulate those structures without a lot of ifdefery or stubbed accessors. So, move the declaration of the two structs outside of the CONFIG_ENERGY_MODEL ifdef. The client code (e.g. EAS or thermal) always checks the return of em_cpu_get() before using it, so the exising code is still safe to use as-is. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20191030151451.7961-3-qperret@google.com
2019-06-24sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()Vincent Guittot1-1/+1
The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is unused since commit: 765d0af19f5f ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'") Remove it. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: gregkh@linuxfoundation.org Cc: linux@armlinux.org.uk Cc: quentin.perret@arm.com Cc: rafael@kernel.org Link: https://lkml.kernel.org/r/1560783617-5827-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11PM: Introduce an Energy Model management frameworkQuentin Perret1-0/+187
Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>