| .. SPDX-License-Identifier: GPL-2.0 | 
 |  | 
 | ======================= | 
 | Energy Model of devices | 
 | ======================= | 
 |  | 
 | 1. Overview | 
 | ----------- | 
 |  | 
 | The Energy Model (EM) framework serves as an interface between drivers knowing | 
 | the power consumed by devices at various performance levels, and the kernel | 
 | subsystems willing to use that information to make energy-aware decisions. | 
 |  | 
 | The source of the information about the power consumed by devices can vary greatly | 
 | from one platform to another. These power costs can be estimated using | 
 | devicetree data in some cases. In others, the firmware will know better. | 
 | Alternatively, userspace might be best positioned. And so on. To avoid each | 
 | client subsystem re-implementing support for every possible source of | 
 | information on its own, the EM framework acts as an abstraction layer that | 
 | standardizes the format of power cost tables in the kernel, thereby avoiding | 
 | redundant work. | 
 |  | 
 | The power values might be expressed in micro-Watts or in an 'abstract scale'. | 
 | Multiple subsystems might use the EM and it is up to the system integrator to | 
 | check that the requirements for the power value scale types are met. An example | 
 | can be found in the Energy-Aware Scheduler documentation | 
 | Documentation/scheduler/sched-energy.rst. For some subsystems, like thermal or | 
 | powercap, power values expressed in an 'abstract scale' might cause issues. | 
 | These subsystems are more interested in estimating the power used in the past, | 
 | so real micro-Watts might be needed. An example of these requirements can | 
 | be found in the Intelligent Power Allocation in | 
 | Documentation/driver-api/thermal/power_allocator.rst. | 
 | Kernel subsystems might implement automatic detection to check whether EM | 
 | registered devices have an inconsistent scale (based on an EM internal flag). | 
 | An important thing to keep in mind is that when the power values are expressed | 
 | in an 'abstract scale', deriving real energy in micro-Joules is not possible. | 
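The following minimal userspace sketch in C illustrates both points. The `struct pd_info` type and both helpers are hypothetical. In the kernel, the scale information is carried by an internal flag of the performance domain, and a plain `bool` stands in for it here. The sketch shows a consistency check across registered domains. It also shows why energy can only be derived from micro-Watt values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an EM performance domain. */
struct pd_info {
	const char *name;
	bool microwatts;	/* true: power in uW, false: 'abstract scale' */
};

/*
 * Return true only if every registered domain uses the same scale.
 * A client could refuse to operate (or warn) when this returns false.
 */
static bool em_scales_consistent(const struct pd_info *pds, size_t nr)
{
	assert(pds != NULL && nr > 0);
	for (size_t i = 1; i < nr; i++)
		if (pds[i].microwatts != pds[0].microwatts)
			return false;
	return true;
}

/*
 * Energy in uJ from an average power in uW over a window in ms:
 * 1 uW * 1 ms = 1e-9 J = 1e-3 uJ, hence the division by 1000.
 * Only meaningful when the power values are real micro-Watts.
 */
static bool energy_uj(const struct pd_info *pd, uint64_t power_uw,
		      uint64_t window_ms, uint64_t *out_uj)
{
	if (!pd->microwatts)
		return false;	/* abstract scale: no physical unit */
	*out_uj = power_uw * window_ms / 1000;
	return true;
}
```

A client such as a thermal governor could call `em_scales_consistent()` once at start-up. This is the kind of automatic detection mentioned above.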
 |  | 
 | The figure below depicts an example of drivers (Arm-specific here, but the | 
 | approach is applicable to any architecture) providing power costs to the EM | 
 | framework, and interested clients reading the data from it:: | 
 |  | 
 |        +---------------+  +-----------------+  +---------------+ | 
 |        | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     | | 
 |        +---------------+  +-----------------+  +---------------+ | 
 |                |                   | em_cpu_energy()   | | 
 |                |                   | em_cpu_get()      | | 
 |                +---------+         |         +---------+ | 
 |                          |         |         | | 
 |                          v         v         v | 
 |                         +---------------------+ | 
 |                         |    Energy Model     | | 
 |                         |     Framework       | | 
 |                         +---------------------+ | 
 |                            ^       ^       ^ | 
 |                            |       |       | em_dev_register_perf_domain() | 
 |                 +----------+       |       +---------+ | 
 |                 |                  |                 | | 
 |         +---------------+  +---------------+  +--------------+ | 
 |         |  cpufreq-dt   |  |   arm_scmi    |  |    Other     | | 
 |         +---------------+  +---------------+  +--------------+ | 
 |                 ^                  ^                 ^ | 
 |                 |                  |                 | | 
 |         +--------------+   +---------------+  +--------------+ | 
 |         | Device Tree  |   |   Firmware    |  |      ?       | | 
 |         +--------------+   +---------------+  +--------------+ | 
 |  | 
 | In the case of CPU devices, the EM framework manages power cost tables per | 
 | 'performance domain' in the system. A performance domain is a group of CPUs | 
 | whose performance is scaled together. Performance domains generally have a | 
 | 1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are | 
 | required to have the same micro-architecture. CPUs in different performance | 
 | domains can have different micro-architectures. | 
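The sketch below illustrates the 1-to-1 mapping between performance domains and CPUFreq policies. It builds a bitmask of CPUs per policy for a hypothetical 8-CPU system with two clusters. The `policy_of[]` table and `pd_cpumask()` are invented for the example. In the kernel, the equivalent mask comes from the policy itself (`policy->cpus`).

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_EXAMPLE 8

/*
 * Hypothetical topology: policy_of[cpu] is the CPUFreq policy, and thus the
 * performance domain, of each CPU. CPUs 0-3 form one cluster and CPUs 4-7
 * another; the CPUs of a cluster share one clock.
 */
static const int policy_of[NR_CPUS_EXAMPLE] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Build a bitmask ("cpumask") of the CPUs belonging to one policy. */
static uint32_t pd_cpumask(int policy)
{
	uint32_t mask = 0;

	assert(policy >= 0);
	for (int cpu = 0; cpu < NR_CPUS_EXAMPLE; cpu++)
		if (policy_of[cpu] == policy)
			mask |= UINT32_C(1) << cpu;
	return mask;	/* 0 if no CPU belongs to this policy */
}
```

Every CPU set in one mask must share the same micro-architecture. The two masks may differ in micro-architecture, as in a big.LITTLE-style system.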
 |  | 
 |  | 
 | 2. Core APIs | 
 | ------------ | 
 |  | 
 | 2.1 Config options | 
 | ^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | CONFIG_ENERGY_MODEL must be enabled to use the EM framework. | 
 |  | 
 |  | 
 | 2.2 Registration of performance domains | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | Registration of 'advanced' EM | 
 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
 |  | 
 | The 'advanced' EM gets its name from the fact that the driver is allowed | 
 | to provide a more precise power model. It is not limited to a math formula | 
 | implemented in the framework (as in the 'simple' EM case). It can better | 
 | reflect the real power measurements performed for each performance state. | 
 | Thus, this registration method should be preferred when taking static power | 
 | (leakage) into account in the EM is important. | 
 |  | 
 | Drivers are expected to register performance domains into the EM framework by | 
 | calling the following API:: | 
 |  | 
 |   int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, | 
 | 		struct em_data_callback *cb, cpumask_t *cpus, bool microwatts); | 
 |  | 
 | Drivers must provide a callback function returning <frequency, power> tuples | 
 | for each performance state. The callback function provided by the driver is free | 
 | to fetch data from any relevant location (DT, firmware, ...), and by any means | 
 | deemed necessary. Only for CPU devices, drivers must specify the CPUs of the | 
 | performance domain using the 'cpus' cpumask. For devices other than CPUs, the | 
 | 'cpus' argument must be set to NULL. | 
 | It is important to set the last argument, 'microwatts', to the correct value. | 
 | Kernel subsystems which use the EM might rely on this flag to check whether all | 
 | EM devices use the same scale. If there are different scales, these subsystems | 
 | might decide to return a warning/error, stop working or panic. | 
 | See Section 3 for an example of a driver implementing this | 
 | callback, or Section 2.4 for further documentation on this API. | 
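The sketch below models the callback contract in userspace C, following how the framework is expected to use the callback. Each call receives a frequency and ceils it to the lowest performance state at or above it. It then reports that state's frequency and power. The caller probes just above the previous frequency to walk the states in ascending order. The OPP table, `foo_active_power()` and `build_table()` are hypothetical. They omit the `struct device` argument of the real callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OPP table of a device: frequency in kHz, power in uW. */
struct opp {
	unsigned long khz;
	unsigned long uw;
};

static const struct opp foo_opps[] = {
	{  500000, 150000 },
	{ 1000000, 400000 },
	{ 1500000, 900000 },
};
#define FOO_NR_OPPS (sizeof(foo_opps) / sizeof(foo_opps[0]))

/*
 * Driver-style callback: ceil *khz to the lowest OPP at or above it and
 * report that OPP's frequency and power. Returns -1 if there is none.
 */
static int foo_active_power(unsigned long *uw, unsigned long *khz)
{
	for (size_t i = 0; i < FOO_NR_OPPS; i++) {
		if (foo_opps[i].khz >= *khz) {
			*khz = foo_opps[i].khz;
			*uw = foo_opps[i].uw;
			return 0;
		}
	}
	return -1;
}

/*
 * Framework-style loop: probe just above the previous frequency so that
 * each call yields the next state. Frequencies must strictly increase.
 */
static int build_table(struct opp *table, size_t nr_states)
{
	unsigned long prev = 0;

	assert(table != NULL);
	for (size_t i = 0; i < nr_states; i++) {
		unsigned long khz = prev + 1, uw = 0;

		if (foo_active_power(&uw, &khz) < 0 || khz <= prev)
			return -1;
		table[i].khz = prev = khz;
		table[i].uw = uw;
	}
	return 0;
}
```

Asking for more states than the device has makes the probe fail. This is why `nr_states` must match the number of performance states the driver reports.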
 |  | 
 | Registration of EM using DT | 
 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
 |  | 
 | The EM can also be registered using the OPP framework and information in the | 
 | DT "operating-points-v2" binding. Each OPP entry in DT can be extended with a | 
 | property "opp-microwatt" containing a micro-Watts power value. This OPP DT | 
 | property allows a platform to register EM power values which reflect total | 
 | power (static + dynamic). These power values might come directly from | 
 | experiments and measurements. | 
 |  | 
 | Registration of 'artificial' EM | 
 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
 |  | 
 | There is an option to provide a custom callback for drivers lacking detailed | 
 | knowledge about the power value of each performance state. The callback | 
 | .get_cost() is optional and provides the 'cost' values used by the EAS. | 
 | This is useful for platforms that only provide information on relative | 
 | efficiency between CPU types, where one could use the information to | 
 | create an abstract power model. But even an abstract power model can | 
 | sometimes be hard to fit in, given the input power value size restrictions. | 
 | The .get_cost() callback allows providing 'cost' values which reflect the | 
 | efficiency of the CPUs. This makes it possible to give EAS information with a | 
 | different relation than the one the EM internal formulas calculating 'cost' | 
 | values would impose. To register an EM for such a platform, the driver must | 
 | set the flag 'microwatts' to 0, provide the .active_power() callback and | 
 | provide the .get_cost() callback. The EM framework handles such a platform | 
 | properly during registration. The flag EM_PERF_DOMAIN_ARTIFICIAL is set for | 
 | such a platform. Special care should be taken by other frameworks which use | 
 | the EM to test and treat this flag properly. | 
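The following minimal sketch shows how an optional cost callback can override the default cost computation. The default used here is cost = power * fmax / freq. This is close to what earlier kernel versions compute, but the exact formula differs between versions, so treat it as illustrative. All names are hypothetical. The callback also drops the `struct device` argument of the real .get_cost().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical performance state entry. */
struct perf_state {
	unsigned long khz;
	unsigned long power;	/* abstract scale for an artificial EM */
	unsigned long cost;
};

/* Driver-provided cost callback; NULL means "use the default formula". */
typedef int (*get_cost_fn)(unsigned long khz, unsigned long *cost);

/* Hypothetical artificial-EM callback encoding relative efficiency only. */
static int little_cpu_cost(unsigned long khz, unsigned long *cost)
{
	*cost = 100 + khz / 10000;	/* made-up linear efficiency curve */
	return 0;
}

/* Fill the cost of every state; the table is sorted by ascending freq. */
static int fill_costs(struct perf_state *ps, size_t nr, get_cost_fn get_cost)
{
	unsigned long fmax;

	assert(ps != NULL && nr > 0);
	fmax = ps[nr - 1].khz;
	for (size_t i = 0; i < nr; i++) {
		if (get_cost) {
			/* Artificial EM: the driver supplies the cost */
			if (get_cost(ps[i].khz, &ps[i].cost))
				return -1;
		} else {
			/* Default: cost = power * fmax / freq */
			ps[i].cost = ps[i].power * fmax / ps[i].khz;
		}
	}
	return 0;
}
```

With the callback, the relation between states follows the driver's efficiency curve rather than the formula. This is the freedom the 'artificial' EM provides.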
 |  | 
 | Registration of 'simple' EM | 
 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
 |  | 
 | The 'simple' EM is registered using the framework helper function | 
 | cpufreq_register_em_with_opp(). It implements a power model which is tied to | 
 | the math formula:: | 
 |  | 
 | 	Power = C * V^2 * f | 
 |  | 
 | The EM which is registered using this method might not correctly reflect the | 
 | physics of a real device, e.g. when static power (leakage) is important. | 
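Here is a worked example of the formula in C, using integer arithmetic. The units are an assumption chosen for the example. With C in pF, V in mV and f in MHz, the product is in units of 1e-12 W, so dividing by 1e6 yields micro-Watts. The function name and the coefficient values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Power = C * V^2 * f with illustrative units:
 *   cap_pf [pF] * mv^2 [mV^2] * mhz [MHz] = 1e-12 W per unit,
 *   so dividing by 1e6 gives micro-Watts.
 * Only dynamic power is modeled; static (leakage) power is ignored.
 */
static uint64_t simple_em_power_uw(uint64_t cap_pf, uint64_t mv, uint64_t mhz)
{
	assert(mv > 0 && mhz > 0);
	return cap_pf * mv * mv * mhz / 1000000;
}
```

For instance, with C = 1000 pF (1 nF), V = 900 mV and f = 1000 MHz, the sketch returns 810000 uW (0.81 W). Lowering V to 800 mV at the same frequency gives 640000 uW. The quadratic voltage term dominates, and nothing in the formula accounts for leakage.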
 |  | 
 |  | 
 | 2.3 Accessing performance domains | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | There are two API functions which provide access to the energy model: | 
 | em_cpu_get(), which takes a CPU id as an argument, and em_pd_get(), which | 
 | takes a device pointer as an argument. Which interface to use depends on the | 
 | subsystem, but in the case of CPU devices both functions return the same | 
 | performance domain. | 
 |  | 
 | Subsystems interested in the energy model of a CPU can retrieve it using the | 
 | em_cpu_get() API. The energy model tables are allocated once upon creation of | 
 | the performance domains, and kept in memory untouched. | 
 |  | 
 | The energy consumed by a performance domain can be estimated using the | 
 | em_cpu_energy() API. For CPU devices, the estimation is performed assuming | 
 | that the schedutil CPUfreq governor is in use. Currently this calculation is | 
 | not provided for other types of devices. | 
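The following simplified userspace sketch shows the idea behind the estimation for a CPU performance domain. It is not the kernel's em_cpu_energy() code, whose parameters and internals vary between versions. It assumes a schedutil-like frequency choice with 25% headroom over the busiest CPU. It picks the lowest performance state that can serve that frequency and scales the state's cost by the domain's total utilization. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical performance state: frequency (kHz) and precomputed cost. */
struct em_ps {
	unsigned long khz;
	unsigned long cost;	/* power * fmax / freq */
};

/*
 * max_util:  utilization of the busiest CPU in the domain
 * sum_util:  sum of the utilizations of all CPUs in the domain
 * scale_cpu: capacity of a CPU of the domain at its top frequency
 *
 * Per CPU: energy = power * util / cap, where cap = scale_cpu * freq / fmax,
 * which simplifies to cost * util / scale_cpu.
 */
static unsigned long pd_energy(const struct em_ps *table, size_t nr,
			       unsigned long max_util, unsigned long sum_util,
			       unsigned long scale_cpu)
{
	unsigned long fmax, freq;
	size_t i;

	assert(table != NULL && nr > 0 && scale_cpu > 0);
	fmax = table[nr - 1].khz;

	/* Requested frequency with 25% headroom: 1.25 * fmax * util / scale */
	freq = (fmax + fmax / 4) * max_util / scale_cpu;

	/* Lowest state at or above that frequency, else the highest one */
	for (i = 0; i < nr - 1; i++)
		if (table[i].khz >= freq)
			break;

	/* Energy of the whole domain at the selected state */
	return table[i].cost * sum_util / scale_cpu;
}
```

Because the cost is precomputed per state, the estimate at run time reduces to a table lookup, one multiplication and one division.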
 |  | 
 | More details about the above APIs can be found in ``<linux/energy_model.h>`` | 
 | or in Section 2.4. | 
 |  | 
 |  | 
 | 2.4 Description details of this API | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 | .. kernel-doc:: include/linux/energy_model.h | 
 |    :internal: | 
 |  | 
 | .. kernel-doc:: kernel/power/energy_model.c | 
 |    :export: | 
 |  | 
 |  | 
 | 3. Example driver | 
 | ----------------- | 
 |  | 
 | The CPUFreq framework supports a dedicated callback for registering | 
 | the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em(). | 
 | That callback has to be implemented properly for a given driver, | 
 | because the framework calls it at the right time during setup. | 
 | This section provides a simple example of a CPUFreq driver registering a | 
 | performance domain in the Energy Model framework using the (fake) 'foo' | 
 | protocol. The driver implements an est_power() function to be provided to the | 
 | EM framework:: | 
 |  | 
 |   -> drivers/cpufreq/foo_cpufreq.c | 
 |  | 
 |   static int est_power(struct device *dev, unsigned long *uW, | 
 |   		unsigned long *KHz) | 
 |   { | 
 |   	long freq, power; | 
 |  | 
 |   	/* Use the 'foo' protocol to ceil the frequency */ | 
 |   	freq = foo_get_freq_ceil(dev, *KHz); | 
 |   	if (freq < 0) | 
 |   		return freq; | 
 |  | 
 |   	/* Estimate the power cost (in uW) for the dev at the relevant freq. */ | 
 |   	power = foo_estimate_power(dev, freq); | 
 |   	if (power < 0) | 
 |   		return power; | 
 |  | 
 |   	/* Return the values to the EM framework */ | 
 |   	*uW = power; | 
 |   	*KHz = freq; | 
 |  | 
 |   	return 0; | 
 |   } | 
 |  | 
 |   static void foo_cpufreq_register_em(struct cpufreq_policy *policy) | 
 |   { | 
 |   	struct em_data_callback em_cb = EM_DATA_CB(est_power); | 
 |   	struct device *cpu_dev; | 
 |   	int nr_opp; | 
 |  | 
 |   	cpu_dev = get_cpu_device(cpumask_first(policy->cpus)); | 
 |  | 
 |   	/* Find the number of OPPs for this policy */ | 
 |   	nr_opp = foo_get_nr_opp(policy); | 
 |  | 
 |   	/* And register the new performance domain (power values in uW) */ | 
 |   	em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus, | 
 |   				    true); | 
 |   } | 
 |  | 
 |   static struct cpufreq_driver foo_cpufreq_driver = { | 
 |   	.register_em = foo_cpufreq_register_em, | 
 |   }; |