Power Management: So what is this policy thing?

Unlike many of my recent blogs, this series is about power management in general. At the very end of the series, I’ll write specifically about the Intel® Xeon Phi™ coprocessor.

I have talked incessantly over the years about power states (e.g. P-states and C-states), and how the processor transitions from one state to another. For a list of previous blogs in this series, as well as other related blogs on power and power management, see the article at [List0]. But I have left out an important component of power management, namely the policy.

A policy is a collection of rules used for guidance, for example, a security policy. A power management policy contains the rules / logic that guide power management state transitions. The implementation of that policy is done by the power management (PM) manager or module.
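To make the split between policy and manager concrete, here is a minimal sketch. Everything in it (the `Policy` fields, the `PowerManager` class, and the `may_enter` method) is a hypothetical name invented for illustration, not an API from any real driver or OS. The policy holds only rules; the manager holds only the logic that applies them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: a bundle of rules, with no mechanism of its own."""
    name: str
    max_wakeup_latency_us: float  # longest wake-up delay the workload tolerates
    allow_package_idle: bool      # whether package-level idle states may be used


class PowerManager:
    """Hypothetical PM manager: applies whatever policy it is handed."""

    def __init__(self, policy):
        self.policy = policy

    def may_enter(self, exit_latency_us, is_package_state):
        """Return True if the policy permits entering the given idle state."""
        if is_package_state and not self.policy.allow_package_idle:
            return False
        return exit_latency_us <= self.policy.max_wakeup_latency_us


# Illustrative values only.
manager = PowerManager(Policy("low_latency", 10.0, False))
print(manager.may_enter(exit_latency_us=1.0, is_package_state=False))
```

Swapping in a different `Policy` object changes the manager's behavior without touching its code, which is the point of keeping the two apart.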

One way to divide power management functions is among five domains: hardware, BIOS or near-BIOS-level drivers, kernel-level drivers (ring 0), system power management controls (ring 3), and user power management controls (ring 3). This arrangement can differ depending upon the OS and the technology being used (e.g. mobile vs. server). See Figure PWRMNGR.

Latencies drive this distribution of power management functionality. Power management can only work if its impact on executing applications is trivial. Latency is not so important for transitions into an idle state: the processor is not doing anything, or it would not be transitioning into the idle state in the first place. In contrast, transitions out of an idle state and into the run state must take place as quickly as possible. So the designers of the power management infrastructure distribute its functionality across the hardware, OS, and user levels. The lowest layers must be simple and react as quickly as possible when transitioning from the idle state to the run state (e.g. from C1 to C0). As an example, transitions from C1 to C0 take less than a microsecond on the Intel® Xeon Phi™ coprocessor. As we move up the power management stack, the transitions each layer governs are more latency tolerant and can involve more complex decision-making logic.
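A little arithmetic shows why exit latency matters so much. The sketch below uses a hypothetical helper, `wake_overhead_fraction`, and made-up wake-up rates; the two exit latencies are rough stand-ins for the sub-microsecond C1 figure above and the roughly 500 millisecond package-idle figure quoted later in this article.

```python
def wake_overhead_fraction(wakeups_per_second, exit_latency_us):
    """Fraction of each second spent just leaving the idle state."""
    return wakeups_per_second * exit_latency_us / 1_000_000.0


# The wake-up rates are invented for illustration; the latencies are
# approximations of the figures quoted in this article.
shallow = wake_overhead_fraction(1000, 1.0)   # C1 -> C0, about 1 microsecond
deep = wake_overhead_fraction(2, 500_000.0)   # PC6 -> C0, about 500 milliseconds

print(f"shallow: {shallow:.1%}, deep: {deep:.1%}")
```

At 1 microsecond per wake-up, a thousand wake-ups a second cost about 0.1% of the time. At 500 milliseconds per wake-up, just two wake-ups a second would consume all of it, which is why the deep transitions have to be rare and are decided higher in the stack.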

As an interesting aside, the entire power management stack does not have to be running on the system being managed. The current generation (as of 2014) Intel® Xeon Phi™ coprocessor necessarily has part of the power management logic implemented on the host. I will discuss this further below. (This will likely change in future generations of the coprocessor.)

Figure PWRMNGR. The power management module and the power management policy

In the Hardware and BIOS: At these very lowest levels, power management is limited to mapping power management instructions to the underlying hardware, such as calls to invoke different P- and C-states. See Alex Hung’s power management blogs, in the reference section below, for a good description of how the BIOS maps HW power management functionality to ACPI definitions. Given its simplicity, this level introduces no perceptible latency to an executing user application.

In the Kernel (ring 0): Ultimately, power management decisions involve transitions between the run state and different idle states, and such decisions introduce latencies. For example, if a processor is in C3 and an interrupt occurs, it must transition from C3 to C0, run the interrupt routine, and then transition back to C3. But as in all things, it is not this simple. These transitions also involve software logic and decision making, such as determining whether the processor should instead use a shallower idle state with lower latency, such as C1. It does not make any sense to put this decision-making logic at the BIOS level, as many repeated transitions can result in non-trivial cumulative latency (and doing so would violate good programming practice).
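The kind of decision described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of any real kernel: the `IdleState` type, the `select_idle_state` function, the state table, and all the latency and residency numbers are hypothetical.

```python
from typing import NamedTuple


class IdleState(NamedTuple):
    name: str
    exit_latency_us: float   # time needed to get back to C0
    min_residency_us: float  # idle time needed before the state pays off


# Illustrative numbers only, ordered from shallowest to deepest.
STATES = [
    IdleState("C1", 1.0, 2.0),
    IdleState("C3", 50.0, 200.0),
    IdleState("C6", 200.0, 1000.0),
]


def select_idle_state(predicted_idle_us, latency_limit_us, states=STATES):
    """Pick the deepest state that is both fast enough to leave and
    expected to stay idle long enough to be worth entering."""
    choice = states[0]  # fall back to the shallowest state
    for state in states:
        fast_enough = state.exit_latency_us <= latency_limit_us
        long_enough = state.min_residency_us <= predicted_idle_us
        if fast_enough and long_enough:
            choice = state
    return choice


print(select_idle_state(predicted_idle_us=500.0, latency_limit_us=100.0).name)
```

With these made-up numbers, a predicted idle period of 500 microseconds and a 100 microsecond latency limit rule out the deepest state on both counts, so the sketch settles on C3. Tighten the latency limit below 50 microseconds and it drops back to C1.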

Typical kernel-level power management covers functionality where latency is critical but some computation and decision making are still required. This decision making takes place in ring 0 (kernel), which avoids the latencies inherent in ring 3 context switches and other OS overhead. At this level, statistics are also collected to help the power management software better predict transitions, such as when future interrupts will occur.
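One simple way to turn collected statistics into a prediction is an exponentially weighted moving average of recent idle periods. The sketch below assumes that approach purely for illustration; real power management software may well use something more elaborate, and the `IdlePredictor` class, its `alpha` weight, and its starting estimate are all hypothetical.

```python
class IdlePredictor:
    """Hypothetical predictor: exponentially weighted moving average
    of observed idle periods, in microseconds."""

    def __init__(self, alpha=0.25, initial_us=100.0):
        self.alpha = alpha              # weight given to the newest sample
        self.estimate_us = initial_us   # current guess at the next idle period

    def record(self, observed_idle_us):
        """Fold one observed idle period into the estimate and return it."""
        self.estimate_us = (self.alpha * observed_idle_us
                            + (1.0 - self.alpha) * self.estimate_us)
        return self.estimate_us


predictor = IdlePredictor(alpha=0.5, initial_us=100.0)
for sample_us in (200.0, 50.0):  # made-up observations
    print(predictor.record(sample_us))
```

A larger `alpha` makes the estimate follow recent interrupts more closely; a smaller one smooths out bursts. The update is a couple of multiplications, which is the sort of cheap bookkeeping that fits in a latency-critical ring 0 path.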

In the OS (ring 3): Power management functionality at this level takes more time and becomes involved only when necessary or when minimizing latency is not as critical. An example might be adjusting policy based upon collected interrupt frequency and duration statistics. Another example might be the decisions involving P-state transitions. Such transitions do not involve any state saving and restoration. As such, the decision making for them can take place higher in the power management stack and at a more leisurely pace.

In User Space (ring 3): This is where policy is set and initialized. At this high level, latency is much less of an issue with some rare exceptions.

One such rare exception is seen in the Intel® Xeon Phi™ coprocessor, where the host necessarily becomes involved in some power state transitions. This is because when the coprocessor is in a package C-state, it is all but powered down; no power management software can run on the coprocessor when it is in a package idle state (PC3 and PC6). The host must wake the coprocessor up, essentially performing a fast boot up. This means that part of the coprocessor’s power management stack is executing on the host (i.e. remotely). As a result, transitions from the deepest package idle state (PC6) to C0 can take close to 500 milliseconds⁺. See my article on power states referenced below.

In the next blog, we will look briefly at different power management policies.

REFERENCES

NOTE: As in my previous blogs, any illustrations can be blamed solely on me as no copyright has been infringed or artistic ability shown.

[List0] Kidd, Taylor (10/23/2013), List of Useful Power and Power Management Articles, Blogs and References, http://software.intel.com/en-us/articles/list-of-useful-power-and-power-management-articles-blogs-and-references. Retrieved February 21st, 2014.

⁺There are state diagrams that detail these changes and the conditions for them. Introducing these diagrams, as well as the kernel level power management APIs, is at a level of depth that is inappropriate for this article. If you have an unquenchable desire to know, they can often be found in processor data sheets or software developer’s guides.
