Previous blogs on power management and a host of other power management resources can be found in “List of Useful Power and Power Management Articles, Blogs and References” at http://software.intel.com/en-us/articles/list-of-useful-power-and-power-management-articles-blogs-and-references.
INTRODUCTIONS: TEMPERATURE SENSORS AND THE COPROCESSOR
Figure SENSORS is borrowed from the Intel® Xeon Phi™ Coprocessor Datasheet, dated June 2013. This image shows the location of various sensors and components on the coprocessor’s printed circuit board (PCB).
Figure SENSORS. The front and back of a representative coprocessor printed circuit board showing the position of thermal sensors and major components. |
The descriptions of the various micsmc command line options below discuss the purpose and interpretation of the data obtained from these sensors. Because the coprocessor comes in both passively and actively cooled SKUs, there are inlet, outlet and fan-related sensors. These sensors exist on both types of coprocessor, but their readings are meaningful only on the active versions. On passive versions, the meaning and usefulness of the sensors depend on the cooling provided by the housing of the host containing the coprocessors.
COMMAND LINE USAGE: MEASURING POWER
Not unexpectedly, you can do everything that can be done using the graphical tool, and more, by using the command line version. For a full list of commands, see Table FULL. Table POWER below shows the options most relevant to power.
twkidd@knightscorner1:~> micsmc --help

Intel(R) Xeon Phi(TM) Coprocessor Platform Status Panel
VERSION: 3.1-0.1.build0
Developed by Intel Corporation. Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S and/or other countries.

This application monitors device performance, including driver info, temperatures, core usage, etc.

This program is based in part on the work of the Qwt project (http://qwt.sf.net).

The Status Panel User Guide is available in all supported languages, in PDF and HTML formats, at:
    "/opt/intel/mic/sysmgmt/docs"+

USAGE:
======

-a, --all [[device] <device_list>]
    Displays all/selected device status data.
    Equivalent to: -i -t -f -m -c.

-c, --cores [[device] <device_list>]
    Displays the average and per core utilization levels for all/selected devices.

-f, --freq [[device] <device_list>]
    Displays the clock frequency and power levels for all/selected devices.

-i, --info [[device] <device_list>]
    Displays general system information for all/selected devices.

-l, --lost
    Displays all Intel(R) Xeon Phi(TM) Coprocessors in the system and whether they are currently in the Lost Node condition.

--online
    Displays all Intel(R) Xeon Phi(TM) Coprocessors in the system that are currently online.

--offline
    Displays all Intel(R) Xeon Phi(TM) Coprocessors in the system that are currently offline, lost, or otherwise unavailable.

-m, --mem [[device] <device_list>]
    Displays the memory utilization data for all/selected devices.

-t, --temp [[device] <device_list>]
    Displays the temperature levels for all/selected devices.

--ecc [status | enable | disable] [[device] <device_list>]
    Optional arguments:
        enable - enables ECC Mode
        disable - disables ECC Mode
        status - displays the ECC Mode
    Enables, disables or displays the ECC Mode for all/selected devices.
    NOTE: If no arguments are provided, status is displayed.

--turbo [status | enable | disable] [[device] <device_list>]
    Optional arguments:
        enable - enables Turbo Mode
        disable - disables Turbo Mode
        status - displays Turbo Mode status
    Enables, disables or displays the Turbo Mode for all/selected devices.
    NOTE: If no arguments are provided, status is displayed.

--led [status | enable | disable] [[device] <device_list>]
    Optional arguments:
        enable - enables LED Alert
        disable - disables LED Alert
        status - displays LED Alert status
    Enables, disables or displays the LED Alert for all/selected devices.
    NOTE: If no arguments are provided, status is displayed.

--pthrottle [[device] <device_list>]
    Displays the Power Throttle State for all/selected devices.

--tthrottle [[device] <device_list>]
    Displays the Thermal Throttle State for all/selected devices.

--pwrenable [cpufreq | corec6 | pc3 | pc6 | all] [[device] <device_list>]
    Optional arguments:
        cpufreq - enables the cpufreq power management feature
        corec6 - enables the corec6 power management feature
        pc3 - enables the pc3 power management feature
        pc6 - enables the pc6 power management feature
        all - enables all four power management features
    Enables/disables the Power Management Features for all/selected devices.
    NOTE: Each feature not specified will automatically be disabled. If no features are specified, then all Power Management Features are disabled.

--pwrstatus [[device] <device_list>]
    Displays the Power Management Feature status for all/selected devices.

--timeout <value>
    Required argument:
        value - integer timeout value in seconds.
    Sets the sub-process timeout value for the current invocation. Affects only command option(s) requiring sub-process execution.

-h, --help [<options_list>]
    Displays full/selected usage information and then exits.

-v, --version
    Displays the tool version and then exits.

=======================================

Common Argument: [[device] device_list]
    Specifies the device name arguments for a given command option.
    The 'device_list' specifies one or more 'micN' values where 'N' is the device number: 'mic2 mic5 ...'
    When no device names are specified, the option operates on all devices in the system.

twkidd@knightscorner1:~>
Table FULL. Full list of micsmc command line options. |
-f, --freq [[device] <device_list>]
    Displays the clock frequency and power levels for all/selected devices.

-t, --temp [[device] <device_list>]
    Displays the temperature levels for all/selected devices.

--turbo [status | enable | disable] [[device] <device_list>]
    Optional arguments:
        enable - enables Turbo Mode
        disable - disables Turbo Mode
        status - displays Turbo Mode status
    Enables, disables or displays the Turbo Mode for all/selected devices.
    NOTE: If no arguments are provided, status is displayed.

--pthrottle [[device] <device_list>]
    Displays the Power Throttle State for all/selected devices.

--tthrottle [[device] <device_list>]
    Displays the Thermal Throttle State for all/selected devices.

--pwrenable [cpufreq | corec6 | pc3 | pc6 | all] [[device] <device_list>]
    Optional arguments:
        cpufreq - enables the cpufreq power management feature
        corec6 - enables the corec6 power management feature
        pc3 - enables the pc3 power management feature
        pc6 - enables the pc6 power management feature
        all - enables all four power management features
    Enables/disables the Power Management Features for all/selected devices.
    NOTE: Each feature not specified will automatically be disabled. If no features are specified, then all Power Management Features are disabled.

--pwrstatus [[device] <device_list>]
    Displays the Power Management Feature status for all/selected devices.
Table POWER. List of micsmc options most relevant to power management. |
As is often the case, the documentation on many of these is pretty sparse. This is not a criticism, just an acceptance of the fact that there are always fires that need stamping out.
DETAILS FOR THE OPERANDS THAT APPLY SPECIFICALLY TO POWER
“--freq”
twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US> micsmc --freq mic0

mic0 (freq):
   Core Frequency: .......... 1.10 GHz
   Total Power: ............. 107.00 Watts
   Low Power Limit: ......... 315.00 Watts
   High Power Limit: ........ 375.00 Watts
   Physical Power Limit: .... 395.00 Watts

twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US>
Measurement | Definition |
Core Frequency | Clock frequency of the coprocessor cores |
Total Power | Rate of energy usage (Joules/sec, aka Watts) |
Low Power Limit | Above which PM initiates basic cooling activities such as increasing the fan speed. |
High Power Limit | Above which PM performs aggressive cooling activities such as throttling the cores and maximizing fan speed |
Physical Power Limit | Also called the “shutdown limit”. Above this limit, the PM starts shutting down the coprocessor. A warning may precede this shutdown. |
Table FREQ: Explanation of the output of “micsmc --freq” |
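If you want to track these numbers over time rather than read them by eye, the report is easy to scrape. Here is a minimal Python sketch that turns the "--freq" report into a dictionary and computes how many watts remain before each limit. The function names are mine, not part of micsmc, and the one-field-per-line layout of the sample text is an assumption based on the session shown above.

```python
import re

# Sample text copied from the "micsmc --freq mic0" session above. The
# one-field-per-line layout is an assumption about the real output.
SAMPLE = """\
mic0 (freq):
   Core Frequency: .......... 1.10 GHz
   Total Power: ............. 107.00 Watts
   Low Power Limit: ......... 315.00 Watts
   High Power Limit: ........ 375.00 Watts
   Physical Power Limit: .... 395.00 Watts
"""

# "Name: ..... value unit"
FIELD = re.compile(
    r"^\s*(?P<name>[^:]+):\s*\.+\s*(?P<value>[0-9.]+)\s*(?P<unit>\S+)\s*$"
)


def parse_freq(text):
    """Return {field name: (value, unit)} for every dotted report line."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            fields[m.group("name").strip()] = (
                float(m.group("value")),
                m.group("unit"),
            )
    return fields


def power_headroom(fields):
    """Watts left before each limit is reached (negative means exceeded)."""
    total = fields["Total Power"][0]
    return {
        name: limit - total
        for name, (limit, _unit) in fields.items()
        if name.endswith("Power Limit")
    }


if __name__ == "__main__":
    for name, watts in power_headroom(parse_freq(SAMPLE)).items():
        print(f"{name}: {watts:.2f} W of headroom")
```

In real use you would feed parse_freq the text captured from "micsmc --freq mic0" instead of SAMPLE.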
“--temp”
twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US> micsmc --temp mic0

mic0 (temp):
   Cpu Temp: ................ 49.00 C
   Memory Temp: ............. 36.00 C
   Fan-In Temp: ............. 30.00 C
   Fan-Out Temp: ............ 36.00 C
   Core Rail Temp: .......... 35.00 C
   Uncore Rail Temp: ........ 36.00 C
   Memory Rail Temp: ........ 36.00 C

twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US>
Measurement | Definition |
CPU Temperature | For the die |
Memory Temperature | For memory |
Fan-In Temperature | For the fan inlet sensor for an active coprocessor. |
Fan-Out Temperature | For the fan outlet sensor for an active coprocessor. |
Core Rail Temp | For the power rail feeding the coprocessor chip |
Uncore Rail Temp | For the power rail feeding all other circuitry except memory |
Memory Rail Temp | For the power rail feeding memory |
Table TEMP. Explanation of the output of “micsmc --temp” |
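The temperature report can be scraped the same way. The sketch below, again in Python with names of my own choosing, parses the "--temp" report, picks out the hottest sensor and computes the rise between the fan inlet and outlet. Keep in mind that the inlet/outlet figure only means something on an actively cooled card. The line layout of the sample is an assumption based on the session above.

```python
import re

# Sample text copied from the "micsmc --temp mic0" session above. The
# one-field-per-line layout is an assumption about the real output.
SAMPLE = """\
mic0 (temp):
   Cpu Temp: ................ 49.00 C
   Memory Temp: ............. 36.00 C
   Fan-In Temp: ............. 30.00 C
   Fan-Out Temp: ............ 36.00 C
   Core Rail Temp: .......... 35.00 C
   Uncore Rail Temp: ........ 36.00 C
   Memory Rail Temp: ........ 36.00 C
"""

# "Name: ..... value C"
LINE = re.compile(r"^\s*([^:]+):\s*\.+\s*([0-9.]+)\s*C\s*$")


def parse_temps(text):
    """Return {sensor name: degrees Celsius}."""
    temps = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            temps[m.group(1).strip()] = float(m.group(2))
    return temps


def hottest(temps):
    """Return the (sensor name, temperature) pair with the highest reading."""
    return max(temps.items(), key=lambda item: item[1])


def airflow_rise(temps):
    """Outlet minus inlet temperature; only meaningful on active cards."""
    return temps["Fan-Out Temp"] - temps["Fan-In Temp"]


if __name__ == "__main__":
    temps = parse_temps(SAMPLE)
    print("hottest sensor:", hottest(temps))
    print("inlet-to-outlet rise:", airflow_rise(temps), "C")
```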
“--pthrottle”
twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US> micsmc --pthrottle mic0

mic0 (pthrottle):
   Throttle state: ......... inactive
   Current throttle time: .. 0 msec
   Throttle event count: ... 0
   Total throttle time: .... 0 msec

twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US>
Measurement | Explanation |
Throttle state | Indicates if the processor is being power throttled; power throttling is done when coprocessor power exceeds a certain threshold. |
Current throttle time | If currently power throttled, how long the coprocessor has been in that state |
Throttle event count | Number of times throttled over the current interval |
Total throttle time | Total time throttled over the current interval |
Table PTHROTTLE. Explanation of the output of “micsmc --pthrottle” |
“--tthrottle”
twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US> micsmc --tthrottle mic0

mic0 (tthrottle):
   Throttle state: ......... inactive
   Current throttle time: .. 0 msec
   Throttle event count: ... 0
   Total throttle time: .... 0 msec

twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US>
Measurement | Explanation |
Throttle state | Indicates if the processor is being thermally throttled; thermal throttling is done when the coprocessor die temperature exceeds a certain threshold. |
Current throttle time | If currently thermally throttled, how long the coprocessor has been in that state |
Throttle event count | Number of times throttled over the current interval |
Total throttle time | Total time throttled over the current interval |
Table TTHROTTLE. Explanation of the output of “micsmc --tthrottle” |
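Since "--pthrottle" and "--tthrottle" produce reports of the same shape, one small parser can serve both. The following Python sketch is only an illustration: the function name is mine, the line layout is assumed from the sessions above, and because the only state shown in this blog is "inactive", the code treats any other state string as "throttled", which is a guess.

```python
import re

# Sample text copied from the "micsmc --pthrottle mic0" session above. The
# one-field-per-line layout is an assumption about the real output.
SAMPLE = """\
mic0 (pthrottle):
   Throttle state: ......... inactive
   Current throttle time: .. 0 msec
   Throttle event count: ... 0
   Total throttle time: .... 0 msec
"""

HEADER = re.compile(r"^\s*(mic\d+)\s+\((pthrottle|tthrottle)\):\s*$")
FIELD = re.compile(r"^\s*([^:]+):\s*\.+\s*(.+?)\s*$")


def parse_throttle(text):
    """Parse a --pthrottle or --tthrottle report into a dictionary."""
    report = {}
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            report["device"], report["kind"] = header.group(1), header.group(2)
            continue
        field = FIELD.match(line)
        if not field:
            continue
        name, value = field.group(1).strip(), field.group(2)
        if name == "Throttle state":
            # Assumption: anything other than "inactive" means throttled.
            report["active"] = value != "inactive"
        elif name.endswith("time"):
            report[name] = int(value.split()[0])  # milliseconds
        elif name.endswith("count"):
            report[name] = int(value)
    return report


if __name__ == "__main__":
    print(parse_throttle(SAMPLE))
```

A non-zero "Throttle event count" between two runs of a benchmark is a quick hint that the numbers from that run may have been affected by throttling.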
“--pwrstatus”
twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US> micsmc --pwrstatus mic0

mic0 (pwrstatus):
   cpufreq power management feature: .. enabled
   corec6 power management feature: ... enabled
   pc3 power management feature: ...... enabled
   pc6 power management feature: ...... enabled

twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US>
Measurement | Explanation |
cpufreq | Enabled/disabled status of P-states |
corec6 | Enabled/disabled status of Core C6 |
pc3 | Enabled/disabled status of Package C3 |
pc6 | Enabled/disabled status of Package C6 |
Table STATUS. Explanation of the output of “micsmc --pwrstatus” |
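A script that depends on a particular power management setup can check it before running. Here is a small Python sketch that reduces the "--pwrstatus" report to a feature-to-boolean mapping. The function name is mine, the line layout is assumed from the session above, and I am assuming that a feature that is off is reported with the word "disabled".

```python
import re

# Sample text copied from the "micsmc --pwrstatus mic0" session above. The
# one-field-per-line layout is an assumption about the real output.
SAMPLE = """\
mic0 (pwrstatus):
   cpufreq power management feature: .. enabled
   corec6 power management feature: ... enabled
   pc3 power management feature: ...... enabled
   pc6 power management feature: ...... enabled
"""

# Assumption: a feature that is off is reported as "disabled".
LINE = re.compile(
    r"^\s*(\w+) power management feature:\s*\.+\s*(enabled|disabled)\s*$"
)


def parse_pwrstatus(text):
    """Return {feature name: True if enabled, False if disabled}."""
    status = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            status[m.group(1)] = m.group(2) == "enabled"
    return status


if __name__ == "__main__":
    status = parse_pwrstatus(SAMPLE)
    off = [name for name, on in status.items() if not on]
    print("disabled features:", off or "none")
```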
COMMAND LINE USAGE: CONFIGURING POWER
“--turbo”
twkidd@knightscorner5:~> micsmc --turbo status

mic0 (turbo):
   Turbo mode is enabled

mic1 (turbo):
   Turbo mode is disabled

twkidd@knightscorner5:~>
Measurement | Definition |
Turbo mode | Indicates if it is enabled, disabled or not supported |
Table TURBO. Explanation of the output of “micsmc --turbo status” |
Here is an important note: To enable or disable turbo, you do not have to reboot / restart the card.
“--pwrenable”
twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US> micsmc --pwrenable [cpufreq | corec6 | pc3 | pc6 | all] [[device] <device_list>]
Setting | Explanation |
cpufreq | Enables the use of P-states |
corec6 | Enables the cores to drop into Core C6 |
pc3 | Enables the specified coprocessor to enter package C-state pc3 |
pc6 | Enables the specified coprocessor to enter package C-state pc6 (the lowest possible idle state)++ |
all | Enables all four power management features |
Table ENABLE. Explanation of the options of “micsmc --pwrenable” |
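The catch with "--pwrenable" is the note in the help text: every feature you do not name is switched off. The Python sketch below makes that side effect explicit by building the command line and, alongside it, the list of features that the command will disable. The function name is mine, and I am assuming that several feature names can be given in one invocation, which the help text implies but does not spell out.

```python
import re

# The four features listed in the --pwrenable help text.
FEATURES = ("cpufreq", "corec6", "pc3", "pc6")


def pwrenable_command(enable, devices=()):
    """Build a 'micsmc --pwrenable' argument list.

    Returns (argv, disabled), where 'disabled' lists the features that are
    switched off as a side effect because they were not named.
    Assumption: several feature names may be given in one invocation.
    """
    enable = set(enable)
    unknown = enable - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    for dev in devices:
        if not re.fullmatch(r"mic\d+", dev):
            raise ValueError(f"not a micN device name: {dev!r}")
    enabled = [f for f in FEATURES if f in enable]
    disabled = [f for f in FEATURES if f not in enable]
    words = ["all"] if not disabled else enabled
    return ["micsmc", "--pwrenable", *words, *devices], disabled


if __name__ == "__main__":
    argv, disabled = pwrenable_command(["cpufreq", "pc3"], ["mic0"])
    print(" ".join(argv))
    print("will be disabled:", ", ".join(disabled) or "nothing")
```

Printing the disabled list before running the command is a cheap way to avoid turning off, say, pc6 by accident when all you wanted was to turn on cpufreq.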
“--all”
twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US> micsmc --all [[device] <device_list>]
This operand is equivalent to specifying “--freq --temp --info --mem --cores” for the specified devices.
ERROR MESSAGES
If you do not have sufficient permissions, you will likely get the message, “Error: mic0: unable to set power management configuration: unable to open configuration file: /etc/mpss/mic0.conf”. Notice also that “--pwrenable” sets up the coprocessor boot configuration files “micN.conf”, where N is the coprocessor number. The Intel® Manycore Platform Software Stack (MPSS) applies the changes you specify only when the specified coprocessor is rebooted. (This does not apply to turbo.)
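You can check for the permission problem up front. The Python sketch below builds the configuration file path quoted in the error message and tests whether the current user can write to it. The function names are mine, and the idea that write access to this file is what the tool needs is an inference from the error message, not something the documentation states.

```python
import os


def conf_path(device_number):
    """Path of the boot configuration file named in the error message."""
    return f"/etc/mpss/mic{device_number}.conf"


def can_update_config(path):
    """True if 'path' is an existing file the current user may write to."""
    return os.path.isfile(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    path = conf_path(0)
    if can_update_config(path):
        print(f"{path} is writable")
    else:
        # Assumption: lack of write access is what triggers the error.
        print(f"{path} is missing or not writable; expect the error above")
```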
Only certain SKUs have support for turbo. If your card does not, you will get an error message similar to the following.
twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US> micsmc --turbo status
Warning: mic0: Turbo mode not supported by this device: Device ID: 0x225d, stepping: 0x2, substepping: 0x0
Warning: mic1: Turbo mode not supported by this device: Device ID: 0x225d, stepping: 0x2, substepping: 0x0
twkidd@knightscorner1:/usr/share/doc/sysmgmt/en_US>
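A script that cares about turbo therefore has to handle three answers per card: enabled, disabled and not supported. Here is a minimal Python sketch that covers both the normal "--turbo status" report and the warning form. The function name is mine, the two sample strings are taken from the sessions in this blog, and the patterns deliberately ignore where the line breaks fall, since I am not certain of the exact layout.

```python
import re

# Text taken from the two "micsmc --turbo status" sessions in this blog.
NORMAL = "mic0 (turbo): Turbo mode is enabled mic1 (turbo): Turbo mode is disabled"
WARNING = (
    "Warning: mic0: Turbo mode not supported by this device: "
    "Device ID: 0x225d, stepping: 0x2, substepping: 0x0 "
    "Warning: mic1: Turbo mode not supported by this device: "
    "Device ID: 0x225d, stepping: 0x2, substepping: 0x0"
)

# \s* also matches newlines, so the line layout does not matter.
STATUS = re.compile(r"(mic\d+) \(turbo\):\s*Turbo mode is (enabled|disabled)")
UNSUPPORTED = re.compile(r"Warning: (mic\d+): Turbo mode not supported")


def parse_turbo(text):
    """Return {device: 'enabled' | 'disabled' | 'not supported'}."""
    result = {dev: state for dev, state in STATUS.findall(text)}
    for dev in UNSUPPORTED.findall(text):
        result[dev] = "not supported"
    return result


if __name__ == "__main__":
    print(parse_turbo(NORMAL))
    print(parse_turbo(WARNING))
```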
+The documentation is in error as of January 2014. This directory should be /usr/share/doc/sysmgmt. Hopefully this has been corrected by the time you, my humble reader, have read this blog.
++Some of the earliest SKUs do not have PC6 capability.