Re: Leaving the Desktop Market

From: Allan Jude <freebsd_at_allanjude.com>
Date: Mon, 12 May 2014 22:09:12 -0400

On 2014-05-12 14:25, Adrian Chadd wrote:
> On 12 May 2014 10:35, Allan Jude <freebsd_at_allanjude.com> wrote:
>> I have this system:
>>
>> hw.model: Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
>> hw.ncpu: 4
>>
>> http://ark.intel.com/products/75052
>>
>> dev.cpu.0.%desc: ACPI CPU
>> dev.cpu.0.%driver: cpu
>> dev.cpu.0.%location: handle=\_PR_.CPU0
>> dev.cpu.0.%pnpinfo: _HID=none _UID=0
>> dev.cpu.0.%parent: acpi0
>> dev.cpu.0.freq: 3100
>> dev.cpu.0.freq_levels: 3101/80000 3100/80000 2900/72713 2800/69558
>> 2600/62669 2400/56794 2300/53935 2100/47673 1900/42370 1800/39795
>> 1600/34136 1500/31729 1300/26432 1137/23128 1100/21994 1000/19851
>> 875/17369 800/15113 700/13223 600/11334 500/9445 400/7556 300/5667
>> 200/3778 100/1889
>> dev.cpu.0.cx_supported: C1/1/1 C2/2/148
>> dev.cpu.0.cx_lowest: C8
>> dev.cpu.0.cx_usage: 9.01% 90.98% last 807us
>> dev.cpu.1.%desc: ACPI CPU
>> dev.cpu.1.%driver: cpu
>> dev.cpu.1.%location: handle=\_PR_.CPU1
>> dev.cpu.1.%pnpinfo: _HID=none _UID=0
>> dev.cpu.1.%parent: acpi0
>> dev.cpu.1.cx_supported: C1/1/1 C2/2/148
>> dev.cpu.1.cx_lowest: C8
>> dev.cpu.1.cx_usage: 11.70% 88.29% last 21303us
>> dev.cpu.2.%desc: ACPI CPU
>> dev.cpu.2.%driver: cpu
>> dev.cpu.2.%location: handle=\_PR_.CPU2
>> dev.cpu.2.%pnpinfo: _HID=none _UID=0
>> dev.cpu.2.%parent: acpi0
>> dev.cpu.2.cx_supported: C1/1/1 C2/2/148
>> dev.cpu.2.cx_lowest: C8
>> dev.cpu.2.cx_usage: 15.17% 84.82% last 22987us
>> dev.cpu.3.%desc: ACPI CPU
>> dev.cpu.3.%driver: cpu
>> dev.cpu.3.%location: handle=\_PR_.CPU3
>> dev.cpu.3.%pnpinfo: _HID=none _UID=0
>> dev.cpu.3.%parent: acpi0
>> dev.cpu.3.cx_supported: C1/1/1 C2/2/148
>> dev.cpu.3.cx_lowest: C8
>> dev.cpu.3.cx_usage: 11.74% 88.25% last 6073us
>>
> So ACPI is exposing C1 and C2 only.
>
>> According to the Intel specs (Page 11), this processor supports C1, C1E,
>> C3, C6 and C7
>>
>> The above sysctl dump shows only C1 and C2. I wonder if the C2 is
>> actually C3
>>
>> http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v3-vol-1-datasheet.pdf
> It'd say C2/3/xxx in that case.
>
> Chances are you'll end up seeing it fall into deeper sleep states. Try
> installing intel-pcm; kldload cpuctl; run pcm.x 1 . See if it's
> entering lower CPU states.
>
>> How is our support for the newer Cx States introduced in Haswell, which
>> can apparently go as high as C10
> I don't know if we get those exposed via ACPI. I know there's a bunch
> of cute things we could be doing with MWAIT that we aren't, but we
> certainly should be drifting into lower sleep states.
>
> Just run intel-pcm and see.
>
> Thanks,
>
>
>
> -a
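
For anyone comparing these sysctls across machines, here is a minimal sketch of pulling the fields apart. It assumes each dev.cpu.N.cx_supported entry is name/ACPI type/transition latency (the middle field being the type is what Adrian's "C2/3/xxx" remark implies; microseconds for the latency is my assumption). CxState, parse_cx_supported and parse_cx_usage are hypothetical helper names, not part of any FreeBSD tool.

```python
from typing import List, NamedTuple


class CxState(NamedTuple):
    name: str        # label shown by the sysctl, e.g. "C2"
    acpi_type: int   # ACPI C-state type (assumed meaning of the middle field)
    latency_us: int  # transition latency (assumed to be microseconds)


def parse_cx_supported(value: str) -> List[CxState]:
    """Split a value such as 'C1/1/1 C2/2/148' into its states."""
    states = []
    for token in value.split():
        name, acpi_type, latency = token.split("/")
        states.append(CxState(name, int(acpi_type), int(latency)))
    return states


def parse_cx_usage(value: str) -> List[float]:
    """Return the per-state percentages from a value such as
    '9.01% 90.98% last 807us', ignoring the trailing 'last ...' part."""
    return [float(tok.rstrip("%")) for tok in value.split() if tok.endswith("%")]


if __name__ == "__main__":
    for state in parse_cx_supported("C1/1/1 C2/2/148"):
        print(state)
    print(parse_cx_usage("9.01% 90.98% last 807us"))
```

On the dump above this gives two states, with ACPI types 1 and 2, which matches the reading that ACPI is exposing only C1 and C2 on this box.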


Stock configuration:

# pcm.x 10

 Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100
ID=db05e43)

 Copyright (c) 2009-2013 Intel Corporation

Number of physical cores: 4
Number of logical cores: 4
Threads (logical cores) per physical core: 1
Num sockets: 1
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3100000000 Hz
Package thermal spec power: 80 Watt; Package minimum power: 0 Watt;
Package maximum power: 0 Watt;

Detected Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz "Intel(r) microarchitecture codename Haswell"

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock
ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in
power-saving C state)='unhalted clock ticks'/'invariant timer ticks
while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 cache misses
 L2MISS: L2 cache misses (including other core's L2 cache *hits*)
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in
some cases could be >1.0 due to a higher memory latency
 L2CLK : ratio of CPU cycles lost due to missing L2 cache but still
hitting L3 cache (0.00-1.00)
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax
temperature (thermal headroom): 0 corresponds to the max temperature


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK  | READ  | WRITE | TEMP

   0    0     0.00   0.74   0.00    0.76      39 K    120 K    0.67    0.66    0.13    0.09     N/A     N/A     74
   1    0     0.00   0.72   0.00    0.75      17 K     80 K    0.79    0.71    0.07    0.10     N/A     N/A     76
   2    0     0.00   0.62   0.00    0.61    8037       33 K    0.76    0.58    0.08    0.08     N/A     N/A     76
   3    0     0.00   0.72   0.00    0.76      18 K     98 K    0.81    0.70    0.07    0.10     N/A     N/A     76
-------------------------------------------------------------------------------------------------------------------
 SKT    0     0.00   0.72   0.00    0.74      83 K    332 K    0.75    0.68    0.09    0.09    0.38    0.01     74
-------------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.00   0.72   0.00    0.74      83 K    332 K    0.75    0.68    0.09    0.09    0.38    0.01     N/A

 Instructions retired:  118 M ; Active cycles:  165 M ; Time (TSC):   30
Gticks ; C0 (active,non-halted) core residency: 0.18 %

 C1 core residency: 99.82 %; C3 core residency: 0.00 %; C6 core
residency: 0.00 %; C7 core residency: 0.00 %;
 C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package
residency: 0.00 %; C7 package residency: 0.00 %;

 PHYSICAL CORE IPC                 : 0.72 => corresponds to 17.93 %
utilization for cores in active state
 Instructions per nominal CPU cycle: 0.00 => corresponds to 0.02 % core
utilization over time interval
----------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------
 SKT    0 package consumed 68.42 Joules
----------------------------------------------------------------------------------------------
 TOTAL:                    68.42 Joules



Then just enabling the higher Cx states (no powerd or anything):


# sysctl hw.acpi.cpu.cx_lowest=c8
hw.acpi.cpu.cx_lowest: C1 -> C8
# pcm.x 10

 Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100
ID=db05e43)

 Copyright (c) 2009-2013 Intel Corporation

Number of physical cores: 4
Number of logical cores: 4
Threads (logical cores) per physical core: 1
Num sockets: 1
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 3100000000 Hz
Package thermal spec power: 80 Watt; Package minimum power: 0 Watt;
Package maximum power: 0 Watt;

Detected Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz "Intel(r) microarchitecture codename Haswell"

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock
ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in
power-saving C state)='unhalted clock ticks'/'invariant timer ticks
while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 cache misses
 L2MISS: L2 cache misses (including other core's L2 cache *hits*)
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in
some cases could be >1.0 due to a higher memory latency
 L2CLK : ratio of CPU cycles lost due to missing L2 cache but still
hitting L3 cache (0.00-1.00)
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax
temperature (thermal headroom): 0 corresponds to the max temperature


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK  | READ  | WRITE | TEMP

   0    0     0.00   0.11   0.00    0.99     611 K    629 K    0.03    0.02    1.89    0.01     N/A     N/A     73
   1    0     0.00   0.19   0.00    0.99     152 K    169 K    0.10    0.04    1.36    0.04     N/A     N/A     76
   2    0     0.00   0.20   0.00    0.99     153 K    171 K    0.10    0.04    1.29    0.04     N/A     N/A     77
   3    0     0.00   0.20   0.00    1.00     159 K    180 K    0.12    0.04    1.14    0.03     N/A     N/A     76
-------------------------------------------------------------------------------------------------------------------
 SKT    0     0.00   0.16   0.00    0.99    1077 K   1150 K    0.06    0.03    1.55    0.03    0.14    0.01     72
-------------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.00   0.16   0.00    0.99    1077 K   1150 K    0.06    0.03    1.55    0.03    0.14    0.01     N/A

 Instructions retired:   19 M ; Active cycles:  125 M ; Time (TSC):   31
Gticks ; C0 (active,non-halted) core residency: 0.10 %

 C1 core residency: 1.85 %; C3 core residency: 0.00 %; C6 core
residency: 0.00 %; C7 core residency: 98.05 %;
 C2 package residency: 8.10 %; C3 package residency: 7.46 %; C6 package
residency: 79.20 %; C7 package residency: 0.00 %;

 PHYSICAL CORE IPC                 : 0.16 => corresponds to 3.96 %
utilization for cores in active state
 Instructions per nominal CPU cycle: 0.00 => corresponds to 0.00 % core
utilization over time interval
----------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------
 SKT    0 package consumed 22.77 Joules
----------------------------------------------------------------------------------------------
 TOTAL:                    22.77 Joules





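This suggests the cores are spending nearly all of their time in C7 (98.05 % core residency), with the package mostly in C6 (79.20 % package residency).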
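
To put the two package energy figures side by side, here is a rough sketch of the arithmetic. It assumes each run covered the 10 second interval passed to pcm.x; average_power_watts and percent_saved are hypothetical helper names used only for this illustration.

```python
def average_power_watts(energy_joules: float, interval_s: float) -> float:
    """Average power over the sample: energy divided by elapsed time."""
    return energy_joules / interval_s


def percent_saved(before_joules: float, after_joules: float) -> float:
    """Relative reduction in energy, as a percentage of the 'before' value."""
    return 100.0 * (before_joules - after_joules) / before_joules


if __name__ == "__main__":
    interval_s = 10.0    # assumption: the "10" in "pcm.x 10" is seconds
    stock_joules = 68.42  # package energy with cx_lowest=C1
    deep_joules = 22.77   # package energy with cx_lowest=C8

    print(f"stock: {average_power_watts(stock_joules, interval_s):.2f} W")
    print(f"C8:    {average_power_watts(deep_joules, interval_s):.2f} W")
    print(f"saved: {percent_saved(stock_joules, deep_joules):.1f} %")
```

Under that assumption the idle package draw drops from roughly 6.8 W to roughly 2.3 W, about two thirds less, on a single 10 second sample from one machine.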

I will try to grab results from a few more machines.
Received on Tue May 13 2014 - 00:09:16 UTC
