Re: powerd and nvidia drivers not playing nicely together (Was: Re: Systems running hot?)

From: Bernd Walter <ticso_at_cicely7.cicely.de>
Date: Thu, 24 Dec 2009 22:44:57 +0100
On Thu, Dec 24, 2009 at 12:22:23PM -0800, Kevin Oberman wrote:
> > Date: Thu, 24 Dec 2009 18:48:10 +0100
> > From: Bernd Walter <ticso_at_cicely7.cicely.de>
> > 
> > On Thu, Dec 24, 2009 at 08:24:12AM -0800, Kevin Oberman wrote:
> > > > Date: Thu, 24 Dec 2009 11:46:26 +0100
> > > > From: Bernd Walter <ticso_at_cicely7.cicely.de>
> > > > Sender: owner-freebsd-current_at_freebsd.org
> > > > 
> > > > On Wed, Dec 23, 2009 at 04:44:35PM +0200, Gleb Kurtsou wrote:
> > > > > On (21/12/2009 19:18), Doug Barton wrote:
> > > > > > b. f. wrote:
> > > > > > > On 12/21/09, Doug Barton <dougb_at_freebsd.org> wrote:
> > > > > > >> b. f. wrote:
> > > > > > >>>> no X! So I think to myself, what else did I change last night.... oh
> > > > > > > 
> > > > > > >>> acpi_perf? acpi_throttle? acpi_thermal? acpi_video?
> > > > > > >> I haven't done anything special with the acpi stuff. The only thing
> > > > > > >> that looks relevant from dmesg is: acpi_tz0: <Thermal Zone> on acpi0
> > > > > > >>
> > > > > > > 
> > > > > > > Yes, but which components show up in 'sysctl -a | grep -ie acpi' ?
> > > > > > 
> > > > > > It's a long list, but here you go:
> > > > > > http://people.freebsd.org/~dougb/acpi-grep.txt
> > > > > > 
> > > > > > >>> Which nvidia driver?
> > > > > > >> The latest.
> > > > > > > 
> > > > > > > Which video card?
> > > > > > 
> > > > > > nvidia0: <GeForce Go 7300>
> > > > > I had similar problems with a GeForce 8400M. GPU temperature could get up
> > > > > to 100C in X, which in turn increased CPU temperature. I use
> > > > > powerd, and had lockups with *_cx_lowest settings. I run amd64; i386 was
> > > > > just fine on the same notebook.
> > > > 
> > > > It is not just nvidia.
> > > > I'm using two plain old PCI Matrox G400 cards, and whenever I start X
> > > > with powerd enabled, I get a full freeze within 24 hours.
> > > > Starting powerd once X is already running doesn't seem to be a problem.
> > > > Maybe something like a delay loop gets calibrated at the reduced clock
> > > > rate and is then too short once the CPU runs at full speed.
> > > 
> > > Quick question: are you using throttling/TCC? If so, either turn it off
> > > or limit how low it can run the CPU. When I was running throttling on
> > > systems with old Matrox and Radeon cards, they would freeze if the
> > > throttling went too low.
> > 
> > I assume yes; I'm not sure about all those modern fancy names.
> > In other words, dev.cpu.?.freq changes.
> > 
> > > As mav pointed out at http://wiki.freebsd.org/TuningPowerConsumption,
> > > TCC does little to conserve power and was not designed for that. TCC is
> > > the Thermal Control Circuit, designed to keep the CPU from
> > > overheating. It works for that, but not for power management. I'd love
> > > to see it off (for power management) by default.
> > > hint.p4tcc.0.disabled=1
> > > hint.acpi_throttle.0.disabled=1
> > 
> > What is the difference between the hints and disabling powerd?
> 
> The hints simply disable throttling and TCC for power management.
> 
> These are ALMOST identical techniques for controlling high CPU
> temperature. They were never intended to be used for power
> management. Both work by skipping N of 8 CPU cycles. When a system using
> ACPI exceeds the value of hw.acpi.thermal.tz0._PSV, it will engage
> TCC. Older systems used throttling under software control for the same
> purpose, but FreeBSD did not implement it, as far as I know.
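To make the "skip N of 8 cycles" description concrete, here is a small Python sketch of the arithmetic. It assumes, as described above, that each step removes one eighth of the cycles while clock and voltage stay unchanged, so effective speed scales linearly; the function name and the truncation to whole MHz are illustrative choices, not taken from the FreeBSD sources.

```python
def tcc_ladder(max_mhz: int) -> list[int]:
    """Effective speeds when N of every 8 cycles are skipped (N = 0..7).

    Assumption: the duty cycle is (8 - N) / 8 and the result is truncated
    to whole MHz; clock and voltage are otherwise unchanged.
    """
    return [max_mhz * (8 - skipped) // 8 for skipped in range(8)]


if __name__ == "__main__":
    # 2394 MHz is the top level of the Q6600 box discussed in this thread.
    print(tcc_ladder(2394))  # [2394, 2094, 1795, 1496, 1197, 897, 598, 299]
```

This is the same ladder that appears in the dev.cpu.0.freq_levels output later in this message.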
> 
> SpeedStep and its relatives on both Intel and AMD chips are designed for
> power management, and those are all I use on my systems. These are the
> relevant sysctls:
> dev.cpu.0.freq_levels: 2000/27000 1600/22600 1333/19666 1066/16733 800/13800
> dev.cpu.0.cx_supported: C1/1 C2/1 C3/85 C4/185
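Both sysctl values above are space-separated lists of slash-separated pairs. The Python sketch below splits them into numbers. It assumes the usual FreeBSD meanings (frequency in MHz and power in mW for freq_levels, idle-state name and transition latency in microseconds for cx_supported); the helper names are hypothetical.

```python
def parse_freq_levels(value: str) -> list[tuple[int, int]]:
    """Split 'MHz/mW MHz/mW ...' into (mhz, milliwatts) tuples.

    Assumption: power is an integer, possibly -1 when it is unknown.
    """
    levels = []
    for item in value.split():
        mhz, mw = item.split("/")
        levels.append((int(mhz), int(mw)))
    return levels


def parse_cx_supported(value: str) -> list[tuple[str, int]]:
    """Split 'C1/1 C2/1 ...' into (state, latency_us) tuples."""
    states = []
    for item in value.split():
        name, latency = item.split("/")
        states.append((name, int(latency)))
    return states


if __name__ == "__main__":
    levels = parse_freq_levels("2000/27000 1600/22600 1333/19666 1066/16733 800/13800")
    cx = parse_cx_supported("C1/1 C2/1 C3/85 C4/185")
    print(levels[0], levels[-1])     # (2000, 27000) (800, 13800)
    print([name for name, _ in cx])  # ['C1', 'C2', 'C3', 'C4']
```

On a live system the input strings could come from `sysctl -n dev.cpu.0.freq_levels` and `sysctl -n dev.cpu.0.cx_supported`; the sketch takes plain strings to stay portable.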
> 
> I only have 5 "frequency" settings, but all work by actually slowing the
> clock and reducing voltage, so they really save power. I also have 4 'C'
> states, which can also be a huge win, as they allow the system to use far
> less power when idle. Different systems have more or fewer available
> states. C2 saves fairly little power. C3 (if available) is a big winner
> and C4 and above are even better, but read mav's article for a better
> description. 
> 
> Now the bad news. As you note, you have only C1. At this time the
> available frequencies are all from TCC, not SpeedStep. I thought all Core 2
> chips supported EST. It should be listed among the CPU's Features2 flags
> at the start of /var/run/dmesg.boot.
> 
> You should also have:
> est0: <Enhanced SpeedStep Frequency Control> on cpu0
> est1: <Enhanced SpeedStep Frequency Control> on cpu1
> est2: <Enhanced SpeedStep Frequency Control> on cpu2
> est3: <Enhanced SpeedStep Frequency Control> on cpu3
> in the dmesg, but I suspect that, for some reason, you don't, and I
> don't know why.
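The two checks suggested above (the EST flag on the Features2 line and the estN attach messages) can be scripted. This Python sketch assumes dmesg.boot text in the format quoted in this thread; the function name and regular expressions are illustrative, not part of any FreeBSD tool.

```python
import re


def est_status(dmesg: str) -> tuple[bool, list[str]]:
    """Return (EST flag present in Features2, attached estN devices)."""
    has_flag = False
    # Match only the plain 'Features2=' line, not 'AMD Features2='.
    match = re.search(r"^[ \t]*Features2=0x[0-9a-fA-F]+<([^>]*)>", dmesg, re.MULTILINE)
    if match:
        has_flag = "EST" in match.group(1).split(",")
    # Attach lines look like: est0: <Enhanced SpeedStep Frequency Control> on cpu0
    devices = re.findall(r"^(est\d+): <Enhanced SpeedStep", dmesg, re.MULTILINE)
    # Unique device names, in order of appearance.
    return has_flag, list(dict.fromkeys(devices))


if __name__ == "__main__":
    sample = (
        "  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>\n"
        "  AMD Features2=0x1<LAHF>\n"
        "est0: <Enhanced SpeedStep Frequency Control> on cpu0\n"
        "est1: <Enhanced SpeedStep Frequency Control> on cpu1\n"
    )
    print(est_status(sample))  # (True, ['est0', 'est1'])
```

Note that the flag and the attach lines only show that EST is present; they say nothing about which levels the BIOS actually exposes.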

Well, I do have them:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (2419.30-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6fb  Stepping = 11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 4
real memory  = 9126805504 (8704 MB)
avail memory = 8125517824 (7749 MB)
ACPI APIC Table: <INTEL  DG33FB  >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.0> irqs 0-23 on motherboard
acpi0: <INTEL DG33FB> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
coretemp2: <CPU On-Die Thermal Sensors> on cpu2
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
coretemp3: <CPU On-Die Thermal Sensors> on cpu3
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3

How would you know that the frequencies are from TCC and not SpeedStep?

Maybe I should mention that this system is running 7.0-stable, so it
is not running recent code.
But my server runs an almost identical board with 8.0-RC1 amd64
and shows similar sysctl output:
[139]cicely14# sysctl dev.cpu
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.CPU0
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.temperature: 34.0C
dev.cpu.0.freq: 2394
dev.cpu.0.freq_levels: 2394/89000 2094/77875 1795/66750 1496/55625 1197/44500 897/33375 598/22250 299/11125
dev.cpu.0.cx_supported: C1/1
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% last 500us
dev.cpu.1.%desc: ACPI CPU
dev.cpu.1.%driver: cpu
dev.cpu.1.%location: handle=\_PR_.CPU1
dev.cpu.1.%pnpinfo: _HID=none _UID=0
dev.cpu.1.%parent: acpi0
dev.cpu.1.temperature: 32.0C
dev.cpu.1.cx_supported: C1/1
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_usage: 100.00% last 500us
dev.cpu.2.%desc: ACPI CPU
dev.cpu.2.%driver: cpu
dev.cpu.2.%location: handle=\_PR_.CPU2
dev.cpu.2.%pnpinfo: _HID=none _UID=0
dev.cpu.2.%parent: acpi0
dev.cpu.2.temperature: 30.0C
dev.cpu.2.cx_supported: C1/1
dev.cpu.2.cx_lowest: C1
dev.cpu.2.cx_usage: 100.00% last 500us
dev.cpu.3.%desc: ACPI CPU
dev.cpu.3.%driver: cpu
dev.cpu.3.%location: handle=\_PR_.CPU3
dev.cpu.3.%pnpinfo: _HID=none _UID=0
dev.cpu.3.%parent: acpi0
dev.cpu.3.temperature: 30.0C
dev.cpu.3.cx_supported: C1/1
dev.cpu.3.cx_lowest: C1
dev.cpu.3.cx_usage: 100.00% last 500us
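One rough way to approach the question above from the numbers alone: throttling and TCC skip N of 8 cycles, so their levels should sit at (8 - N)/8 of the top frequency, with reported power scaling the same way, whereas EST also lowers voltage and need not follow that pattern. The Python sketch below applies that heuristic to a freq_levels string. It is a hypothetical check, not how FreeBSD's cpufreq framework labels its levels, and the one-unit tolerance for truncated values is an assumption.

```python
def _fits_eighths(values: list[int], tolerance: int) -> bool:
    """True if every value is within `tolerance` of k/8 of the largest value."""
    top = max(values)
    if top <= 0:
        return False
    for value in values:
        steps = round(value * 8 / top)  # nearest whole number of eighths
        if steps < 1 or abs(value - top * steps / 8) > tolerance:
            return False
    return True


def looks_like_tcc(freq_levels: str, tolerance: int = 1) -> bool:
    """Heuristic: do the 'MHz/mW' levels form a pure N-of-8 ladder?"""
    pairs = [item.split("/") for item in freq_levels.split()]
    if len(pairs) < 2:
        return False  # a single level says nothing either way
    mhz = [int(f) for f, _ in pairs]
    mw = [int(p) for _, p in pairs]
    # Skip the power check if any power value is unknown (negative).
    power_known = all(p >= 0 for p in mw)
    return _fits_eighths(mhz, tolerance) and (
        not power_known or _fits_eighths(mw, tolerance)
    )


if __name__ == "__main__":
    server = "2394/89000 2094/77875 1795/66750 1496/55625 1197/44500 897/33375 598/22250 299/11125"
    laptop = "2000/27000 1600/22600 1333/19666 1066/16733 800/13800"
    print(looks_like_tcc(server))  # True
    print(looks_like_tcc(laptop))  # False
```

Applied to the two outputs in this thread, it flags the 8.0-RC1 server's list as a pure eighths ladder (frequency and power alike) and rejects the SpeedStep list quoted earlier. It is only a heuristic; a real SpeedStep table could in principle land on eighths by coincidence.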

> Unfortunately, most servers and desktops are pretty poor at power
> management compared to laptops, though they are getting better. My Core 2
> Quad system does have C2, though no C3, but EST does work there.

Yes, it is a desktop board, and not the most modern one; in my
experience its BIOS is not the very best either.

-- 
B.Walter <bernd_at_bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, etc.
Received on Thu Dec 24 2009 - 20:45:30 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:59 UTC