Re: powerd and nvidia drivers not playing nicely together (Was: Re: Systems running hot?)

From: Kevin Oberman <oberman_at_es.net>
Date: Thu, 24 Dec 2009 12:22:23 -0800
> Date: Thu, 24 Dec 2009 18:48:10 +0100
> From: Bernd Walter <ticso_at_cicely7.cicely.de>
> 
> On Thu, Dec 24, 2009 at 08:24:12AM -0800, Kevin Oberman wrote:
> > > Date: Thu, 24 Dec 2009 11:46:26 +0100
> > > From: Bernd Walter <ticso_at_cicely7.cicely.de>
> > > Sender: owner-freebsd-current_at_freebsd.org
> > > 
> > > On Wed, Dec 23, 2009 at 04:44:35PM +0200, Gleb Kurtsou wrote:
> > > > On (21/12/2009 19:18), Doug Barton wrote:
> > > > > b. f. wrote:
> > > > > > On 12/21/09, Doug Barton <dougb_at_freebsd.org> wrote:
> > > > > >> b. f. wrote:
> > > > > >>>> no X! So I think to myself, what else did I change last night.... oh
> > > > > > 
> > > > > >>> acpi_perf? acpi_throttle? acpi_thermal? acpi_video?
> > > > > >> I haven't done anything special with the acpi stuff. The only thing
> > > > > >> that looks relevant from dmesg is: acpi_tz0: <Thermal Zone> on acpi0
> > > > > >>
> > > > > > 
> > > > > > Yes, but which components show up in 'sysctl -a | grep -ie acpi' ?
> > > > > 
> > > > > It's a long list, but here you go:
> > > > > http://people.freebsd.org/~dougb/acpi-grep.txt
> > > > > 
> > > > > >>> Which nvidia driver?
> > > > > >> The latest.
> > > > > > 
> > > > > > Which video card?
> > > > > 
> > > > > nvidia0: <GeForce Go 7300>
> > > > I had similar problems with GeForce 8400M. GPU temperature could get up
> > > > to 100C in X, which increased CPU temperature in its turn.  I use
> > > > powerd, and had lockups with *_cx_lowest settings. I run amd64, i386 was
> > > > just fine on the same notebook. 
> > > 
> > > It is not just nvidia.
> > > I'm using two plain old PCI Matrox G400 and whenever I start X with
> > > powerd enabled I have a full freeze within 24 hours.
> > > It doesn't seem to be a problem to start powerd once X is running.
> > > Maybe it is something like tuning some delay loop with reduced clock
> > > rate, which then isn't long enough with increased speed.
> > 
> > Quick question...are you using throttling/TCC? If so, either turn it off
> > or limit how low it can run the CPU. When I was running throttling on
> > systems with old Matrox and Radeon cards, they would freeze if the
> > throttling went too low.
> 
> I assume yes - not sure about all those modern fancy names.
> In other words dev.cpu.?.freq changes.
> 
> > As mav pointed out at http://wiki.freebsd.org/TuningPowerConsumption,
> > TCC does little to conserve power and was not designed for that. TCC is
> > Thermal Control Circuit and is designed to keep the CPU from
> > over-temping. It works for this, but not power management. I'd love to
> > see it off (for power management) by default.
> > hint.p4tcc.0.disabled=1
> > hint.acpi_throttle.0.disabled=1
> 
> What is the difference between the hints and disabling powerd?

The hints simply disable throttling and TCC for power management.

Throttling and TCC are ALMOST identical techniques for controlling high
CPU temperature. They were never intended to be used for power
management. Both work by skipping N of every 8 CPU clock cycles. When a system using
ACPI exceeds the value of hw.acpi.thermal.tz0._PSV, it will engage
TCC. Older systems used throttling under software control for the same
purpose, but FreeBSD did not implement it, as far as I know.
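
To make the "N of 8" arithmetic concrete, here is a rough sketch of the
effective clock rates that cycle skipping yields. The function name and the
2000 MHz base clock are my own picks for illustration, not anything read
from a real system, and I am assuming the step count is exactly 8.

```python
def throttled_mhz(base_mhz, steps=8):
    """Effective clock for each throttle setting.

    Assumption: setting N skips N of every `steps` clock cycles, so the
    CPU does useful work on (steps - N) / steps of them.
    """
    return [base_mhz * (steps - n) // steps for n in range(steps)]


# Hypothetical 2000 MHz part, listed from no throttling to maximum throttling.
levels = throttled_mhz(2000)
for n, mhz in enumerate(levels):
    print(f"skip {n} of 8 cycles -> about {mhz} MHz effective")
```

Note that only the duty cycle changes here. The core voltage is not lowered,
which is why these settings look like frequency steps but save little power.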

SpeedStep and its relatives on both Intel and AMD chips are designed for
power management, and those are all I use on my systems. These are the
relevant sysctls:
dev.cpu.0.freq_levels: 2000/27000 1600/22600 1333/19666 1066/16733 800/13800
dev.cpu.0.cx_supported: C1/1 C2/1 C3/85 C4/185

I only have 5 "frequency" settings, but all work by actually slowing the
clock and reducing voltage, so they really save power. I also have 4 'C'
states which also can be a huge win as they allow the system to use far
less power when idle. Different systems have more or fewer available
states. C2 saves fairly little power. C3 (if available) is a big winner
and C4 and above are even better, but read mav's article for a better
description. 
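
If you want to compare these values across machines, a small parser helps.
This is only a sketch: the function names are mine, and I am assuming the
second field of freq_levels is estimated power in milliwatts and the second
field of cx_supported is the wake-up latency in microseconds, which is how I
read the output above. Other FreeBSD versions may print more fields.

```python
def parse_freq_levels(value):
    """Split a dev.cpu.N.freq_levels string into (MHz, mW) pairs.

    Assumption: each item is "frequency/power", with power in milliwatts.
    """
    levels = []
    for item in value.split():
        mhz, milliwatts = item.split("/")
        levels.append((int(mhz), int(milliwatts)))
    return levels


def parse_cx_supported(value):
    """Split a dev.cpu.N.cx_supported string into (state, latency) pairs.

    Assumption: each item is "Cn/latency", with latency in microseconds.
    """
    states = []
    for item in value.split():
        name, latency = item.split("/")
        states.append((name, int(latency)))
    return states


# The two sysctl values quoted above.
freq_levels = parse_freq_levels(
    "2000/27000 1600/22600 1333/19666 1066/16733 800/13800"
)
cx_states = parse_cx_supported("C1/1 C2/1 C3/85 C4/185")

for mhz, milliwatts in freq_levels:
    print(f"{mhz} MHz: about {milliwatts / 1000:.1f} W")
print("deepest C-state offered:", cx_states[-1][0])
```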

Now the bad news. As you note, you have only C1. At this time the
available frequencies are all from TCC, not SpeedStep. I thought all
Core 2 chips supported EST. If yours does, EST should be listed in the
CPU Features2 line near the start of /var/run/dmesg.boot.

You should also have:
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
in the dmesg, but I suspect that, for some reason, you don't, and I
don't know why.
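
A quick way to check both things at once is to scan the boot messages. The
sketch below is hedged: the function names are mine, the SAMPLE text is an
illustrative excerpt and not output from anyone's machine in this thread, and
I am assuming the Features2 line has the usual "Features2=0x...<FLAG,FLAG>"
shape.

```python
import re

# Matches lines such as "est0: <Enhanced SpeedStep Frequency Control> on cpu0".
EST_LINE = re.compile(
    r"^est\d+: <Enhanced SpeedStep Frequency Control> on cpu(\d+)",
    re.MULTILINE,
)
# Captures the comma-separated flag list of a Features2 line.
FEATURES2 = re.compile(r"Features2=0x[0-9a-fA-F]+<([^>]*)>")


def est_cpus(dmesg_text):
    """Return the CPU numbers that have an est device attached."""
    return sorted(int(m.group(1)) for m in EST_LINE.finditer(dmesg_text))


def advertises_est(dmesg_text):
    """Return True if any Features2 flag list contains EST."""
    return any(
        "EST" in m.group(1).split(",") for m in FEATURES2.finditer(dmesg_text)
    )


# Illustrative excerpt only (assumed format, made up for this example).
SAMPLE = """\
  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
"""

print("CPU advertises EST:", advertises_est(SAMPLE))
print("est attached on CPUs:", est_cpus(SAMPLE))
```

On a real system you would pass in the contents of /var/run/dmesg.boot
instead of SAMPLE. If the flag is present but no est devices attach, that
would match what I suspect is happening here.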

Unfortunately, most servers and desktops are pretty poor at power
management compared to laptops, though they are getting better. My Core 2
Quad system does have the C2 state, though no C3, but EST does work there.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman_at_es.net			Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751
Received on Thu Dec 24 2009 - 19:22:52 UTC
