Re: Wrong temperature with AMD and amdtemp.ko

From: Willem Jan Withagen <wjw_at_digiware.nl>
Date: Sat, 3 Oct 2015 12:42:09 +0200
On 2-10-2015 23:32, Don Lewis wrote:
> On  2 Oct, Willem Jan Withagen wrote:
>>
>> Hi
>>
>> 10.2-STABLE FreeBSD 10.2-STABLE #0 r287102: Mon Aug 24
>>
>> Processor: Opteron 6812, in Supermicro H8SGL
>>
>> dev.cpu.7.temperature: 11.1C
>> dev.cpu.6.temperature: 11.1C
>> dev.cpu.5.temperature: 11.1C
>> dev.cpu.4.temperature: 11.1C
>> dev.cpu.3.temperature: 11.1C
>> dev.cpu.2.temperature: 11.1C
>> dev.cpu.1.temperature: 11.1C
>> dev.cpu.0.temperature: 11.1C
>>
>> But I'm pretty sure it is not 11.1C in the datacenter....
>>
>> Or should I not use amdtemp.ko for this?
> 
> The definition of the value that can be read from the temperature
> register is pretty strange.  For AMD Family 15h processors, the BIOS and
> Kernel Developer's Guide (BKDG) says this:
> 
>   Tctl is a processor temperature control value used for processor
>   thermal management. Tctl is accessible through D18F3xA4[CurTmp].
>   Tctl is a temperature on its own scale aligned to the processors
>   cooling requirements. Therefore Tctl does not represent a temperature
>   which could be measured on the die or the case of the processor.
>   Instead, it specifies the processor temperature relative to the
>   maximum operating temperature, Tctl,max. Tctl,max is specified in the
>   power and thermal data sheet. Tctl is defined as follows for all
>   parts:
> 
>   A: For Tctl = Tctl_max to 255.875: the temperature of the part is
>   [Tctl - Tctl_max] over the maximum operating temperature.  The
>   processor may take corrective actions that affects performance, such
>   as HTC, to support the return to Tctl range A.
> 
>   B: For Tctl = 0 to Tctl_max - 0.125: the temperature of the part is
>   [Tctl_max - Tctl] under the maximum operating temperature.
> 
> It would be nice to report Tctl_max so that we could at least know how
> far the temperature is from the limit, but I don't know if that is
> available.  It might be the value in the HtcTmpLmt register, but the
> BKDG is unclear about that.  If not, we would have to build a table of
> values from the datasheet.
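
To make the two ranges above concrete, here is a rough sketch of how a raw
register value could be turned into Tctl and then into a distance from
Tctl_max. The field position (bits 31:21 of D18F3xA4) is my assumption, as is
the 0.125 step, which I inferred from the 0 to 255.875 range in the quoted
text. The Tctl_max of 70.0 is a made-up placeholder, not a datasheet value,
and the function names are hypothetical. It also ignores any alternate range
or offset modes a particular model may have.

```python
# Sketch only: field layout and the example Tctl_max are assumptions.
CURTMP_SHIFT = 21        # assumed: CurTmp occupies bits 31:21 of D18F3xA4
CURTMP_MASK = 0x7FF      # 11 bits, i.e. 0..2047 steps
CURTMP_STEP_C = 0.125    # step size implied by the 0 to 255.875 range


def decode_tctl(reg_value):
    """Extract Tctl from a raw 32-bit D18F3xA4 value."""
    return ((reg_value >> CURTMP_SHIFT) & CURTMP_MASK) * CURTMP_STEP_C


def describe_tctl(tctl, tctl_max):
    """Return (range letter, distance from Tctl_max) per the BKDG text."""
    if tctl >= tctl_max:
        return "A", tctl - tctl_max   # at or over the limit
    return "B", tctl_max - tctl       # under the limit


raw = 89 << CURTMP_SHIFT              # 89 * 0.125 = 11.125, shown as 11.1C
tctl = decode_tctl(raw)
print(tctl)
print(describe_tctl(tctl, 70.0))      # 70.0 is a placeholder Tctl_max
```

With a real Tctl_max from the datasheet, the second value would be the margin
Don mentions: how far the part is from its limit, which is about all Tctl can
tell us.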

And

On 2-10-2015 23:06, Jung-uk Kim wrote:
> On 10/02/2015 16:49, Willem Jan Withagen wrote:

> amdtemp(4):
>
> For Family 10h and later processors, “(the reported temperature) is a
> non-physical temperature measured on an arbitrary scale and it does not
> represent an actual physical temperature like die or case temperature.
> Instead, it specifies the processor temperature relative to the point at
> which the system must supply the maximum cooling for the processor's
> specified maximum case temperature and maximum thermal power dissipation”
> according to BIOS and Kernel Developer's Guide (BKDG) for AMD Processors,
> http://developer.amd.com/documentation/guides/Pages/default.aspx.

If one boots into the BIOS setup, the BIOS seems to know how to do this
conversion. One can question how correct the outcome ultimately is, but
the 51.3C value at least looks plausible.

Thus far I have not been able to locate the "Power and Thermal Datasheet"
for Family 15h.
Perhaps I need to disassemble the BIOS, or check how other tools or OSes
do this.

--WjW
Received on Sat Oct 03 2015 - 08:42:25 UTC
