Re: Wrong temperature with AMD and amdtemp.ko

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Mon, 5 Oct 2015 21:28:32 -0700 (PDT)
On  3 Oct, Willem Jan Withagen wrote:
> On 2-10-2015 23:32, Don Lewis wrote:
>> On  2 Oct, Willem Jan Withagen wrote:
>>>
>>> Hi
>>>
>>> 10.2-STABLE FreeBSD 10.2-STABLE #0 r287102: Mon Aug 24
>>>
>>> Processor: Opteron 6812, in Supermicro H8SGL
>>>
>>> dev.cpu.7.temperature: 11.1C
>>> dev.cpu.6.temperature: 11.1C
>>> dev.cpu.5.temperature: 11.1C
>>> dev.cpu.4.temperature: 11.1C
>>> dev.cpu.3.temperature: 11.1C
>>> dev.cpu.2.temperature: 11.1C
>>> dev.cpu.1.temperature: 11.1C
>>> dev.cpu.0.temperature: 11.1C
>>>
>>> But I'm pretty sure it is not 11.1C in the datacenter....
>>>
>>> Or should I not use amdtemp.ko for this?
>> 
>> The definition of the value that can be read from the temperature
>> register is pretty strange.  For AMD Family 15h processors, the BIOS and
>> Kernel Developer's Guide (BKDG) says this:
>> 
>>   Tctl is a processor temperature control value used for processor
>>   thermal management. Tctl is accessible through D18F3xA4[CurTmp].
>>   Tctl is a temperature on its own scale aligned to the processors
>>   cooling requirements. Therefore Tctl does not represent a temperature
>>   which could be measured on the die or the case of the processor.
>>   Instead, it specifies the processor temperature relative to the
>>   maximum operating temperature, Tctl,max. Tctl,max is specified in the
>>   power and thermal data sheet. Tctl is defined as follows for all
>>   parts:
>> 
>>   A: For Tctl = Tctl_max to 255.875: the temperature of the part is
>>   [Tctl - Tctl_max] over the maximum operating temperature.  The
>>   processor may take corrective actions that affects performance, such
>>   as HTC, to support the return to Tctl range A.
>> 
>>   B: For Tctl = 0 to Tctl_max - 0.125: the temperature of the part is
>>   [Tctl_max - Tctl] under the maximum operating temperature.
>> 
>> It would be nice to report Tctl_max so that we could at least know how
>> far the temperature is from the limit, but I don't know if that is
>> available.  It might be the value in the HtcTmpLmt register, but the
>> BKDG is unclear about that.  If not, we would have to build a table of
>> values from the datasheet.
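
To make the arithmetic in the quoted BKDG text concrete, here is a minimal
sketch of how a raw register value could be turned into Tctl and then into a
margin against Tctl_max. The bit position of CurTmp (bits 31:21 of D18F3xA4)
is an assumption on my part; the 11-bit width follows from the 0 to 255.875
range in 0.125 steps. The function names are hypothetical, and the Tctl_max of
70.0 is a placeholder, not a datasheet value.

```python
# Sketch only: decode Tctl from a raw D18F3xA4 value and compare to Tctl_max.
CURTMP_SHIFT = 21     # ASSUMPTION: CurTmp occupies bits 31:21 of the register
CURTMP_MASK = 0x7FF   # 11 bits: 2047 * 0.125 = 255.875, the quoted upper bound
CURTMP_STEP = 0.125   # resolution implied by the quoted ranges


def tctl_from_register(reg_value):
    """Extract the CurTmp field and scale it to the Tctl scale."""
    return ((reg_value >> CURTMP_SHIFT) & CURTMP_MASK) * CURTMP_STEP


def describe(tctl, tctl_max):
    """Express Tctl relative to Tctl_max, following the two quoted ranges."""
    if tctl >= tctl_max:
        # Tctl = Tctl_max to 255.875: at or over the limit
        return f"{tctl - tctl_max:.3f} over the maximum operating temperature"
    # Tctl = 0 to Tctl_max - 0.125: under the limit
    return f"{tctl_max - tctl:.3f} under the maximum operating temperature"


if __name__ == "__main__":
    HYPOTHETICAL_TCTL_MAX = 70.0   # placeholder, NOT from any datasheet
    raw = 89 << CURTMP_SHIFT       # a CurTmp field of 89 steps = 11.125
    tctl = tctl_from_register(raw)
    print(tctl, "->", describe(tctl, HYPOTHETICAL_TCTL_MAX))
```

A field value of 89 gives 11.125, which would be consistent with the 11.1C
shown by sysctl if the driver reports in tenths of a degree. Without the real
Tctl_max the margin is still only illustrative.
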
> 
> And
> 
> On 2-10-2015 23:06, Jung-uk Kim wrote:
>> On 10/02/2015 16:49, Willem Jan Withagen wrote:
> 
>> amdtemp(4):
>>
>> For Family 10h and later processors, "(the reported temperature) is a
>> non-physical temperature measured on an arbitrary scale and it does not
>> represent an actual physical temperature like die or case temperature.
>> Instead, it specifies the processor temperature relative to the point at
>> which the system must supply the maximum cooling for the processor's
>> specified maximum case temperature and maximum thermal power dissipation"
>> according to BIOS and Kernel Developer's Guide (BKDG) for AMD Processors,
>> http://developer.amd.com/documentation/guides/Pages/default.aspx.
> 
> If one boots into the BIOS, the BIOS suggests that it knows how to do
> this conversion.... Perhaps one can question the ultimate correctness of
> the outcome, but the 51.3C value suggests some accuracy.

That may be a measurement from a separate temperature sensor on the
motherboard underneath the CPU socket.

> Thus far I have not been able to locate the "Power and Thermal Datasheet"
> for Family 15h....
> Perhaps I need to disassemble the BIOS, or check how other tools or OSes
> do this.
> 
> --WjW
> 
Received on Tue Oct 06 2015 - 02:28:48 UTC
