Re: panic: double fault with 11.0-CURRENT r258504

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Sat, 30 Nov 2013 10:48:22 -0800 (PST)

On 30 Nov, Konstantin Belousov wrote:
> On Sat, Nov 30, 2013 at 01:02:16PM +0100, Peter Holm wrote:
>> On Thu, Nov 28, 2013 at 09:56:10AM +0200, Konstantin Belousov wrote:
>> > Peter, could you, please, try to reproduce the issue ?  It does not look
>> > like a random hardware failure, since in all cases, it is curthread access
>> > which is faulting.  The issue is only reported by Don, and so far only
>> > for i386 SMP.
>> 
>> I'm not seeing this issue on my AMD Phenom(tm) 9150e Quad-Core
>> Processor with i386/r258703.
> 
> Thank you.
> 
> 9150 is family 0x10, which may indeed point to some errata
> for family 0xf.  Let's wait for Don.

It's really looking like a hardware problem at this point.  I've seen no
problems so far in about 2 1/2 passes through portupgrade -fr
lang/perl5.16 on my other machine with the same motherboard model but a
slightly different CPU.

CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (2200.05-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x40fb2  Family = 0xf  Model = 0x4b  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>

It's also a family 0xF CPU, but strangely different.  It only seems to
have half as many on-die temperature sensors.
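
For anyone comparing CPUs, here is a rough sketch of how the Id value in
the dmesg output above breaks down into the Family, Model and Stepping
fields.  It follows the standard CPUID signature layout; the function
name is made up for illustration, and the rule of folding in the
extended fields only when the base family is 0xf is the AMD convention.

```python
def decode_cpuid_signature(sig):
    """Split a CPUID signature (the "Id" value in dmesg) into its fields.

    Hypothetical helper for illustration only, not part of any FreeBSD tool.
    """
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    if base_family == 0xF:
        # Extended fields only apply when the base family is 0xf.
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family = base_family
        model = base_model

    return {"family": family, "model": model, "stepping": stepping}


fields = decode_cpuid_signature(0x40FB2)
print({name: hex(value) for name, value in fields.items()})
```

For Id = 0x40fb2 this agrees with the dmesg line: family 0xf, model
0x4b, stepping 2.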

dev.amdtemp.0.sensor_offset: 0
dev.amdtemp.0.core0.sensor0: 35.0C
dev.amdtemp.0.core0.sensor1: -49.0C
dev.amdtemp.0.core1.sensor0: 34.0C
dev.amdtemp.0.core1.sensor1: -49.0C

I've never noticed this before because this is the first time FreeBSD
has been run on this hardware.
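
In case it is useful for checking other boxes, below is a small sketch
that scans sysctl output like the above and lists the sensors that look
absent.  It assumes that a reading of exactly -49.0C means the sensor
returned no data (my guess is that this is what a raw value of zero
turns into after the driver's offset, but I have not verified that).
The function name and the threshold constant are made up for this
example.

```python
import re

# Assumption: a sensor stuck at -49.0C is not really there.
ABSENT_READING_C = -49.0

SENSOR_RE = re.compile(
    r"^(dev\.amdtemp\.\d+\.core\d+\.sensor\d+):\s*(-?\d+(?:\.\d+)?)C$"
)

SAMPLE = """\
dev.amdtemp.0.sensor_offset: 0
dev.amdtemp.0.core0.sensor0: 35.0C
dev.amdtemp.0.core0.sensor1: -49.0C
dev.amdtemp.0.core1.sensor0: 34.0C
dev.amdtemp.0.core1.sensor1: -49.0C
"""


def absent_sensors(sysctl_text, absent_c=ABSENT_READING_C):
    """Return the sysctl names whose reading is at or below absent_c."""
    suspects = []
    for line in sysctl_text.splitlines():
        match = SENSOR_RE.match(line.strip())
        if match and float(match.group(2)) <= absent_c:
            suspects.append(match.group(1))
    return suspects


for name in absent_sensors(SAMPLE):
    print(name)
```

On the sample above it picks out sensor1 on both cores and skips the
sensor_offset line, since that one has no temperature value.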

I may have to dig out the fine manual to see if amdtemp can be tweaked
to recognize this variation.

After the current test run, which should finish late tonight, I'll go
back to the original machine and try the patch.  If I still see
failures, then I'll start swapping parts to find the bad one.
Received on Sat Nov 30 2013 - 17:48:37 UTC
