Re: panic: double fault with 11.0-CURRENT r258504

From: Don Lewis <spamvictim_at_catspoiler.org>
Date: Sun, 1 Dec 2013 17:24:02 -0800 (PST)
On 30 Nov, To: kostikbel_at_gmail.com wrote:
> On 30 Nov, Konstantin Belousov wrote:
>> On Sat, Nov 30, 2013 at 01:02:16PM +0100, Peter Holm wrote:
>>> On Thu, Nov 28, 2013 at 09:56:10AM +0200, Konstantin Belousov wrote:
>>> > Peter, could you, please, try to reproduce the issue ?  It does not look
>>> > like a random hardware failure, since in all cases, it is curthread access
>>> > which is faulting.  The issue is only reported by Don, and so far only
>>> > for i386 SMP.
>>> 
>>> I'm not seeing this issue on my AMD Phenom(tm) 9150e Quad-Core
>>> Processor with i386/r258703.
>> 
>> Thank you.
>> 
>> 9150 is family 0x10, which may indeed point to some errata
>> for family 0xf.  Let's wait for Don.
> 
> It's really looking like a hardware problem at this point.  I've seen no
> problems so far in about 2 1/2 passes through portupgrade -fr
> lang/perl5.16 on my other machine with the same motherboard model but a
> slightly different CPU.
> 
> CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (2200.05-MHz 686-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x40fb2  Family = 0xf  Model = 0x4b  Stepping = 2
>   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x2001<SSE3,CX16>
>   AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
>   AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
> 
> It's also a family 0xF CPU, but strangely different.  It only seems to
> have half as many on-die temperature sensors.
> 
> dev.amdtemp.0.sensor_offset: 0
> dev.amdtemp.0.core0.sensor0: 35.0C
> dev.amdtemp.0.core0.sensor1: -49.0C
> dev.amdtemp.0.core1.sensor0: 34.0C
> dev.amdtemp.0.core1.sensor1: -49.0C
> 
> I've never noticed this before because this is the first time FreeBSD
> has been run on this hardware.
> 
> I may have to dig out the fine manual to see if amdtemp can be tweaked
> to recognize this variation.

The fine manual says this CPU is rev BH-F2, which should have two
sensors per core, so it looks like this particular CPU might just be
slightly broken.
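
For reference, the Family, Model and Stepping values in the dmesg output
quoted above can be derived from the Id field.  A minimal sketch of that
decoding, assuming the standard CPUID function 1 EAX layout and the AMD
convention that the extended fields only apply when the base family is
0xf (the function name is made up for illustration, it is not taken
from the kernel):

```python
def decode_cpuid_signature(eax):
    """Split a CPUID function 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF             # bits 3:0
    base_model = (eax >> 4) & 0xF    # bits 7:4
    base_family = (eax >> 8) & 0xF   # bits 11:8
    ext_model = (eax >> 16) & 0xF    # bits 19:16
    ext_family = (eax >> 20) & 0xFF  # bits 27:20

    family = base_family
    model = base_model
    # Assumption: AMD convention, extended fields count only when the
    # base family field is 0xf.
    if base_family == 0xF:
        family += ext_family
        model |= ext_model << 4
    return family, model, stepping


family, model, stepping = decode_cpuid_signature(0x40FB2)
print(f"Family = {family:#x}  Model = {model:#x}  Stepping = {stepping}")
```

With Id = 0x40fb2 this yields family 0xf, model 0x4b, stepping 2, which
agrees with the values the kernel printed.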

> After the current test run, which should finish late tonight, I'll go
> back to the original machine and try the patch.  If I still see
> failures, then I'll start swapping parts to find the bad one.

Back on the original machine, with your patch applied, it croaked with
another double fault after about five hours of port building.

Stack trace: <http://people.freebsd.org/~truckman/doublefault5.JPG>

Time to start swapping parts ...
Received on Mon Dec 02 2013 - 00:24:16 UTC
