Re: panic: double fault with 11.0-CURRENT r258504

From: Don Lewis <truckman_at_FreeBSD.org> Date: Mon, 25 Nov 2013 00:39:01 -0800 (PST)

On 25 Nov, Konstantin Belousov wrote:
> On Sat, Nov 23, 2013 at 11:43:30PM -0800, Don Lewis wrote:
>> I upgraded my 11.0-CURRENT machine to r258504 to get past the uma panic
>> that I stumbled across earlier.  Now I got this when I started upgrading
>> ports:
>> 
>> Unread portion of the kernel message buffer:
>> 
>> Fatal double fault:
>> eip = 0xc0b158e0
>> esp = 0xe4f62000
>> ebp = 0xe4f62010
>> cpuid = 0; apic id = 00
>> panic: double fault
>> cpuid = 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper(c113340c,2,10000000,c15a0cf0,c15a0ce8,...) at db_trace_self_wrapper+0x2d/frame 0xc15a0cb0
>> kdb_backtrace(c12f143f,0,c12f2aea,c15a0d6c,0,...) at kdb_backtrace+0x30/frame 0xc15a0d18
>> vpanic(c15a0d6c,c15a0d84,c0fc14fb,c12f2aea,0,...) at vpanic+0x11f/frame 0xc15a0d54
>> panic(c12f2aea,0,0,0,e4f62010,...) at panic+0x12/frame 0xc15a0d60
>> dblfault_handler() at dblfault_handler+0xab/frame 0xc15a0d60
>> --- trap 0x17, eip = 0xc0b158e0, esp = 0xe4f62000, ebp = 0xe4f62010 ---
>> vprintf(c12f2900,c,fffffe7f,fffffeff,bfff75ed,...) at vprintf/frame 0xe4f62010
>> trap(e4f62164) at trap+0x18a/frame 0xe4f62158
>> calltrap() at calltrap+0x6/frame 0xe4f62158
>> --- trap 0xc, eip = 0xc0b145dd, esp = 0xe4f621a4, ebp = 0xe4f62270 ---
>> kvprintf(c12f2900,c0b15210,e4f62290,a,e4f6235c,...) at kvprintf+0x1cd/frame 0xe4f62270
>> vprintf(c12f2900,e4f6235c,e4f6235c) at vprintf+0x7f/frame 0xe4f6233c
>> printf(c12f2900,c,ffefdfff,ebefefff,dfdffedf,...) at printf+0x1b/frame 0xe4f62350
>> trap(e4f624a4) at trap+0x18a/frame 0xe4f62498
>> calltrap() at calltrap+0x6/frame 0xe4f62498
>> --- trap 0xc, eip = 0xc0b145dd, esp = 0xe4f624e4, ebp = 0xe4f625b0 ---
>> kvprintf(c12f2900,c0b15210,e4f625d0,a,e4f6269c,...) at kvprintf+0x1cd/frame 0xe4f625b0
>> vprintf(c12f2900,e4f6269c,e4f6269c) at vprintf+0x7f/frame 0xe4f6267c
>> printf(c12f2900,c,5fd7ff5f,ba77f7fb,bfffb7ff,...) at printf+0x1b/frame 0xe4f62690
>> trap(e4f627e4) at trap+0x18a/frame 0xe4f627d8
>> calltrap() at calltrap+0x6/frame 0xe4f627d8
>> --- trap 0xc, eip = 0xc0b145dd, esp = 0xe4f62824, ebp = 0xe4f628f0 ---
>> kvprintf(c12f2900,c0b15210,e4f62910,a,e4f629dc,...) at kvprintf+0x1cd/frame 0xe4f628f0
>> vprintf(c12f2900,e4f629dc,e4f629dc) at vprintf+0x7f/frame 0xe4f629bc
>> printf(c12f2900,c,0,80000000,c0,...) at printf+0x1b/frame 0xe4f629d0
>> trap(e4f62b20) at trap+0x18a/frame 0xe4f62b14
>> calltrap() at calltrap+0x6/frame 0xe4f62b14
>> --- trap 0xc, eip = 0xc0afe270, esp = 0xe4f62b60, ebp = 0xe4f62b78 ---
>> tdq_choose(c141e090,4,c113144d,917,c2425c80,...) at tdq_choose+0x60/frame 0xe4f62b78
>> sched_choose(e4f62c00,c0afc511,c141e090,14,c113144d,...) at sched_choose+0x4c/frame 0xe4f62ba4
>> choosethread(c141e090,14,c113144d,78b,c141e116,...) at choosethread+0x1f/frame 0xe4f62bac
>> sched_switch(c8f04000,0,608,1b7,ef2,...) at sched_switch+0x361/frame 0xe4f62c00
>> mi_switch(608,0,c112f4e4,d3,c,...) at mi_switch+0x1c9/frame 0xe4f62c34
>> critical_exit(0,2,c113144d,411,c141e108,...) at critical_exit+0xa4/frame 0xe4f62c50
>> sched_idletd(0,e4f62d08,c1128634,3db,0,...) at sched_idletd+0x1d6/frame 0xe4f62ccc
>> fork_exit(c0afeb00,0,e4f62d08) at fork_exit+0x7f/frame 0xe4f62cf4
>> fork_trampoline() at fork_trampoline+0x8/frame 0xe4f62cf4
>> --- trap 0, eip = 0, esp = 0xe4f62d40, ebp = 0 ---
>> KDB: enter: panic
>> 
>> (kgdb) list *tdq_choose+0x60
>> 0xc0afe270 is in tdq_choose (/usr/src/sys/kern/sched_ule.c:1334).
>> 1329		td = runq_choose(&tdq->tdq_realtime);
>> 1330		if (td != NULL)
>> 1331			return (td);
>> 1332		td = runq_choose_from(&tdq->tdq_timeshare, tdq->tdq_ridx);
>> 1333		if (td != NULL) {
>> 1334			KASSERT(td->td_priority >= PRI_MIN_BATCH,
>> 1335			    ("tdq_choose: Invalid priority on timeshare queue %d",
>> 1336			    td->td_priority));
>> 1337			return (td);
>> 1338		}
>> 
>> (kgdb) bt
>> #0  doadump (textdump=-1051128300) at pcpu.h:233
>> #1  0xc052766d in db_fncall (dummy1=-1051051648, dummy2=0, dummy3=-1051063684, 
>>     dummy4=0xc15a0a54 "") at /usr/src/sys/ddb/db_command.c:578
>> #2  0xc0527357 in db_command (cmd_table=<value optimized out>)
>>     at /usr/src/sys/ddb/db_command.c:449
>> #3  0xc0527090 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502
>> #4  0xc0529922 in db_trap (type=<value optimized out>, code=0)
>>     at /usr/src/sys/ddb/db_main.c:231
>> #5  0xc0b0ff38 in kdb_trap (type=<value optimized out>, 
>>     code=<value optimized out>, tf=<value optimized out>)
>>     at /usr/src/sys/kern/subr_kdb.c:656
>> #6  0xc0fc0c07 in trap (frame=<value optimized out>)
>>     at /usr/src/sys/i386/i386/trap.c:712
>> #7  0xc0faa0ec in calltrap () at /usr/src/sys/i386/i386/exception.s:170
>> #8  0xc0b0f7bd in kdb_enter (why=0xc112ee39 "panic", msg=<value optimized out>)
>>     at cpufunc.h:71
>> #9  0xc0ad6a93 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
>>     at /usr/src/sys/kern/kern_shutdown.c:747
>> #10 0xc0ad6ad2 in panic (fmt=0xc12f2aea "double fault")
>>     at /usr/src/sys/kern/kern_shutdown.c:683
>> #11 0xc0fc14fb in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:1072
>> #12 0x00000000 in ?? ()
> 
> It seems to be a corruption of the td and probably curthread.
> 
> Is it easily repeatable?  If yes, you could try to manually inspect the first
> elements in the (idle) runq queue of the tdq_cpu[panicked cpu].

It doesn't seem to be very repeatable.  Last time it crashed after a few
hours of port building.  This time it has been building ports for 10 1/2
hours so far.  I don't suspect a random memory error because the machine
has ECC RAM.
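
For reference, the kind of check I would make when inspecting the queue
heads can be sketched in userland C.  This is only a model of the test in
the KASSERT quoted above (td_priority >= PRI_MIN_BATCH), applied to the
first thread of each queue.  The model_* structs, MODEL_NQS and the value
of MODEL_PRI_MIN_BATCH are assumptions made up for illustration; they are
not the kernel's definitions from sched_ule.c or runq.h.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical values for illustration only, not taken from the kernel. */
#define MODEL_NQS            4
#define MODEL_PRI_MIN_BATCH  100

/* Simplified stand-in for a thread: just a priority and a queue link. */
struct model_thread {
	int td_priority;
	struct model_thread *next;
};

/* Simplified stand-in for a run queue: an array of queue heads. */
struct model_runq {
	struct model_thread *rq_queues[MODEL_NQS];
};

/*
 * Look at the first thread on each queue and count how many of them have
 * a priority below min_pri, which is the condition the quoted KASSERT
 * would trip on.  Empty queues are skipped.
 */
static int
model_count_bad_heads(const struct model_runq *rq, int min_pri)
{
	int bad = 0;

	for (int i = 0; i < MODEL_NQS; i++) {
		const struct model_thread *td = rq->rq_queues[i];

		if (td == NULL)
			continue;
		printf("queue %d: head priority %d%s\n", i, td->td_priority,
		    td->td_priority < min_pri ? " (below minimum)" : "");
		if (td->td_priority < min_pri)
			bad++;
	}
	return (bad);
}
```

A nonzero return would point at a queue head that is either corrupted or
sitting on the wrong queue.  On the real crash dump the same inspection
would have to be done by hand in kgdb against the actual structures.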