Re: panic: double fault with 11.0-CURRENT r258504

From: Konstantin Belousov <kostikbel_at_gmail.com> Date: Mon, 25 Nov 2013 10:10:47 +0200

On Sat, Nov 23, 2013 at 11:43:30PM -0800, Don Lewis wrote:
> I upgraded my 11.0-CURRENT machine to r258504 to get past the uma panic
> that I stumbled across earlier.  Now I got this when I started upgrading
> ports:
> 
> Unread portion of the kernel message buffer:
> 
> Fatal double fault:
> eip = 0xc0b158e0
> esp = 0xe4f62000
> ebp = 0xe4f62010
> cpuid = 0; apic id = 00
> panic: double fault
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper(c113340c,2,10000000,c15a0cf0,c15a0ce8,...) at db_trace_self_wrapper+0x2d/frame 0xc15a0cb0
> kdb_backtrace(c12f143f,0,c12f2aea,c15a0d6c,0,...) at kdb_backtrace+0x30/frame 0xc15a0d18
> vpanic(c15a0d6c,c15a0d84,c0fc14fb,c12f2aea,0,...) at vpanic+0x11f/frame 0xc15a0d54
> panic(c12f2aea,0,0,0,e4f62010,...) at panic+0x12/frame 0xc15a0d60
> dblfault_handler() at dblfault_handler+0xab/frame 0xc15a0d60
> --- trap 0x17, eip = 0xc0b158e0, esp = 0xe4f62000, ebp = 0xe4f62010 ---
> vprintf(c12f2900,c,fffffe7f,fffffeff,bfff75ed,...) at vprintf/frame 0xe4f62010
> trap(e4f62164) at trap+0x18a/frame 0xe4f62158
> calltrap() at calltrap+0x6/frame 0xe4f62158
> --- trap 0xc, eip = 0xc0b145dd, esp = 0xe4f621a4, ebp = 0xe4f62270 ---
> kvprintf(c12f2900,c0b15210,e4f62290,a,e4f6235c,...) at kvprintf+0x1cd/frame 0xe4f62270
> vprintf(c12f2900,e4f6235c,e4f6235c) at vprintf+0x7f/frame 0xe4f6233c
> printf(c12f2900,c,ffefdfff,ebefefff,dfdffedf,...) at printf+0x1b/frame 0xe4f62350
> trap(e4f624a4) at trap+0x18a/frame 0xe4f62498
> calltrap() at calltrap+0x6/frame 0xe4f62498
> --- trap 0xc, eip = 0xc0b145dd, esp = 0xe4f624e4, ebp = 0xe4f625b0 ---
> kvprintf(c12f2900,c0b15210,e4f625d0,a,e4f6269c,...) at kvprintf+0x1cd/frame 0xe4f625b0
> vprintf(c12f2900,e4f6269c,e4f6269c) at vprintf+0x7f/frame 0xe4f6267c
> printf(c12f2900,c,5fd7ff5f,ba77f7fb,bfffb7ff,...) at printf+0x1b/frame 0xe4f62690
> trap(e4f627e4) at trap+0x18a/frame 0xe4f627d8
> calltrap() at calltrap+0x6/frame 0xe4f627d8
> --- trap 0xc, eip = 0xc0b145dd, esp = 0xe4f62824, ebp = 0xe4f628f0 ---
> kvprintf(c12f2900,c0b15210,e4f62910,a,e4f629dc,...) at kvprintf+0x1cd/frame 0xe4f628f0
> vprintf(c12f2900,e4f629dc,e4f629dc) at vprintf+0x7f/frame 0xe4f629bc
> printf(c12f2900,c,0,80000000,c0,...) at printf+0x1b/frame 0xe4f629d0
> trap(e4f62b20) at trap+0x18a/frame 0xe4f62b14
> calltrap() at calltrap+0x6/frame 0xe4f62b14
> --- trap 0xc, eip = 0xc0afe270, esp = 0xe4f62b60, ebp = 0xe4f62b78 ---
> tdq_choose(c141e090,4,c113144d,917,c2425c80,...) at tdq_choose+0x60/frame 0xe4f62b78
> sched_choose(e4f62c00,c0afc511,c141e090,14,c113144d,...) at sched_choose+0x4c/frame 0xe4f62ba4
> choosethread(c141e090,14,c113144d,78b,c141e116,...) at choosethread+0x1f/frame 0xe4f62bac
> sched_switch(c8f04000,0,608,1b7,ef2,...) at sched_switch+0x361/frame 0xe4f62c00
> mi_switch(608,0,c112f4e4,d3,c,...) at mi_switch+0x1c9/frame 0xe4f62c34
> critical_exit(0,2,c113144d,411,c141e108,...) at critical_exit+0xa4/frame 0xe4f62c50
> sched_idletd(0,e4f62d08,c1128634,3db,0,...) at sched_idletd+0x1d6/frame 0xe4f62ccc
> fork_exit(c0afeb00,0,e4f62d08) at fork_exit+0x7f/frame 0xe4f62cf4
> fork_trampoline() at fork_trampoline+0x8/frame 0xe4f62cf4
> --- trap 0, eip = 0, esp = 0xe4f62d40, ebp = 0 ---
> KDB: enter: panic
> 
> (kgdb) list *tdq_choose+0x60
> 0xc0afe270 is in tdq_choose (/usr/src/sys/kern/sched_ule.c:1334).
> 1329		td = runq_choose(&tdq->tdq_realtime);
> 1330		if (td != NULL)
> 1331			return (td);
> 1332		td = runq_choose_from(&tdq->tdq_timeshare, tdq->tdq_ridx);
> 1333		if (td != NULL) {
> 1334			KASSERT(td->td_priority >= PRI_MIN_BATCH,
> 1335			    ("tdq_choose: Invalid priority on timeshare queue %d",
> 1336			    td->td_priority));
> 1337			return (td);
> 1338		}
> 
> (kgdb) bt
> #0  doadump (textdump=-1051128300) at pcpu.h:233
> #1  0xc052766d in db_fncall (dummy1=-1051051648, dummy2=0, dummy3=-1051063684, 
>     dummy4=0xc15a0a54 "") at /usr/src/sys/ddb/db_command.c:578
> #2  0xc0527357 in db_command (cmd_table=<value optimized out>)
>     at /usr/src/sys/ddb/db_command.c:449
> #3  0xc0527090 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502
> #4  0xc0529922 in db_trap (type=<value optimized out>, code=0)
>     at /usr/src/sys/ddb/db_main.c:231
> #5  0xc0b0ff38 in kdb_trap (type=<value optimized out>, 
>     code=<value optimized out>, tf=<value optimized out>)
>     at /usr/src/sys/kern/subr_kdb.c:656
> #6  0xc0fc0c07 in trap (frame=<value optimized out>)
>     at /usr/src/sys/i386/i386/trap.c:712
> #7  0xc0faa0ec in calltrap () at /usr/src/sys/i386/i386/exception.s:170
> #8  0xc0b0f7bd in kdb_enter (why=0xc112ee39 "panic", msg=<value optimized out>)
>     at cpufunc.h:71
> #9  0xc0ad6a93 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:747
> #10 0xc0ad6ad2 in panic (fmt=0xc12f2aea "double fault")
>     at /usr/src/sys/kern/kern_shutdown.c:683
> #11 0xc0fc14fb in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:1072
> #12 0x00000000 in ?? ()

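This looks like corruption of the td, and probably of curthread as well.
The first page fault is at tdq_choose+0x60, which kgdb maps to the KASSERT
that reads td->td_priority, and the repeated faults inside printf() called
from trap() suggest that curthread is bad too.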

Is it easily repeatable?  If yes, you could try to manually inspect the first
elements in the (idle) runq of the tdq_cpu[] entry for the CPU that panicked.
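
To make it concrete what to look for when inspecting the queue heads, here
is a rough user-space sketch. It is not kernel code: the model_* names and
MODEL_RQ_NQS are hypothetical stand-ins for the real structures, and the
minimum priority is passed in rather than taken from PRI_MIN_BATCH. For each
queue it only checks the usual TAILQ invariants on the head and first element,
plus the same priority condition that the KASSERT in tdq_choose() tests.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, simplified model; the kernel's own definitions differ. */
#define MODEL_RQ_NQS 64			/* assumed number of queues per runq */

struct model_thread {
	TAILQ_ENTRY(model_thread) td_runq;	/* run queue linkage */
	unsigned char td_priority;		/* thread priority */
};
TAILQ_HEAD(model_rqhead, model_thread);

struct model_runq {
	struct model_rqhead rq_queues[MODEL_RQ_NQS];
};

/* Put every queue of the model runq into the empty state. */
static void
model_runq_init(struct model_runq *rq)
{
	int i;

	assert(rq != NULL);
	for (i = 0; i < MODEL_RQ_NQS; i++)
		TAILQ_INIT(&rq->rq_queues[i]);
}

/*
 * Check the head and the first element of each queue.  Returns the index
 * of the first queue that looks inconsistent, or -1 if none does.
 */
static int
model_runq_check_heads(struct model_runq *rq, int min_pri)
{
	struct model_rqhead *rqh;
	struct model_thread *td;
	int i;

	assert(rq != NULL);
	for (i = 0; i < MODEL_RQ_NQS; i++) {
		rqh = &rq->rq_queues[i];
		td = TAILQ_FIRST(rqh);
		if (td == NULL) {
			/* An empty TAILQ has tqh_last pointing at tqh_first. */
			if (rqh->tqh_last != &rqh->tqh_first)
				return (i);
			continue;
		}
		/* The first element must link back to the head. */
		if (td->td_runq.tqe_prev != &rqh->tqh_first)
			return (i);
		/* Same condition as the KASSERT in tdq_choose(). */
		if (td->td_priority < min_pri)
			return (i);
	}
	return (-1);
}
```

In the debugger the equivalent is to print the head of each queue and the
thread it points to, and to see whether the back pointer and the priority
are sane.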