Re: LOR: sched lock vs. sio + panic in sched_choose() [ULE + SMP panic]

From: David P. Reese Jr. <daver_at_gomerbud.com> Date: Fri, 6 Jun 2003 14:03:18 -0700

On Fri, Jun 06, 2003 at 12:39:46PM -0400, John Baldwin wrote:
> 
> On 06-Jun-2003 David P. Reese Jr. wrote:
> > I've been getting a lot of these for the last two weeks on my SMP box.
> > This panic is on -CURRENT from earlier today.  Scheduler is ULE.
> > 
> > lock order reversal
> >  1st 0xc047f820 sched lock (sched lock) @ /usr/src/sys/kern/kern_intr.c:548
> >  2nd 0xc04b83c0 sio (sio) @ /usr/src/sys/dev/sio/sio.c:3242
> 
> This is a duplicate panic because you are using a serial console.
> 
> > Stack backtrace:
> > backtrace(c0400378,c04b83c0,c0463120,c0463120,c041266b) at backtrace+0x17
> > witness_lock(c04b83c0,8,c041266b,caa,c11efc00) at witness_lock+0x697
> > _mtx_lock_spin_flags(c04b83c0,0,c041266b,caa,0) at _mtx_lock_spin_flags+0xd1
> > siocnputc(c0463280,d,5,d1d62b68,0) at siocnputc+0x81
> > cnputc(a,ffffffff,1,c0415c53,c) at cnputc+0x56
> > putchar(a,d1d62b68,d1d62ab4,c0491d40,0) at putchar+0xcd
> > kvprintf(c0415c52,c025eba0,d1d62b68,a,d1d62b88) at kvprintf+0x7d
> > printf(c0415c52,c,c0415a4d,c03fe55b,c0489b20) at printf+0x57
> 
> This is the real panic below:
> 
> > trap_fatal(d1d62c14,38,d1d62bf0,c0236c9d,38) at trap_fatal+0x76
> > trap(d1d60018,c0240010,c0470010,c11dcbe0,c0482280) at trap+0x123
> > calltrap() at calltrap+0x5
> > --- trap 0xc, eip = 0xc0253ec7, esp = 0xd1d62c54, ebp = 0xd1d62c68 ---
> > sched_choose(c11dee40,c03fe7a6,28c,0,c11db668) at sched_choose+0x77
> > choosethread(c11dcbe0,2,c03fdb89,1dc,b6e81bd0) at choosethread+0x36
> > mi_switch(c047f820,0,c03fb1fd,224,c11db5ac) at mi_switch+0x200
> > ithread_loop(c11da180,d1d62d48,c03fb0ae,30c,55ff44fd) at ithread_loop+0x256
> > fork_exit(c022caf0,c11da180,d1d62d48) at fork_exit+0xc0
> > fork_trampoline() at fork_trampoline+0x1a
> > --- trap 0x1, eip = 0, esp = 0xd1d62d7c, ebp = 0 ---
> > 
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 1; lapic.id = 01000000
> > fault virtual address   = 0x38
> > fault code              = supervisor read, page not present
> > instruction pointer     = 0x8:0xc0253ec7
> > stack pointer           = 0x10:0xd1d62c54
> > frame pointer           = 0x10:0xd1d62c68
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, def32 1, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 14 (swi7: tty:sio clock)
> > kernel: type 12 trap, code=0
> > Stopped at      sched_choose+0x77:      movl    0x38(%eax),%eax
> 
> This is a ULE and SMP panic that Jeff keeps looking for.  Seems to be a
> NULL pointer dereference of some sort.
> 
> > I recall most if not all of these panics occurring when swi7: tty:sio clock
> > is the current process.  These are not completely repeatable, but if I
> > simply reboot a couple of times, I can get the panic to occur while the
> > rc scripts are being run.
> 
> Can you do a 'l *sched_choose+0x77' in gdb on kernel.debug to get
> the source line corresponding to this panic?

(kgdb) l *sched_choose+0x77
0xc0253ec7 is in sched_choose (/usr/src/sys/kern/sched_ule.c:1042).
1037                     * Remove this kse from this kseq and runq and then requeue
1038                     * on the current processor.  Then we will dequeue it
1039                     * normally above.
1040                     */
1041                    ke = kseq_choose(kseq);
1042                    runq_remove(ke->ke_runq, ke);
1043                    ke->ke_state = KES_THREAD;
1044                    kseq_rem(kseq, ke);
1045
1046                    ke->ke_cpu = PCPU_GET(cpuid);
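
A note on reading the trap against this listing: the faulting instruction is
movl 0x38(%eax),%eax and the fault virtual address is 0x38, so %eax held 0 at
the time.  That is consistent with kseq_choose() returning NULL and line 1042
then loading ke->ke_runq through it, though the listing alone does not prove
which pointer was in %eax.  The C sketch below only illustrates the arithmetic.
struct kse_sketch, its padding, and the helper functions are hypothetical; the
0x38 offset of ke_runq is an assumption taken from the trap output, not from
the real struct kse definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct runq;    /* opaque here; only a pointer to it is needed */

/*
 * Hypothetical stand-in for struct kse.  The padding size is an
 * ASSUMPTION chosen so that ke_runq lands at offset 0x38, matching
 * the displacement in "movl 0x38(%eax),%eax".
 */
struct kse_sketch {
	unsigned char	 pad[0x38];
	struct runq	*ke_runq;
};

/*
 * Address the CPU would read when loading ke->ke_runq, computed
 * from the pointer value without dereferencing anything.
 */
static uintptr_t
field_load_address(uintptr_t ke)
{
	return (ke + offsetof(struct kse_sketch, ke_runq));
}

/* Address touched when ke is NULL: base 0 plus the field offset. */
static uintptr_t
fault_address_for_null_ke(void)
{
	assert(offsetof(struct kse_sketch, ke_runq) == 0x38);
	return (field_load_address(0));
}

/*
 * Defensive variant (illustrative only): return NULL instead of
 * dereferencing a NULL ke.
 */
static struct runq *
kse_runq_or_null(const struct kse_sketch *ke)
{
	return (ke != NULL ? ke->ke_runq : NULL);
}
```

fault_address_for_null_ke() evaluates to 0x38, the same value as the reported
fault virtual address.  A check like the one in kse_runq_or_null() would avoid
the page fault but would not explain why kseq_choose() would return NULL, if
that is indeed what happened.
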

I'm currently trying to get a core, but with my latest kernel ddb is
locking up before I get a prompt.  I'll keep trying.

-- 

   David P. Reese Jr.                                      daver_at_gomerbud.com
   --------------------------------------------------------------------------
   It can be argued that returning a NULL pointer when asked to allocate
   zero bytes is a silly response to a silly question.
                                         -- FreeBSD manual page for malloc(3)