Re: CURRENT + amd64 + user-ppp = panic

From: John Baldwin <jhb_at_freebsd.org>
Date: Thu, 3 Nov 2005 14:59:59 -0500
On Thursday 03 November 2005 02:28 pm, Victor Snezhko wrote:
> Max Laier <max_at_love2party.net> writes:
> >> > That brings me back to ln_timer_ch as the culprit.  Could you please
> >> > build with this patch and get me the console output shortly before the
> >> > panic?
> >>
> >> Here it is: right before panic it prints
> >>
> >> For 0xffffff0017531100 -1 ticks
> >> For 0xffffff0017531100 -1 ticks
> >
> > This results in two consecutive callout_stop() calls, but shouldn't hurt
> > as callout_stop is protected against that.
> >
> > Do you get a dump for this?  Can you compare this pointer to the global
> > llinfo_nd6 and see if "c" from the softclock() frame is related (or
> > maybe the previous item in the list TAILQ).
>
> I have finally analysed this.
>
> Applied the following patch with more verbose printf against 1.62:
>
> --- nd6-1.62.c	Fri Nov  4 01:14:50 2005
> +++ nd6.c	Fri Nov  4 01:02:30 2005
> _at__at_ -1,4 +1,4 _at__at_
> -/*	$FreeBSD: /repoman/r/ncvs/src/sys/netinet6/nd6.c,v 1.62 2005/10/22
> 05:07:16 suz Exp $	*/ +/*	$FreeBSD: src/sys/netinet6/nd6.c,v 1.62
> 2005/10/22 05:07:16 suz Exp $	*/ /*	$KAME: nd6.c,v 1.144 2001/05/24
> 07:44:00 itojun Exp $	*/
>
>  /*-
> _at__at_ -49,6 +49,7 _at__at_
>  #include <sys/syslog.h>
>  #include <sys/queue.h>
>  #include <sys/sysctl.h>
> +#include <sys/kdb.h>
>
>  #include <net/if.h>
>  #include <net/if_arc.h>
> _at__at_ -395,6 +396,9 _at__at_
>  	struct llinfo_nd6 *ln;
>  	long tick;
>  {
> +	printf("For %p %ld (llinfo_nd6=%p, prev=%p, next=%p)\n", \
> +	       ln, tick, &llinfo_nd6, llinfo_nd6.ln_prev, llinfo_nd6.ln_next);
> +	kdb_enter("in nd6_llinfo_settimer");
>  	if (tick < 0) {
>  		ln->ln_expire = 0;
>  		ln->ln_ntick = 0;
>
> Thanks to John Baldwin, kdb_enter is really helpful.
>
> Here are the results of the first call (at lo0 initialization):
> For 0xc17722c0 -1 (llinfo_nd6 = 0xc090eac0, prev=0xc17722c0,
> next=0xc17722c0)
>
> And here - the results of two consecutive calls (at ppp startup, later
> in the same session):
> For 0xc188e840 -1 (llinfo_nd6 = 0xc090eac0, prev=0xc17722c0,
> next=0xc188e840) For 0xc188e840 -1 (llinfo_nd6 = 0xc090eac0,
> prev=0xc17722c0, next=0xc17722c0)
>
> As we can see, ln formal parameter is certainly related to the global
> llinfo_nd6, but (see crashdump below) I can't find any relation to the
> "c" from softclock(), pointers there are distinct from any of
> llinfo_nd6 members I have seen.
>
> I can obtain crashdumps for both consecutive calls if somebody thinks
> there is something broken here, but it seems unrelated to the trap in
> question.
>
> What else can I do?
>
> Here is brief softclock() crashdump brief analysis:
>
> ~ # kgdb /usr/obj/usr/src/sys/VVS/kernel.debug /var/crash/vmcore.7
> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so:
> Undefined symbol "ps_pglobal_lookup"] GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are welcome to change it and/or distribute copies of it under certain
> conditions. Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd".
>
> Unread portion of the kernel message buffer:
> kernel trap 12 with interrupts disabled
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address	= 0xdeadc0e6
> fault code		= supervisor read, page not present
> instruction pointer	= 0x20:0xc066e0c2
> stack pointer	        = 0x28:0xcbfd3cac
> frame pointer	        = 0x28:0xcbfd3cd4
> code segment		= base 0x0, limit 0xfffff, type 0x1b
> 			= DPL 0, pres 1, def32 1, gran 1
> processor eflags	= resume, IOPL = 0
> current process		= 12 (swi4: clock sio)
> panic: from debugger
> cpuid = 0
> Uptime: 45s
> Dumping 255 MB (2 chunks)
>   chunk 0: 1MB (159 pages) ... ok
>   chunk 1: 255MB (65264 pages) 239 223 207 191 175 159 143 127 111 95 79 63
> 47 31 15
>
> #0  doadump () at pcpu.h:165
> 165	pcpu.h: No such file or directory.
> 	in pcpu.h
> (kgdb) bt
> #0  doadump () at pcpu.h:165
> #1  0xc06613fc in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
> #2  0xc0661711 in panic (fmt=0xc08594a0 "from debugger")
>     at /usr/src/sys/kern/kern_shutdown.c:555
> #3  0xc046d151 in db_panic (addr=-1066999614, have_addr=0, count=-1,
>     modif=0xcbfd3aa8 "") at /usr/src/sys/ddb/db_command.c:435
> #4  0xc046d0e8 in db_command (last_cmdp=0xc094a1a4, cmd_table=0x0,
>     aux_cmd_tablep=0xc08c009c, aux_cmd_tablep_end=0xc08c00b8)
>     at /usr/src/sys/ddb/db_command.c:404
> #5  0xc046d1b0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
> #6  0xc046edc9 in db_trap (type=12, code=0) at
> /usr/src/sys/ddb/db_main.c:221 #7  0xc067b408 in kdb_trap (type=12, code=0,
> tf=0xcbfd3c6c)
>     at /usr/src/sys/kern/subr_kdb.c:485
> #8  0xc0823f88 in trap_fatal (frame=0xcbfd3c6c, eva=3735929062)
>     at /usr/src/sys/i386/i386/trap.c:846
> #9  0xc0823612 in trap (frame=
>       {tf_fs = 8, tf_es = -872611800, tf_ds = -1067122648, tf_edi =
> -1049930560, tf_esi = -1051228800, tf_ebp = -872596268, tf_isp =
> -872596328, tf_ebx = 1, tf_edx = -966781544, tf_ecx = -559038242, tf_eax =
> 43761, tf_trapno = 12, tf_err = 0, tf_eip = -1066999614, tf_cs = 32,
> tf_eflags = 589826, tf_esp = -1051359088, tf_ss = 0}) at
> /usr/src/sys/i386/i386/trap.c:269 #10 0xc0810e4a in calltrap () at
> /usr/src/sys/i386/i386/exception.s:139 #11 0xc066e0c2 in softclock
> (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:220 #12 0xc064ed32 in
> ithread_execute_handlers (p=0xc1558890, ie=0xc1578580) at
> /usr/src/sys/kern/kern_intr.c:662
> #13 0xc064ee63 in ithread_loop (arg=0xc151f8e0)
>     at /usr/src/sys/kern/kern_intr.c:745
> #14 0xc064dfc4 in fork_exit (callout=0xc064edfc <ithread_loop>,
>     arg=0xc151f8e0, frame=0xcbfd3d38) at /usr/src/sys/kern/kern_fork.c:789
> #15 0xc0810eac in fork_trampoline () at
> /usr/src/sys/i386/i386/exception.s:208 (kgdb) up 11
> #11 0xc066e0c2 in softclock (dummy=0x0) at
> /usr/src/sys/kern/kern_timeout.c:220 220				if (c->c_time != curticks) {
> (kgdb) list
> 215			curticks = softticks;
> 216			bucket = &callwheel[curticks & callwheelmask];
> 217			c = TAILQ_FIRST(bucket);
> 218			while (c) {
> 219				depth++;
> 220				if (c->c_time != curticks) {
> 221					c = TAILQ_NEXT(c, c_links.tqe);
> 222					++steps;
> 223					if (steps >= MAX_SOFTCLOCK_STEPS) {
> 224						nextsoftcheck = c;
> (kgdb) print *bucket
> $1 = {tqh_first = 0xc1891d80, tqh_last = 0xc1891d80}
> (kgdb) print c
> $2 = (struct callout *) 0xdeadc0de
> (kgdb) print *(bucket->tqh_first)
> $3 = {c_links = {sle = {sle_next = 0xdeadc0de}, tqe = {tqe_next =
> 0xdeadc0de, tqe_prev = 0xdeadc0de}}, c_time = -559038242, c_arg =
> 0xdeadc0de, c_func = 0xdeadc0de, c_mtx = 0xdeadc0de, c_flags = -559038242}
> (kgdb) print steps
> $4 = 1

Well, from thus it seems that a callout was free'd while it was still on the 
list.  Perhaps there is a case wehre callout_stop() isn't called.  Also, 
callout_drain() should really be used.  If the callout function is rearming, 
then it might have been running when callout_stop() is called, and it could 
have rearmed itself and then been overwritten when it was freed.  In fact, 
that is likely your problem.  You can try this patch, but there might be lock 
order problems that would require the callout_drain() to happen later when 
locks aren't held:

Index: nd6.c
===================================================================
RCS file: /usr/cvs/src/sys/netinet6/nd6.c,v
retrieving revision 1.62
diff -u -r1.62 nd6.c
--- nd6.c       22 Oct 2005 05:07:16 -0000      1.62
+++ nd6.c       3 Nov 2005 19:56:42 -0000
_at__at_ -398,7 +398,7 _at__at_
        if (tick < 0) {
                ln->ln_expire = 0;
                ln->ln_ntick = 0;
-               callout_stop(&ln->ln_timer_ch);
+               callout_drain(&ln->ln_timer_ch);
        } else {
                ln->ln_expire = time_second + tick / hz;
                if (tick > INT_MAX) {

-- 
John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
Received on Thu Nov 03 2005 - 19:00:38 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:47 UTC