Re: CURRENT + amd64 + user-ppp = panic

From: Victor Snezhko <snezhko_at_indorsoft.ru>
Date: Fri, 04 Nov 2005 01:28:39 +0600
Max Laier <max_at_love2party.net> writes:

>> > That brings me back to ln_timer_ch as the culprit.  Could you please
>> > build with this patch and get me the console output shortly before the
>> > panic?
>>
>> Here it is: right before panic it prints
>>
>> For 0xffffff0017531100 -1 ticks
>> For 0xffffff0017531100 -1 ticks
>
> This results in two consecutive callout_stop() calls, but shouldn't hurt as 
> callout_stop is protected against that.
>
> Do you get a dump for this?  Can you compare this pointer to the global 
> llinfo_nd6 and see if "c" from the softclock() frame is related (or
> maybe the previous item in the list TAILQ).

I have finally analysed this.

Applied the following patch with more verbose printf against 1.62:

--- nd6-1.62.c	Fri Nov  4 01:14:50 2005
+++ nd6.c	Fri Nov  4 01:02:30 2005
_at__at_ -1,4 +1,4 _at__at_
-/*	$FreeBSD: /repoman/r/ncvs/src/sys/netinet6/nd6.c,v 1.62 2005/10/22 05:07:16 suz Exp $	*/
+/*	$FreeBSD: src/sys/netinet6/nd6.c,v 1.62 2005/10/22 05:07:16 suz Exp $	*/
 /*	$KAME: nd6.c,v 1.144 2001/05/24 07:44:00 itojun Exp $	*/
 
 /*-
_at__at_ -49,6 +49,7 _at__at_
 #include <sys/syslog.h>
 #include <sys/queue.h>
 #include <sys/sysctl.h>
+#include <sys/kdb.h>
 
 #include <net/if.h>
 #include <net/if_arc.h>
_at__at_ -395,6 +396,9 _at__at_
 	struct llinfo_nd6 *ln;
 	long tick;
 {
+	printf("For %p %ld (llinfo_nd6=%p, prev=%p, next=%p)\n", \
+	       ln, tick, &llinfo_nd6, llinfo_nd6.ln_prev, llinfo_nd6.ln_next);
+	kdb_enter("in nd6_llinfo_settimer");
 	if (tick < 0) {
 		ln->ln_expire = 0;
 		ln->ln_ntick = 0;

Thanks to John Baldwin, kdb_enter is really helpful.

Here are the results of the first call (at lo0 initialization):
For 0xc17722c0 -1 (llinfo_nd6 = 0xc090eac0, prev=0xc17722c0, next=0xc17722c0)

And here - the results of two consecutive calls (at ppp startup, later
in the same session):
For 0xc188e840 -1 (llinfo_nd6 = 0xc090eac0, prev=0xc17722c0, next=0xc188e840)
For 0xc188e840 -1 (llinfo_nd6 = 0xc090eac0, prev=0xc17722c0, next=0xc17722c0)

As we can see, ln formal parameter is certainly related to the global
llinfo_nd6, but (see crashdump below) I can't find any relation to the
"c" from softclock(), pointers there are distinct from any of
llinfo_nd6 members I have seen.

I can obtain crashdumps for both consecutive calls if somebody thinks
there is something broken here, but it seems unrelated to the trap in
question.

What else can I do?

Here is brief softclock() crashdump brief analysis:

~ # kgdb /usr/obj/usr/src/sys/VVS/kernel.debug /var/crash/vmcore.7 
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0xdeadc0e6
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc066e0c2
stack pointer	        = 0x28:0xcbfd3cac
frame pointer	        = 0x28:0xcbfd3cd4
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= resume, IOPL = 0
current process		= 12 (swi4: clock sio)
panic: from debugger
cpuid = 0
Uptime: 45s
Dumping 255 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 255MB (65264 pages) 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc06613fc in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc0661711 in panic (fmt=0xc08594a0 "from debugger")
    at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc046d151 in db_panic (addr=-1066999614, have_addr=0, count=-1, 
    modif=0xcbfd3aa8 "") at /usr/src/sys/ddb/db_command.c:435
#4  0xc046d0e8 in db_command (last_cmdp=0xc094a1a4, cmd_table=0x0, 
    aux_cmd_tablep=0xc08c009c, aux_cmd_tablep_end=0xc08c00b8)
    at /usr/src/sys/ddb/db_command.c:404
#5  0xc046d1b0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
#6  0xc046edc9 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:221
#7  0xc067b408 in kdb_trap (type=12, code=0, tf=0xcbfd3c6c)
    at /usr/src/sys/kern/subr_kdb.c:485
#8  0xc0823f88 in trap_fatal (frame=0xcbfd3c6c, eva=3735929062)
    at /usr/src/sys/i386/i386/trap.c:846
#9  0xc0823612 in trap (frame=
      {tf_fs = 8, tf_es = -872611800, tf_ds = -1067122648, tf_edi = -1049930560, tf_esi = -1051228800, tf_ebp = -872596268, tf_isp = -872596328, tf_ebx = 1, tf_edx = -966781544, tf_ecx = -559038242, tf_eax = 43761, tf_trapno = 12, tf_err = 0, tf_eip = -1066999614, tf_cs = 32, tf_eflags = 589826, tf_esp = -1051359088, tf_ss = 0}) at /usr/src/sys/i386/i386/trap.c:269
#10 0xc0810e4a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#11 0xc066e0c2 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:220
#12 0xc064ed32 in ithread_execute_handlers (p=0xc1558890, ie=0xc1578580)
    at /usr/src/sys/kern/kern_intr.c:662
#13 0xc064ee63 in ithread_loop (arg=0xc151f8e0)
    at /usr/src/sys/kern/kern_intr.c:745
#14 0xc064dfc4 in fork_exit (callout=0xc064edfc <ithread_loop>, 
    arg=0xc151f8e0, frame=0xcbfd3d38) at /usr/src/sys/kern/kern_fork.c:789
#15 0xc0810eac in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208
(kgdb) up 11
#11 0xc066e0c2 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:220
220				if (c->c_time != curticks) {
(kgdb) list
215			curticks = softticks;
216			bucket = &callwheel[curticks & callwheelmask];
217			c = TAILQ_FIRST(bucket);
218			while (c) {
219				depth++;
220				if (c->c_time != curticks) {
221					c = TAILQ_NEXT(c, c_links.tqe);
222					++steps;
223					if (steps >= MAX_SOFTCLOCK_STEPS) {
224						nextsoftcheck = c;
(kgdb) print *bucket
$1 = {tqh_first = 0xc1891d80, tqh_last = 0xc1891d80}
(kgdb) print c
$2 = (struct callout *) 0xdeadc0de
(kgdb) print *(bucket->tqh_first)
$3 = {c_links = {sle = {sle_next = 0xdeadc0de}, tqe = {tqe_next = 0xdeadc0de, 
      tqe_prev = 0xdeadc0de}}, c_time = -559038242, c_arg = 0xdeadc0de, 
  c_func = 0xdeadc0de, c_mtx = 0xdeadc0de, c_flags = -559038242}
(kgdb) print steps
$4 = 1


-- 
WBR, Victor V. Snezhko
EMail: snezhko_at_indorsoft.ru
Received on Thu Nov 03 2005 - 18:28:55 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:47 UTC