rtentry recursion panic (fwd)

From: Andrew R. Reiter <arr_at_watson.org>
Date: Thu, 9 Dec 2004 12:53:31 -0500 (EST)
Sending since I know Sam and Bruce are busy; I am busy.. not as busy.. but
busy enough to not have time to track code paths down further than I did
in the below email.

NOTE:  The line # in the panicstr is WRONG! this message was after me
doing some minor printf() additions to see what was going on a bit further
after I was unable to get a good vmcore.

NOTE:  The way I explained the way to reproduce it is a bit more difficult
than it can be done.  Ie.. just do:

	ifconfig <device> 10.9.9.100 netmask 255.255.0.0 up
	route add default 10.9.0.1
	route delete default
	route delete 10.9/16
	route add default 10.9.0.1
	traceroute 10.9.0.1
	panic...

I explain a bit further in the below email where the actual problem is.. I
just am not sure the exact means it got to that point.  I've verified this
exists on CURRENT of yesterday.  So it's still lingering..

---------- Forwarded message ----------
Date: Wed, 1 Dec 2004 03:24:36 -0500 (EST)
From: Andrew R. Reiter <arr_at_watson.org>
To: sam_at_freebsd.org, bms_at_freebsd.org
Subject: rtentry recursion panic

Sam & Bruce,

I am admittedly running a slightly out of date CURRENT, but am hoping to
see whether or not you all have seen this (or related) panic.

The panic string is [ignore line #, added debug goo for own use]:

_mtx_lock_sleep: recursed on non-recursive mutex rtentry _at_
/usr/au_src/sys/net/route.c:1311

And the [long] backtrace is:

#0  doadump () at pcpu.h:159
#1  0xc0466dee in db_fncall (dummy1=-1065588687, dummy2=0,
dummy3=-563291948,
    dummy4=0xde6cd86c "l") at /usr/au_src/sys/ddb/db_command.c:531
#2  0xc0467184 in db_command_loop () at
/usr/au_src/sys/ddb/db_command.c:349
#3  0xc0468b8c in db_trap (type=3, code=0) at
/usr/au_src/sys/ddb/db_main.c:221
#4  0xc064e32e in kdb_trap (type=3, code=0, tf=0xde6cd994)
    at /usr/au_src/sys/kern/subr_kdb.c:421
#5  0xc07d977c in trap (frame=
      {tf_fs = -563347432, tf_es = -1067122672, tf_ds = -1065091056,
tf_edi = 0, tf_esi = 1, tf_ebp = -563291692, tf_isp = -563291712, tf_ebx =
-563291648, tf_edx = -1065058018, tf_ecx = -1064210208, tf_eax =
-1065049367, tf_trapno = 3, tf_err = 0, tf_eip = -1067131032, tf_cs = 8,
tf_eflags = 642, tf_esp = -563291660, tf_ss = -1067222809}) at
/usr/au_src/sys/i386/i386/trap.c:577
#6  0xc07c82aa in calltrap () at /usr/au_src/sys/i386/i386/exception.s:140
#7  0xde6c0018 in ?? ()
#8  0xc0650010 in kvprintf (fmt=0x1 <Address 0x1 out of bounds>,
func=0x100,
    arg=0xc1f1aaf0, radix=-1042196716, ap=0xc1e15714 "D+\213\v\204\v\204")
    at /usr/au_src/sys/kern/subr_prf.c:543
#9  0xc06378e7 in panic (fmt=0x282 <Address 0x282 out of bounds>)
    at /usr/au_src/sys/kern/kern_shutdown.c:525
#10 0xc06301b1 in _mtx_lock_sleep (m=0xc1e15714, td=0xc1f1aaf0, opts=0,
    file=0xc0852322 "/usr/au_src/sys/net/route.c", line=1311)
    at /usr/au_src/sys/kern/kern_mutex.c:492
#11 0xc063023c in _mtx_lock_flags (m=0xc1e15714, opts=0,
    file=0xc0852322 "/usr/au_src/sys/net/route.c", line=1311)
    at /usr/au_src/sys/kern/kern_mutex.c:273
#12 0xc06a87ed in rt_check (lrt=0x0, lrt0=0xde6cdab4, dst=0xc1fa7a10)
    at /usr/au_src/sys/net/route.c:1311
#13 0xc06b3481 in arpresolve (ifp=0xc1cea000, rt0=0xc1e156b4,
m=0xc1e0f300,
    dst=0xc1fa7a10, desten=0xde6cdad0 "")
    at /usr/au_src/sys/netinet/if_ether.c:351
#14 0xc069c392 in ether_output (ifp=0xc1cea000, m=0xc1e0f300,
dst=0xc1fa7a10,
    rt0=0x0) at /usr/au_src/sys/net/if_ethersubr.c:160
#15 0xc06bd94f in ip_output (m=0xc1e0f300, opt=0xc1cea000, ro=0xde6cdb40,
    flags=34, imo=0x0, inp=0xc1faeec4)
    at /usr/au_src/sys/netinet/ip_output.c:771
#16 0xc06bdf81 in rip_output (m=0xc1e0f300, so=0x0, dst=0)
    at /usr/au_src/sys/netinet/raw_ip.c:320
#17 0xc06bea8f in rip_send (so=0xc1d78798, flags=0, m=0xc1e0f300,
    nam=0xc1ba51a0, control=0x0, td=0xc1f1aaf0)
    at /usr/au_src/sys/netinet/raw_ip.c:784
#18 0xc066db10 in sosend (so=0xc1d78798, addr=0xc1ba51a0, uio=0xde6cdc54,
    top=0xc1e0f300, control=0x0, flags=0, td=0xc1f1aaf0)
    at /usr/au_src/sys/kern/uipc_socket.c:816
#19 0xc06739ea in kern_sendit (td=0xc1f1aaf0, s=7, mp=0xde6cdccc, flags=0,
    control=0x0) at /usr/au_src/sys/kern/uipc_syscalls.c:784
#20 0xc067488c in sendit (td=0xc1f1aaf0, s=7, mp=0xde6cdccc, flags=0)
    at /usr/au_src/sys/kern/uipc_syscalls.c:725
#21 0xc0674b7d in sendto (td=0x0, uap=0x0)
    at /usr/au_src/sys/kern/uipc_syscalls.c:841
#22 0xc07d9bc9 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 0, tf_esi = 0, tf_ebp
= -1077941288, tf_isp = -563290764, tf_ebx = 672388112, tf_edx =
134553600, tf_ecx = 1, tf_eax = 133, tf_trapno = 12, tf_err = 2, tf_eip =
671908087, tf_cs = 31, tf_eflags = 662, tf_esp = -1077941348, tf_ss = 47})
    at /usr/au_src/sys/i386/i386/trap.c:1003
#23 0xc07c82ff in Xint0x80_syscall ()
    at /usr/au_src/sys/i386/i386/exception.s:201
#24 0x0000002f in ?? ()
#25 0x0000002f in ?? ()
#26 0x0000002f in ?? ()
#27 0x00000000 in ?? ()
#28 0x00000000 in ?? ()
#29 0xbfbfebd8 in ?? ()
#30 0xde6cdd74 in ?? ()
#31 0x2813d410 in ?? ()
#32 0x08052000 in ?? ()
#33 0x00000001 in ?? ()
#34 0x00000085 in ?? ()
#35 0x0000000c in ?? ()
#36 0x00000002 in ?? ()
#37 0x280c80f7 in ?? ()
#38 0x0000001f in ?? ()
#39 0x00000296 in ?? ()
#40 0xbfbfeb9c in ?? ()
#41 0x0000002f in ?? ()
#42 0x00000000 in ?? ()
#43 0x00000000 in ?? ()
#44 0x00000000 in ?? ()
#45 0x00000000 in ?? ()
#46 0x1f343000 in ?? ()
#47 0x00000000 in ?? ()
#48 0xc1f1aaf0 in ?? ()
#49 0xde6cda28 in ?? ()
#50 0xde6cda10 in ?? ()
#51 0xc19da640 in ?? ()
#52 0xc0647643 in sched_switch (td=0x0, newtd=0x2813d410, flags=Cannot
access memory at address 0xbfbfebe8
)
    at /usr/au_src/sys/kern/sched_4bsd.c:865
(kgdb)

The recursion takes place in rt_check() after rtalloc1()'ing for a
rt_gwroute since rt0's (*lrt0) rt_gwroute is NULL.  The value rtentry
object allocated is pointing at the same address as rt0... and when we
return from rtalloc1() the rtentry object is locked.... so the following
occurs in rt_check() (indent python-esque format):

rt_check(..., struct rtentry **lrt0, ...)
 rt = rt0 = *lrt0
 if (rt)
    RT_LOCK(rt)
    ...checks to see if RTF_GATEWAY; jumps down since rt_gwroute==0
    RT_UNLOCK(rt0)  // remember, rt == rt0, even now due to beginning code
    // alloc new gateway rtentry
    rt = rtalloc1(rt->rt_gateway, 1, 0);	// is rt_gateway safe??
    RT_LOCK(rt0);	// BAM! panic.. rt (after rtalloc1()) points to
same addr as rt0... hows that happening?)

The locked is grabbed in rtalloc1() as well as a refcnt is added (not
marked RTF_CLONING at alloc time).

I've narrowed down how to reproduce in a reasonable fashion... perhaps
could be narrowed down further, but am starting to get sleepy tonight..
can help tomorrow.  This UP laptop has a bge ethernet nic and a wi
wireless nic.  Upon boot, my startup scripts are configured to ifconfig
bge to a 10.x.x.x ipaddress and set the default route (nothing exciting).
The wi device is left untouched.  So, to cause the panic...

  # ifconfig wi0 192.168.0.3 netmask 255.255.255.0 ssid BLAH stationname j
  # route delete default	// Whacks 10.x.x.x default route
  # route delete 10.5/16
  # route delete 192.168.0.1	// Shouldnt be there unless pingd or other
  # route delete -net 192.168.0
  # route add default 192.168.0.1
  # traceroute 192.168.0.1

The traceroute command calls sendto().... and so on (as seen above).

As you can tell, I've not gone through the code paths in the above steps
to try and determine the root of the problem; my apologies.  I plan on
picking this up tomorrow when I have time at work and going from there.
Please let me know if you have any thoughts or can help and would like
more information.

Cheers,
Andrew

--
Andrew R. Reiter
arr_at_watson.org
arr_at_FreeBSD.org
Received on Thu Dec 09 2004 - 16:56:02 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:24 UTC