Re: LOR route vr0

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Fri, 2 Sep 2005 14:37:31 -0400

On Thursday 01 September 2005 08:39 pm, Don Lewis wrote:
> On  1 Sep, John Baldwin wrote:
> > On Thursday 01 September 2005 01:22 pm, Don Lewis wrote:
> >> On  1 Sep, Fredrik Lindberg wrote:
> >> > I'm seeing both the rtentry and the tcpinp LORs on my fxp interface
> >> > on a machine running a few days old -current (Aug 25).
> >> >
> >> > lock order reversal
> >> > 1st 0xc1e30d38 inp (tcpinp) @ /usr/src/sys/netinet/tcp_input.c:742
> >> > 2nd 0xc1b74018 fxp0 (network driver) @ /usr/src/sys/dev/fxp/if_fxp.c:1172
> >> >
> >> > lock order reversal
> >> > 1st 0xc1e06bb8 rtentry (rtentry) @ /usr/src/sys/net/route.c:1269
> >> > 2nd 0xc1b74018 fxp0 (network driver) @ /usr/src/sys/dev/fxp/if_fxp.c:1172
> >> >
> >> > As for their backtraces, they are almost identical to the
> >> > ones already posted.
> >>
> >> Are you using any applications that use multicast?  Can you break into
> >> DDB and capture the output of "show witness"?
> >
> > Also, are you using DEVICE_POLLING?
>
> I can reproduce this if I add DEVICE_POLLING to my kernel.  And I see
> Giant under "network driver" in the output of "show witness".
>
> If I apply your witness patch:
>   http://www.FreeBSD.org/~jhb/patches/witness.patch
> then I get the following LOR:
>
> lock order reversal
>  1st 0xc23e2018 fxp0 (network driver) @ /usr/src/sys/dev/fxp/if_fxp.c:1907
>  2nd 0xc09387e0 Giant (Giant) @ /usr/src/sys/kern/kern_poll.c:460
> KDB: stack backtrace:
> kdb_backtrace(0,ffffffff,c0946470,c0947f28,c08d3a84) at kdb_backtrace+0x29
> witness_checkorder(c09387e0,9,c086d0d3,1cc) at witness_checkorder+0x53c
> _mtx_lock_flags(c09387e0,0,c086d0d3,1cc) at _mtx_lock_flags+0x5b
> ether_poll_deregister(c23de000,c23e2000,c23e2018,0,e9295b60) at ether_poll_deregister+0x1d
> fxp_stop(c23e2000,c23e2018,1,c084c9ff,787) at fxp_stop+0x21
> fxp_init_body(c23e2000,c23e2018,0,c084c9ff,773) at fxp_init_body+0x31
> fxp_init(c23e2000,8020690c,c23e2000,c264bb00,e9295bc0) at fxp_init+0x23
> ether_ioctl(c23de000,8020690c,c264bb00,0,c264bb00) at ether_ioctl+0x50
> fxp_ioctl(c23de000,8020690c,c264bb00,1,c0a86503) at fxp_ioctl+0x232
> in_ifinit(c23de000,c264bb00,c24b3490,0,e9295c38) at in_ifinit+0x206
> in_control(c270fde8,8040691a,c24b3480,c23de000,c248e900) at in_control+0x882
> ifioctl(c270fde8,8040691a,c24b3480,c248e900,0) at ifioctl+0x198
> soo_ioctl(c2647dc8,8040691a,c24b3480,c2271d00,c248e900) at soo_ioctl+0x2db
> ioctl(c248e900,e9295d04,3,1,286) at ioctl+0x370
> syscall(3b,3b,3b,8056e40,8059140) at syscall+0x22f
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x48136e4b, esp = 0xbfbfe5ec, ebp = 0xbfbfee38 ---
> fxp0: link state changed to UP

Yeah, because of this bug, DEVICE_POLLING really needs debug.mpsafenet=0.  
Perhaps someone should add NET_NEEDS_GIANT(polling) to sys/kern/kern_poll.c 
for now?  The problem is that the polling code needs to use something other 
than Giant to protect its internal data that it accesses in 
ether_poll_deregister() since all the drivers I've seen call 
ether_poll_deregister() with the driver lock held.
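
To make the idea concrete, here is a rough userland sketch (pthreads, not
kernel code) of a polling table protected by its own lock instead of Giant.
Everything in it is made up for illustration: poll_lock, poll_table,
struct drv_softc, poll_register(), poll_deregister(), and drv_stop() are
hypothetical names, not what kern_poll.c or if_fxp.c contain.  The point is
only that a dedicated leaf lock can be taken with the driver lock held
without creating a "driver lock, then Giant" order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POLL_MAX 8

/* Hypothetical stand-in for a driver softc; not the real fxp structure. */
struct drv_softc {
	pthread_mutex_t lock;		/* models the per-driver lock */
};

/* Hypothetical dedicated lock for the polling table, used instead of Giant. */
static pthread_mutex_t poll_lock = PTHREAD_MUTEX_INITIALIZER;
static struct drv_softc *poll_table[POLL_MAX];
static int poll_count;

/* Add a driver to the table; returns 0 on success, -1 if the table is full. */
int
poll_register(struct drv_softc *sc)
{
	int rc = -1;

	assert(sc != NULL);
	pthread_mutex_lock(&poll_lock);
	if (poll_count < POLL_MAX) {
		poll_table[poll_count++] = sc;
		rc = 0;
	}
	pthread_mutex_unlock(&poll_lock);
	return (rc);
}

/*
 * Remove a driver from the table; returns 0 if found, -1 otherwise.
 * May be called with sc->lock held: poll_lock is a leaf lock here, so
 * no other lock is acquired while it is held.
 */
int
poll_deregister(struct drv_softc *sc)
{
	int i, rc = -1;

	assert(sc != NULL);
	pthread_mutex_lock(&poll_lock);
	for (i = 0; i < poll_count; i++) {
		if (poll_table[i] == sc) {
			/* Move the last entry into the freed slot. */
			poll_table[i] = poll_table[--poll_count];
			poll_table[poll_count] = NULL;
			rc = 0;
			break;
		}
	}
	pthread_mutex_unlock(&poll_lock);
	return (rc);
}

/* Models a driver stop routine that deregisters with its own lock held. */
int
drv_stop(struct drv_softc *sc)
{
	int rc;

	pthread_mutex_lock(&sc->lock);
	rc = poll_deregister(sc);	/* order: driver lock, then poll_lock */
	pthread_mutex_unlock(&sc->lock);
	return (rc);
}

/* Number of currently registered drivers. */
int
poll_registered(void)
{
	int n;

	pthread_mutex_lock(&poll_lock);
	n = poll_count;
	pthread_mutex_unlock(&poll_lock);
	return (n);
}
```

This only stays reversal-free if poll_lock really is a leaf.  A polling loop
that held poll_lock while calling into a driver handler (which takes the
driver lock) would just reintroduce the reversal in the other direction, so
a real fix would have to deal with that path as well.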

> I also get another LOR:
>
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> lock order reversal
>  1st 0xe35e0cc4 g_xdown (g_xdown) @ /usr/src/sys/geom/geom_io.c:465
>  2nd 0xc09387e0 Giant (Giant) @ /usr/src/sys/geom/geom_disk.c:99
> KDB: stack backtrace:
> kdb_backtrace(0,ffffffff,c0945e30,c0947f28,c08d3a84) at kdb_backtrace+0x29
> witness_checkorder(c09387e0,9,c0866bc0,63) at witness_checkorder+0x53c
> _mtx_lock_flags(c09387e0,0,c0866bc0,63) at _mtx_lock_flags+0x5b
> g_disk_start(c2632a50,e35e0cc4,0,c086722e,1d1) at g_disk_start+0x152
> g_io_schedule_down(c2275480) at g_io_schedule_down+0x160
> g_down_procbody(0,e35e0d38,0,c0606960,0) at g_down_procbody+0x5a
> fork_exit(c0606960,0,e35e0d38) at fork_exit+0xa0
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xe35e0d6c, ebp = 0 ---
> Trying to mount root from ufs:/dev/da0s1a

Hummmm.  That means if anyone does a msleep(g_xdown) while holding Giant then 
it could deadlock on resume since msleep() always acquires Giant first.  
Perhaps g_xdown should be an sx lock or some such.
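
For anyone following along, the reversal can be modeled with a toy order
checker.  This is a simplified userland sketch and not how witness(4) is
implemented: order_check() and its table are hypothetical, and it only
catches direct A/B versus B/A reversals, not longer cycles.  It records
which lock was held when another was acquired and complains when the
opposite order shows up later.

```c
#include <string.h>

#define MAX_LOCKS 8

static const char *lock_names[MAX_LOCKS];
static int nlocks;
/* order_seen[a][b] != 0 means lock a was held when lock b was acquired. */
static unsigned char order_seen[MAX_LOCKS][MAX_LOCKS];

/* Look up a lock by name, adding it if new; returns -1 if the table is full. */
static int
lock_index(const char *name)
{
	int i;

	for (i = 0; i < nlocks; i++)
		if (strcmp(lock_names[i], name) == 0)
			return (i);
	if (nlocks == MAX_LOCKS)
		return (-1);
	lock_names[nlocks] = name;
	return (nlocks++);
}

/*
 * Record that `second` was acquired while `first` was held.
 * Returns 1 if the opposite order was recorded earlier (a reversal),
 * 0 if the order is new or consistent, and -1 if the table is full.
 */
int
order_check(const char *first, const char *second)
{
	int a, b;

	a = lock_index(first);
	b = lock_index(second);
	if (a < 0 || b < 0)
		return (-1);
	if (order_seen[b][a])
		return (1);
	order_seen[a][b] = 1;
	return (0);
}
```

Feeding it the two orders discussed above, order_check("Giant", "g_xdown")
for the msleep() path and then order_check("g_xdown", "Giant") for the
g_disk_start() path, makes the second call report the reversal.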

-- 
John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
Received on Fri Sep 02 2005 - 16:51:12 UTC