Re: Locking fixes for sf(4)

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Mon, 29 Aug 2005 13:17:23 -0400

On Monday 29 August 2005 09:42 am, Markus Brueffer wrote:
> Hi John,
>
> On Friday 12 August 2005 15:16, John Baldwin wrote:
> > On Friday 12 August 2005 08:15 am, Christian Brueffer wrote:
> > > On Thu, Aug 11, 2005 at 11:24:10AM -0400, John Baldwin wrote:
> > > > On Thursday 11 August 2005 11:00 am, Christian Brueffer wrote:
> > > > > On Wed, Aug 10, 2005 at 04:58:09PM -0400, John Baldwin wrote:
> > > > > > I've fixed up the locking in sf(4) but do not have the hardware
> > > > > > to test the changes.  Can someone please test these patches?
> > > > > > Thanks.
> > > > > >
> > > > > > http://www.freebsd.org/~jhb/patches/sf_locking.patch
> > > > >
> > > > > Results in a "recursed on non-recursive mutex" panic.
> > > > > Unfortunately the dump looks busted, I'll get a good one tomorrow
> > > > > (can also test the my(4) patch then).
> > > >
> > > > Ok.  If you could just get the backtrace from ddb that would probably
> > > > be sufficient.  Thanks for testing!
> > >
> > > panic: _mtx_lock_sleep: recursed on non-recursive mutex sf0 @
> > > /usr/home/build/src/sys/modules/sf/../../pci/if_sf.c:477
> > >
> > > CPUID = 1
> > > KDB: enter: panic
> > > [thread pid 220 tid 100072 ]
> > > Stopped at      kdb_enter+0x30: leave
> > > db> tr
> > > Tracing pid 220 tid 100072 td 0xc1d63480
> > > kdb_enter(c079421b,1,c0793681,d8945ab4,c1d63480) at kdb_enter+0x30
> > > panic(c0793681,c1ad6ab0,c08ed18a,1dd,1dd) at panic+0x14e
> > > _mtx_lock_sleep(c1ac3c4c,c1d63480,0,c08ed18a,1dd) at _mtx_lock_sleep+0x47
> > > _mtx_lock_flags(c1ac3c4c,0,c08ed18a,1dd,0) at _mtx_lock_flags+0x9c
> > > sf_ifmedia_upd(c1adb800,1000,c08ed18a,4b7,c1ac3c4c) at sf_ifmedia_upd+0x3e
> > > sf_init_locked(c1ac3c4c,0,c08ed18a,4aa,c1adb800) at sf_init_locked+0x4bc
> > > sf_init(c1ac3c00,740,c07a8534,8020690c,c1ac3c00) at sf_init+0x39
> > > ether_ioctl(c1adb800,8020690c,c1d67e00,c05a7cd1,0) at ether_ioctl+0x67
> > > sf_ioctl(c1adb800,8020690c,c1d67e00,100,1) at sf_ioctl+0xbb
> > > in_ifinit(c1adb800,c1d67e00,c1cef3d0,0,1) at in_ifinit+0x208
> > > in_control(c1dfcde8,8040691a,c1cef3c0,c1adb800,c1d63480) at in_control+0x986
> > > ifioctl(c1dfcde8,8040691a,c1cef3c0,c1d63480,2) at ifioctl+0x1cd
> > > soo_ioctl(c1d59d80,8040691a,c1cef3c0,c19dca80,c1d63480) at soo_ioctl+0x3ef
> > > ioctl(c1d63480,d8945d04,c,422,3) at ioctl+0x45d
> > > syscall(3b,3b,3b,80beac0,1) at syscall+0x2c0
> > > Xint0x80_syscall() at Xint0x80_syscall+0x1f
> > > --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x8055473, esp = 0xbfbfe5fc, ebp = 0xbfbfee68 ---
> >
> > Ah, ok, thanks.  This is the first driver I've seen that calls its
> > ifmedia_update routine internally.  I'll fix this and update the patch.
> > Thanks!
>
> I'm getting this LOR with the latest if_sf.c in RELENG_6:
>
> lock order reversal
>  1st 0xc1b0d7cc sf0 (network driver) @ /usr/src/sys/pci/if_sf.c:1201
>  2nd 0xc07a49e0 Giant (Giant) @ /usr/src/sys/kern/kern_poll.c:460
> KDB: stack backtrace:
> kdb_backtrace(c0742772,c07a49e0,c074dd5f,c074dd5f,c073e09e) at kdb_backtrace+0x2e
> witness_checkorder(c07a49e0,9,c073e09e,1cc,18a) at witness_checkorder+0x6c3
> _mtx_lock_flags(c07a49e0,0,c073e09e,1cc,c1b0d780) at _mtx_lock_flags+0x8a
> ether_poll_deregister(c1af3000,0,c074def0,5c3,c1af3000) at ether_poll_deregister+0x2e
> sf_stop(c1b0d780,1,c074def0,4be,c1b0d780) at sf_stop+0x52
> sf_init_locked(c1b0d780,0,c074def0,4b1,c1af3000) at sf_init_locked+0x44
> sf_init(c1b0d780,c055f18d,c07ac2c0,8020690c,c1b0d780) at sf_init+0x3a
> ether_ioctl(c1af3000,8020690c,c1c19a00,c07423ea,0) at ether_ioctl+0x67
> sf_ioctl(c1af3000,8020690c,c1c19a00,c1c19a7c,1) at sf_ioctl+0x270
> in_ifinit(c1af3000,c1c19a00,c1c6aa10,0,1) at in_ifinit+0x208
> in_control(c1e98de8,8040691a,c1c6aa00,c1af3000,c1c16a80) at in_control+0x986
> ifioctl(c1e98de8,8040691a,c1c6aa00,c1c16a80,2) at ifioctl+0x1bc
> soo_ioctl(c1c57168,8040691a,c1c6aa00,c19d7a80,c1c16a80) at soo_ioctl+0x3ef
> ioctl(c1c16a80,d7827d04,c,422,3) at ioctl+0x45d
> syscall(3b,3b,3b,8058aa0,0) at syscall+0x2c0
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x280d17ef, esp = 0xbfbfe99c, ebp = 0xbfbfe9c8 ---
>
> brueffer@galaxy:/usr/src/sys/pci > ident if_sf.c
> if_sf.c:
>      $FreeBSD: src/sys/pci/if_sf.c,v 1.82.2.3 2005/08/26 14:50:16 jhb Exp $
>
> This results in a stalled network connection (watchdog timeout messages on
> the console), sometimes after 5 minutes, sometimes after several hours.

I haven't changed this.  This seems to be a problem with DEVICE_POLLING.  For
now you probably need to run with debug.mpsafenet=0 when using DEVICE_POLLING.
The polling code in sys/kern/kern_poll.c needs to stop using Giant before it
will be safe to use debug.mpsafenet=1 with DEVICE_POLLING.  Until then,
sys/kern/kern_poll.c might need to include a 'NET_NEEDS_GIANT(polling)' line.
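
For anyone unfamiliar with what a "lock order reversal" report means, here is
a minimal userland sketch of the idea.  It is a toy model only, not how
WITNESS is implemented: the names order_checker, checker_acquire and
checker_release are hypothetical, it tracks a single thread, and it assumes
some other code path has already taken Giant before the driver lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy lock-order checker.  Hypothetical names; this is NOT the WITNESS API. */
#define MAX_LOCKS 8

enum { LOCK_GIANT = 0, LOCK_SF0 = 1 };	/* illustrative lock ids */

struct order_checker {
	/* seen[a][b] is true once lock b was acquired while a was held. */
	bool	seen[MAX_LOCKS][MAX_LOCKS];
	int	held[MAX_LOCKS];	/* locks currently held, oldest first */
	size_t	nheld;
};

static void
checker_init(struct order_checker *c)
{
	memset(c, 0, sizeof(*c));
}

/*
 * Record that `lock` is being acquired.  Returns true if this acquisition
 * reverses an order recorded earlier (the toy equivalent of a LOR report).
 */
static bool
checker_acquire(struct order_checker *c, int lock)
{
	bool reversal = false;

	assert(lock >= 0 && lock < MAX_LOCKS);
	for (size_t i = 0; i < c->nheld; i++) {
		int h = c->held[i];

		if (h == lock)
			continue;	/* recursion is a separate problem */
		if (c->seen[lock][h])
			reversal = true;	/* earlier: lock, then h */
		c->seen[h][lock] = true;	/* now: h, then lock */
	}
	if (c->nheld < MAX_LOCKS)
		c->held[c->nheld++] = lock;
	return (reversal);
}

/* Drop the most recent acquisition of `lock`, if it is held. */
static void
checker_release(struct order_checker *c, int lock)
{
	for (size_t i = c->nheld; i > 0; i--) {
		if (c->held[i - 1] != lock)
			continue;
		for (size_t j = i - 1; j + 1 < c->nheld; j++)
			c->held[j] = c->held[j + 1];
		c->nheld--;
		return;
	}
}
```

In this model, acquiring LOCK_GIANT and then LOCK_SF0 records the order
Giant -> sf0.  After both are released, acquiring LOCK_SF0 and then
LOCK_GIANT makes checker_acquire() return true, which corresponds to the
"1st sf0, 2nd Giant" report quoted above.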

-- 
John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org