Re: Call for Testers: Convert watchdog spinlock into a sleepable mutex in if_ndis

From: Coleman Kane <cokane_at_FreeBSD.org>
Date: Mon, 09 Jun 2008 11:52:35 -0400
On Mon, 2008-06-09 at 08:03 -0400, Coleman Kane wrote:
> On Sun, 2008-06-08 at 03:55 +0200, Paul B. Mahol wrote:
> > On 6/7/08, Coleman Kane <cokane_at_freebsd.org> wrote:
> > > Hi,
> > >
> > > I've got another patch to if_ndis and I'd like to know if any NDIS users
> > > out there can test it for me and give me some feedback. I've converted
> > > the ndis_spinlock in if_ndis into a normal sleepable mutex, and would
> > > like to get some testers. The idea here is to eliminate an unnecessary
> > > "spinning" in the kernel, where it would be preferable to sleep (for CPU
> > > usage, interactivity, etc...).
> > >
> > > You can download it here:
> > >       *
> > > http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx.patch
> > >
> > > --
> > > Coleman Kane
> > >
> > 
> > I did not found any stability regressions in my environment with linked patch.
> > 
> > But I noticed   new   LOR (it appears as soon as radio of device is turned on):
> > 
> > lock order reversal:
> >  1st 0xc401726c ndis0 (network driver) _at_
> > /usr/local/src/sys/modules/if_ndis/../../dev/if_ndis/if_ndis.c:1648
> >  2nd 0xc09b3e74 HAL preemption lock (HAL lock) _at_
> > /usr/local/src/sys/modules/ndis/../../compat/ndis/subr_hal.c:423
> > KDB: stack backtrace:
> > db_trace_self_wrapper(c0708534,c3985bcc,c0541a6e,c070adf6,c09b3e74,...)
> > at db_trace_self_wrapper+0x26
> > kdb_backtrace(c070adf6,c09b3e74,c09b0a51,c09b0a48,c09b09f8,...) at
> > kdb_backtrace+0x29
> > witness_checkorder(c09b3e74,9,c09b09f8,1a7,c04f7364,...) at
> > witness_checkorder+0x6de
> > _mtx_lock_flags(c09b3e74,0,c09b09f8,1a7,c457ebd0,...) at _mtx_lock_flags+0xbc
> > KfRaiseIrql(2,c457ebd0,c3985c48,c09aacdd,c3d45354,...) at KfRaiseIrql+0x63
> > KfAcquireSpinLock(c3d45354,c094888a,c4017200,c3d6cc40,c4017200,...) at
> > KfAcquireSpinLock+0x13
> > IoQueueWorkItem(c457ebd0,c3d6cc40,0,c4017200,c07040ef,...) at
> > IoQueueWorkItem+0x2d
> > ndis_tick(c4017200,0,c070699c,166,c077e594,...) at ndis_tick+0x143
> > softclock(c077e560,0,c0701ca6,4d4,c3cda6ec,...) at softclock+0x24a
> > ithread_loop(c3cdb2e0,c3985d38,c0701a09,324,c3c99a70,...) at ithread_loop+0x1b5
> > fork_exit(c04e74f0,c3cdb2e0,c3985d38) at fork_exit+0xb8
> > fork_trampoline() at fork_trampoline+0x8
> > --- trap 0, eip = 0, esp = 0xc3985d70, ebp = 0 ---
> 
> Try this one:
>       * http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx2.patch
> 
> There's the same LOR in the existing code as well, but it is hidden by
> WITNESS_SKIPSPIN.
> 
> If you can verify that this works, I'll just commit them both as a
> single update.
> 

Paul,

Ignore the previous patch (#2) and try this one instead:
      * http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx3.patch

I analyzed the code a bit more and decided to push the mtx acquisition
so that it is inside the test for the TX-watchdog case. This should
result in having less traffic on that mutex under normal running
conditions. In this case, the mutex is only locked inside the
ndis_ticktask, and also inside the TX-watchdog handling code. In
general, when neither of these cases are true and the only action is a
simple re-schedule of the callout, the lock never gets acquired.

thompsa/jhb: What do you think of these last tweaks? I did some analysis
and I don't think that the ndis_hang_timer (rw), nmb_checkforhangsecs
(ro), and ndis_tx_timer (rw) are themselves subject to any races in the
current design (the rw/ro indicate their use within the unlocked
ndis_tick context).

Furthermore, as callout_drain(9) is the only method acting upon the
callout in any multithreaded context, it should be safe to perform
callout_reset(9) on the unlocked structure. I also did some more
research (just now) and it looks like ndis_start (called by
ndis_starttask) and ndis_reset_nic (in sys/compat/ndis/kern_ndis.c,
called by ndis_resettask) both perform their own locking. It is looking
to me like we may not even need to acquire NDIS_LOCK() from inside
ndis_tick() at all, but instead just use the existing locking in the
task-specific functions that it calls in wd/timeout cases.

-- 
Coleman Kane

Received on Mon Jun 09 2008 - 13:54:32 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:31 UTC