Re: LOR with netisr changes

From: Sam Leffler <sam_at_errno.com>
Date: Mon, 04 Dec 2006 14:50:56 -0800
Max Laier wrote:
> On Monday 04 December 2006 17:49, John Baldwin wrote:
>> On Sunday 03 December 2006 10:49, Munehiro Matsuda wrote:
>>> From: John Baldwin <jhb-at-freebsd.org>
>>> Date: Wed, 29 Nov 2006 12:39:56 -0500
>>>
>>> ::On Tuesday 28 November 2006 14:41, Sam Leffler wrote:
>>> ::> Robert Watson wrote:
>>> ::> > On Wed, 29 Nov 2006, Munehiro Matsuda wrote:
>>> ::> >> JFYI, I got following LOR started after the netisr changes:
>>> ::> >
>>> ::> > In general, device driver locks should not be held over entry
>>> ::> > to the network stack.  However, things get a bit tricky in the
>>> ::> > 802.11 code due to lock sharing, I believe, so it could be a
>>> ::> > bit more tricky to fix that.
>>> ::>
>>> ::> It's just a bug in the driver.  Driver locks should be dropped
>>> ::> when packets get passed up the stack.  This issue was pointed out
>>> ::> before iwi ever was committed but since nothing immediately comes
>>> ::> back via the bridge (or similar) it's been ignored.
>>> ::
>>> ::How about this:
>>>
>>> <snip>
>>>
>>> ::(I'm unsure if the net82011 layer protects 'ic' or if the driver is
>>> :: supposed to do that.)
> 
> That's the main problem of net80211 as far as I understand it.  The 'ic' 
> stores a lot of state and configuration and it is unclear how it is 
> protected.  Most of the time, it seems to be the driver's job.  However, 
> there are entry points in net80211 that change the 'ic' underneath the 
> driver without giving them a chance to protect it.  IIRC, there has been 
> the idea to expose a driver mutex to the net80211 layer in order to take 
> care of this - not sure what has become of that idea, though.

Hey, speak for yourself :)  The ic is mostly treated as an extension of
the softc and is intended to be covered by the driver's softc lock.
There are exceptions (e.g. the node table) but that's the main intent.
The complications come from the layering of net80211 over the driver but
because net80211 has no access to the driver's softc lock confusion
abounds and you get problems like calls into ieee80211_ioctl wanting to
release the lock to do a copyin but being unable to.

As you note I've proposed exposing the driver softc lock but quite a
long time but gotten zero feedback so ignored it.  I also find it hard
to spend a lot of time on code that's been heavily rewritten and that
has mostly different issues (i.e. vap's have very different locking
requirements though some of the same issues remain).

In this case the upcall must not hold the driver softc lock
regardless--it's got nothing to do with net80211, you'll hit LOR's when
you got through other mid-layer code.

> 
>>> Sorry for the late reply, and thanks for the patch.
>>> I needed to tweak the patch to make it compile, but seems to work
>>> fine. Modified patch attached.
>>>
>>> From: "Bjoern A. Zeeb" <bzeeb-lists-at-lists.zabbadoz.net>
>>> Date: Sat, 2 Dec 2006 12:25:28 +0000 (UTC)
>>>
>>> ::ok, whoever is going to take care of this and perhaps commit the
>>> :: fix let me know. I assigned it LOR ID 194 on "The LOR page":
>>> :: 	http://sources.zabbadoz.net/freebsd/lor.html#194
>>>
>>> LOR seems to be gone with the patch, so it could be removed when
>>> committed.
>>>
>>> Thanks,
>>>   Haro
>>>
>>> --- if_iwi.c.org	9 Nov 2006 16:05:43 -0000
>>> +++ if_iwi.c	3 Dec 2006 14:17:45 -0000
>>> _at__at_ -1230,6 +1230,7 _at__at_
>>>  	struct mbuf *mnew, *m;
>>>  	struct ieee80211_node *ni;
>>>  	int type, error, framelen;
>>> +	IWI_LOCK_DECL;
>> Hmm, why this?   If it's for spl's, note that the rest of the patch is
>> specific to 5.x and up, shouldn't really be dropping spl in 4.x.  Also,
>> even if it did, it would pass a garbage value to splx(), so it would
>> really break things badly.
> 
> No, this is a hack to work around a recursion in net80211.  There are a 
> couple of places where the driver calls back into net80211 with the lock 
> held (to protect the 'ic', see above).  net80211 then might call back 
> into the driver, which means we would have either have to use a recursive 
> lock (which can't be passed to msleep) or avoid the recursion at leat for 
> the lock (which is what IWI_LOCK_DECL does).  I know it's ugly and wrong, 
> but unless there is a propper sollution as to how the net80211 data 
> structures are protected in various call paths, it's not something that 
> can be fixed.

See above.  This is not about recursive locks or net80211; this is about
how to handle middle layer code in the network stack.  With direct
dispatch the  rx path is directly coupled to the tx path for traffic and
locking must compensate or there must be some other mechanism to
decouple the context.

> 
>>>  	framelen = le16toh(frame->len);
>>>  	if (framelen < IEEE80211_MIN_LEN || framelen > MCLBYTES) {
>>> _at__at_ -1310,6 +1311,7 _at__at_
>>>
>>>  		bpf_mtap2(sc->sc_drvbpf, tap, sc->sc_rxtap_len, m);
>>>  	}
>>> +	IWI_UNLOCK(sc);
>>>
>>>  	ni = ieee80211_find_rxnode(ic, mtod(m, struct ieee80211_frame_min
>>> *));
>>>
>>> _at__at_ -1319,6 +1321,7 _at__at_
>>>  	/* node is no longer needed */
>>>  	ieee80211_free_node(ni);
>>>
>>> +	IWI_LOCK(sc);
>>>  	if (sc->sc_softled) {
>>>  		/*
>>>  		 * Blink for any data frame.  Otherwise do a
>>>
>>> =--------------------------------------------------------------------
>>> ---------- _ _    Munehiro (haro) Matsuda
>>>  -|- /_\  |_|_|   Internet Solution Dept., KGT Inc.
>>>  /|\ |_|  |_|_|   2-8-8 Shinjuku Shinjuku-ku Tokyo 160-0022, Japan
>>>                   Tel: +81-3-3225-0767  Fax: +81-3-3225-0740
> 
Received on Mon Dec 04 2006 - 21:56:10 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:03 UTC