Re: LOR route vr0

From: Robert Watson <rwatson_at_FreeBSD.org> Date: Sat, 27 Aug 2005 18:44:51 +0100 (BST)

On Sat, 27 Aug 2005, M. Warner Losh wrote:

> : Generally speaking, network interface device driver locks follow network
> : stack locks in the lock order.  However, I've not really looked much at
> : the route table locking so can't speak to whether that is the case
> : specifically for routing locks.  If it is, the below traces reflect the
> : correct order, and you might want to add a hard-coded entry to witness in
> : order to catch the reverse order.
>
> Can you post a quickie summary on how to do that? I tried last night and
> was unsuccessful...

You need to add an entry to subr_witness.c creating a graph edge between 
the softc lock and the routing lock.  An example of an entry in 
subr_witness.c:

         /*
          * TCP/IP
          */
         { "tcp", &lock_class_mtx_sleep },
         { "tcpinp", &lock_class_mtx_sleep },
         { "so_snd", &lock_class_mtx_sleep },
         { NULL, NULL },

Note that each set of ordered entries is terminated with a double-null
entry, { NULL, NULL }.  This declares that locks of type "tcp" precede
"tcpinp", which in turn precede "so_snd".
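
To make the table layout concrete, here is a rough userland sketch of how
such a list can be walked to produce graph edges.  It is not the code in
subr_witness.c: the struct name order_entry, the function
walk_order_lists(), and the stand-in lock_class are made up for
illustration, and it assumes the table as a whole ends with one further
{ NULL, NULL } entry after the last set.

```c
#include <stddef.h>
#include <stdio.h>

/* Userland stand-in for the kernel lock class; only its address is used. */
struct lock_class { const char *lc_name; };
static struct lock_class lock_class_mtx_sleep = { "sleep mutex" };

/* Hypothetical entry type: a lock type name plus its lock class. */
struct order_entry {
	const char *w_name;
	struct lock_class *w_class;
};

static const struct order_entry order_lists[] = {
	/*
	 * TCP/IP
	 */
	{ "tcp", &lock_class_mtx_sleep },
	{ "tcpinp", &lock_class_mtx_sleep },
	{ "so_snd", &lock_class_mtx_sleep },
	{ NULL, NULL },
	/* Assumed end-of-table marker. */
	{ NULL, NULL }
};

/* Example callback: print one "before -> after" edge. */
static void
print_edge(const char *before, const char *after)
{
	printf("%s -> %s\n", before, after);
}

/*
 * Walk every set in the table.  Each adjacent pair inside a set becomes
 * one edge (earlier lock first).  Returns the number of edges found.
 */
static int
walk_order_lists(const struct order_entry *order,
    void (*edge)(const char *, const char *))
{
	int nedges = 0;

	for (; order->w_name != NULL; order++) {
		const struct order_entry *prev = order;

		/* Stops on the { NULL, NULL } that closes this set. */
		for (order++; order->w_name != NULL; order++) {
			if (edge != NULL)
				edge(prev->w_name, order->w_name);
			prev = order;
			nedges++;
		}
	}
	return (nedges);
}
```

Calling walk_order_lists(order_lists, print_edge) visits the two edges of
the set above, "tcp" before "tcpinp" and "tcpinp" before "so_snd".  A new
set for the softc and routing locks would be added the same way, as its
own run of entries closed by { NULL, NULL }.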

> : Lock order reversals between the
> : network stack and device drivers tend to occur as a result of the device
> : driver calling into the network stack while holding the device driver
> : mutex.
>
> I'm as sure as I can be that no locks are held when I call INTO the 
> network layer.  As far as I can tell, I only do that when I call 
> ifp->if_input, and I drop the locks to do that.

If I had to guess, you do a media status update, which can cause routing 
socket events indicating the link went up or down.

> : Someone (tm) should work out if the right order is route locks ->
> : device driver locks, as it's likely a common class of bugs across many
> : drivers.
>
> I just discovered the problem in my code.  I'm not sure where the
> other order happens, but in my code I do the following:
>
> 	ED_LOCK(sc);
> 	ed_setrcr(sc);
> 	    ed_ds_getmcst(sc);
> 		IF_ADDR_LOCK(sc->ifp);
> 		TAILQ_FOREACH(ifma, &sc->ifp->if_multiaddrs, ifma_link) {
> 		...
> 		IF_ADDR_UNLOCK(sc->ifp);
> 	ED_UNLOCK(sc);
>
> Since the lock for ED should be a leaf lock, this causes problems. I'm
> guessing that the network layer calls into the driver with this lock 
> held.  Without hard coding the locking into witness (see above), I'm 
> unsure where this happens.  A quick grep of the code doesn't reveal 
> anything obvious...

I think this case should be OK, and we should document that as being the 
case using a hard-coded witness entry.

> When I comment out the above IF_ADDR locks, I have no more LORs, but I
> think maybe other problems :-).

Hmmm.  I was thinking that it was a separate issue.  Could you try adding 
a graph edge to witness forcing the ifaddrmtx's to fall before the driver 
mutexes, in order to identify a path by which ifaddrmtx preceeds the 
driver mutex?
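
As a rough illustration of what a hard-coded edge buys you, here is a toy
userland model of the check.  It is only a sketch: real witness keeps a
graph rather than a flat rank, and the lock type names "if_addr_mtx" and
"network driver" are my assumptions about what the two mutexes are
registered as, so check the names in your tree.  hard_order, rank_of()
and is_reversal() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical hard-coded set, earliest lock first.  Both names are
 * assumptions, not taken from subr_witness.c.
 */
static const char *const hard_order[] = {
	"if_addr_mtx",
	"network driver",
	NULL
};

/* Position of a lock type in the hard-coded set, or -1 if not listed. */
static int
rank_of(const char *name)
{
	int i;

	for (i = 0; hard_order[i] != NULL; i++)
		if (strcmp(hard_order[i], name) == 0)
			return (i);
	return (-1);
}

/*
 * Returns 1 if acquiring `want` while already holding `held` goes
 * against the hard-coded order, 0 otherwise.  Locks that are not in the
 * set are never reported by this toy check.
 */
static int
is_reversal(const char *held, const char *want)
{
	int h = rank_of(held);
	int w = rank_of(want);

	return (h >= 0 && w >= 0 && w < h);
}
```

With the order declared up front, is_reversal("network driver",
"if_addr_mtx") reports a reversal no matter which order was seen first at
run time, so the report points at whichever path disagrees with the
declared order.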

Robert N M Watson