Re: 6.0BETA3 panic in ip_output (vlan/RIP related?)

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Mon, 5 Sep 2005 14:26:39 +0100 (BST)

On Sat, 3 Sep 2005, Robert Watson wrote:

> I believe I've chatted with Gleb about this some, but want to confirm 
> that I understand the problem here: this occurs when an interface is 
> removed while IP multicast membership is still present for multicast 
> groups on the interface.  When the multicast socket is closed, then the 
> kernel panics because it has a now invalid cached pointer to the 
> interface structure (now freed), which causes an assertion failure 
> because the mutex code detects that it is operating on an invalid mutex.
>
> So it sounds like we need to figure out how the multicast code should 
> behave on interface removal -- I wonder what other operating systems do 
> here?  Do they simply invalidate current membership associated with the 
> interface, or do they leave the multicast sockets in a state such that 
> if the interface comes back, the memberships are re-bound?

I've now committed a regression test for this bug:

     src/tools/regression/netinet/msocket_ifnet_remove

It basically simulates the removal of an interface while it is in use for 
multicast, resulting in a panic similar to the ones you've reported.  An 
if_disc discard interface is used.  It tests both raw and UDP socket 
variants, and should panic 6.x and 7.x boxes; on 4.x and 5.x it may panic, 
or it may just corrupt kernel memory silently.

I believe the solution for now is that on ifnet tear-down, we will need to 
walk the various pcb lists and trim references to the multicast address. 
I chatted a little with Bill Fenner today about what the application 
semantics should be, and likely we need to substantially change the way 
IPv4 and IPv6 multicast handle group membership for sockets in order to 
get the "right" behavior, so a panic work-around for 6.0 is the right 
thing to do, even though it won't be the final answer.
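
To make the "walk the pcb lists and trim references" idea concrete, here 
is a toy userland model of it.  None of this is kernel code: the toy_* 
structures and toy_purge_ifp() are invented for the example, the real 
pcb and multicast option structures look different, and a real 
implementation would also have to deal with locking, which this sketch 
ignores.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MEMBERSHIPS 4

/* Stand-in for an interface; only its identity (address) matters here. */
struct toy_ifnet {
	const char *name;
};

/* One group membership held by a socket, with its cached interface. */
struct toy_membership {
	unsigned int group;		/* IPv4 group, host byte order */
	struct toy_ifnet *ifp;		/* cached interface pointer */
};

/* Stand-in for a protocol control block on a singly linked list. */
struct toy_pcb {
	struct toy_membership memb[TOY_MAX_MEMBERSHIPS];
	size_t nmemb;
	struct toy_pcb *next;
};

/*
 * Walk every pcb on the list and drop each membership that refers to
 * 'ifp', compacting the per-pcb array in place.  Returns the number of
 * memberships removed.  Meant to be called before 'ifp' is freed.
 */
size_t
toy_purge_ifp(struct toy_pcb *head, const struct toy_ifnet *ifp)
{
	size_t removed = 0;

	assert(ifp != NULL);
	for (struct toy_pcb *p = head; p != NULL; p = p->next) {
		size_t keep = 0;

		for (size_t i = 0; i < p->nmemb; i++) {
			if (p->memb[i].ifp == ifp) {
				removed++;
				continue;
			}
			p->memb[keep++] = p->memb[i];
		}
		p->nmemb = keep;
	}
	return (removed);
}
```

After the purge no pcb holds a pointer to the departing interface, so a 
later close() on any of those sockets has nothing stale to dereference. 
The cost is that the application silently loses its membership, which is 
part of why this is only a work-around.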

I should have an opportunity to look into a possible solution for this in 
the next few days.

Robert N M Watson