Re: [BETA7-panic] sodealloc(): so_count 1

From: Robert Watson <rwatson_at_freebsd.org>
Date: Mon, 18 Oct 2004 18:24:06 -0400 (EDT)
On Sun, 17 Oct 2004, Vlad wrote:

> is there a specific condition when that happens? I tried to simulate
> heavy TCP traffic from a number of sources but could not induce the panic
> with such artificial traffic. It happened to me only in a 'natural' way ;)
> 
> so maybe if you know exactly how to trigger it, and share that with us,
> we could do some workaround on live production servers so it doesn't
> happen, until it's fixed in the code? 

I've merged a likely fix to the problem to HEAD as of a minute or two ago,
which broadens the scope of the accept mutex to reduce the opportunity for
races (it both expands the coverage to some additional reference
operations, and also avoids dropping a lock to reorder).  With this change
in place, I'm no longer able to easily reproduce the problem -- I've had a
couple of SMP boxes running for an hour or two trying without success.
Previously, with just the right traffic, I had the reproduction time down to
a second or two.  I'll merge the fix to RELENG_5 shortly for merge to
RELENG_5_3 before 5.3 goes out the door.  Obviously, any help in getting
testing exposure for this change, as it comes very late in the release
cycle, would be most welcome.  A copy of the patch can be found at:

    http://www.watson.org/~robert/freebsd/netperf/20041018-sofree-race-fix.diff

A complete description can be found in the commit message.  Thanks to
everyone who has helped diagnose and fix this!  Hopefully we've got the
right fix now, although obviously as the next few days of testing play
out, we'll see.
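
For anyone who wants a picture of the general idea without reading the diff,
here is a minimal userland sketch. It is not the patch and not kernel code:
struct sketch_sock, sketch_ref_from_queue(), sketch_release() and the single
pthread mutex named accept_mtx are all hypothetical stand-ins chosen for
illustration. The point it tries to show is only that the reference-count
change and the "still on the queue?" check happen inside one critical
section, so another thread cannot take a new reference in the gap between
the count reaching zero and the teardown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket; NOT the real struct socket. */
struct sketch_sock {
    int  refs;      /* reference count, protected by accept_mtx */
    bool on_queue;  /* still reachable from a listen queue, same lock */
    bool freed;     /* set instead of calling free(), so callers can inspect it */
};

/* Assumption: one mutex covers both the queue state and the count. */
static pthread_mutex_t accept_mtx = PTHREAD_MUTEX_INITIALIZER;

/*
 * Take a reference via the queue. Returns false if the object has
 * already been unlinked, in which case no reference is taken.
 */
static bool sketch_ref_from_queue(struct sketch_sock *so)
{
    bool ok = false;

    pthread_mutex_lock(&accept_mtx);
    if (so->on_queue && !so->freed) {
        so->refs++;
        ok = true;
    }
    pthread_mutex_unlock(&accept_mtx);
    return ok;
}

/*
 * Drop a reference. If it was the last one, unlink and mark the object
 * freed before the lock is released, so there is no window in which
 * sketch_ref_from_queue() can see a zero-count object still on the queue.
 * Returns true if this call performed the teardown.
 */
static bool sketch_release(struct sketch_sock *so)
{
    bool last;

    pthread_mutex_lock(&accept_mtx);
    assert(so->refs > 0);
    so->refs--;
    last = (so->refs == 0);
    if (last) {
        so->on_queue = false;
        so->freed = true;
    }
    pthread_mutex_unlock(&accept_mtx);
    return last;
}
```

If sketch_release() instead dropped the mutex between the decrement and the
unlink, a concurrent sketch_ref_from_queue() could bump the count on an
object that is about to be torn down. That is the general shape of race a
wider lock scope is meant to close; the commit message remains the
authoritative description of what the real change does.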

Thanks,

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research


> 
> 
> > The good news and the bad news: after spending a day or two hacking up an
> > IP stack simulator to simulate various nasty combinations of TCP packets,
> > I've managed to reproduce the problem, and am able to get a core.  I'm
> > currently working on tracking down the problem.
> 
> -- 
> Vlad
> 
Received on Mon Oct 18 2004 - 20:24:14 UTC
