Re: Survey results very helpful, thanks! (was: Re: net.inet.tcp.timer_race: does anyone have a non-zero value?)

From: Doug Hardie <bc979_at_lafn.org>
Date: Tue, 9 Mar 2010 00:53:34 -0800
On 8 March 2010, at 12:33, Robert Watson wrote:

> 
> On Mon, 8 Mar 2010, Doug Hardie wrote:
> 
>> I run a number of 4 core systems with em interfaces.  These are production systems that are unmanned and located a long way from me.  Under unusual conditions it can take up to 6 hours to get there.  I have been waiting to switch to 8.0 because of the discussions on the em device and now it sounds like I had better just skip 8.x and wait for 9.  7.2 is working just fine.
> 
> Not sure that any information in this survey thread should be relevant to that decision.  This race has existed since before FreeBSD, having appeared in the original BSD network stack, and is just as present in FreeBSD 7.x as 8.x or 9.x.  When I learned about the race during the early 7.x development cycle, I added a counter/statistic to measure how much it happened in practice, but was not able to exercise it in my testing, and so left the counter in to appear in 7.0 and later so that we could perform this survey as core counts/etc increase.
> 
> The two likely outcomes were "it is never exercised" and "it is exercised but only very infrequently", neither really justifying the quite complex change to correct it given requirements at the time.  On-going development work on the virtual network stack is what justifies correcting the bug at this point, moving from detecting and handling the race to preventing it from occurring as an invariant.  The motivation here, BTW, is that we'd like to eliminate the type-stable storage requirement for connection state (which ensures that memory once used for a connection block is only ever used for connection blocks in the future), allowing memory to be fully freed when a virtual network stack is destroyed.  Using type-stable storage helped address this bug, but was primarily present to reduce the overhead of monitoring using netstat(1).  We'll now need to use a slightly more expensive solution (true reference counts) in that context, although in practice it will almost certainly be an unmeasurable cost.
> 
> Which is to say that while there might be something in the em/altq/... thread to reasonably lead you to avoid 8.0, nothing in the TCP timer race thread should do so, since it affects 7.2 just as much as 8.0.  Even if you do see a non-zero counter, that's not a matter for operational concern, just useful from the perspective of a network stack developer for understanding timing and behaviors in the stack.  :-)


Thanks for the complete explanation.  I don't believe the ALTQ issue will affect me.  I am not currently using it and do not expect to in the near future.  In addition, there was a posting that a fix for at least part of that will be added in a week or so.  Given all that, it appears it's time to start the planning/testing process for 8.
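
For anyone reading this thread later, the "true reference counts" idea mentioned above can be sketched roughly as follows.  This is only an illustration under stated assumptions, not the FreeBSD code: struct conn, conn_alloc, conn_hold, conn_release and timer_fire are hypothetical names, and the sketch is single-threaded, whereas a kernel would need atomic operations or locking around the count.  The point is that a pending timer holds its own reference, so the connection block cannot be freed underneath it and the memory does not need to be type-stable.

```c
#include <stdlib.h>

/* Hypothetical connection block; NOT the real FreeBSD inpcb/tcpcb. */
struct conn {
    int refcount;  /* one per holder: the stack, a pending timer, a monitor */
    int dropped;   /* set once the connection has been torn down */
};

static int conn_frees = 0;  /* counts real frees, for demonstration only */

/* Allocate a block holding one reference, owned by the caller. */
static struct conn *conn_alloc(void)
{
    struct conn *c = calloc(1, sizeof(*c));
    if (c != NULL)
        c->refcount = 1;
    return c;
}

/* Take an extra reference, e.g. when a timer is scheduled. */
static void conn_hold(struct conn *c)
{
    c->refcount++;
}

/* Drop a reference; free on the last one.  Returns 1 if freed. */
static int conn_release(struct conn *c)
{
    if (--c->refcount > 0)
        return 0;
    free(c);
    conn_frees++;
    return 1;
}

/*
 * Timer callback.  The timer's own reference guarantees c is still
 * valid memory here, even if the connection was dropped while the
 * timer was pending.  Returns 1 if it did work, 0 if it skipped.
 */
static int timer_fire(struct conn *c)
{
    int did_work = !c->dropped;
    conn_release(c);  /* give up the timer's reference */
    return did_work;
}
```

In this sketch, if the connection is dropped while a timer is still pending, the owner's conn_release leaves the block alive, and the memory is only returned when timer_fire gives up the last reference.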
Received on Tue Mar 09 2010 - 07:53:36 UTC
