Re: Survey results very helpful, thanks!

From: Karl Denninger <karl_at_denninger.net>
Date: Mon, 08 Mar 2010 14:28:39 -0600
Doug Hardie wrote:
> On 8 March 2010, at 06:53, Robert Watson wrote:
>
>   
>> On Sun, 7 Mar 2010, Robert Watson wrote:
>>
>>     
>>> If your system shows a non-zero value, please send me a *private e-mail* with the output of that command, plus also the output of "sysctl kern.smp", "uptime", and a brief description of the workload and network interface configuration.  For example: it's a busy 8-core web server with roughly X connections/second, and that has three em network interfaces used to load balance from an upstream source.  IPSEC is used for management purposes (but not bulk traffic), and there's a local MySQL database.
>>>       
>> I've now received a number of reports that confirm our suspicion that the race does occur, albeit very rarely, and particularly on systems with many cores or multiple network interfaces.  Fixing it is definitely on the TODO for 9.0, both to improve our ability to do multiple virtual network stacks, but with an appropriately scalable fix in mind given our improved TCP scalability for 9.0 as well.
>>     
>
> I run a number of 4 core systems with em interfaces.  These are production systems that are unmanned and located a long way from me.  Under unusual conditions it can take up to 6 hours to get there.  I have been waiting to switch to 8.0 because of the discussions on the em device and now it sounds like I had better just skip 8.x and wait for 9.  7.2 is working just fine.
>   
I don't think it's that simple.

I run a number of production systems with "em" interfaces, and they get
POUNDED.

None have had any trouble with 8.x.

$ ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 00:30:48:d2:5a:24
        inet 67.23.181.70 netmask 0xffffff00 broadcast 67.23.181.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active

$ uptime
 3:27PM  up 61 days, 22:34, 1 user, load averages: 5.08, 4.48, 4.28

That's one of the busier ones; it's kinda loafing right now on network
I/O (running about 3 Mbps sustained) but typically operates in the
15-20 Mbps range to the wild wild net for 6-8 hours in the evening doing
what it's doing now (handling a very busy forum) plus a few hundred
videocast streams....

The last reboot was to replace a power strip in the colo rack with one
that had remote management capability.  It hasn't actually crashed
since, well, pretty much forever (it was running 7.x before 8.x went to
production status).

The box is a dual quad-core Xeon running the amd64 codebase.

-- Karl
Received on Mon Mar 08 2010 - 20:06:57 UTC
