Re: FreeBSD 8.0 - network stack crashes?

From: Eirik Øverby <ltning_at_anduin.net>
Date: Mon, 30 Nov 2009 09:26:00 +0100
On 30. nov. 2009, at 08.47, Adrian Chadd wrote:

> That URL works for me. So how much traffic is this box handling during
> peak times?

Depends on how you define load. It's a storage box (14TB ZFS) with a small handful of NFS clients pushing backup data to it, so lots of traffic in bytes/sec, but not many clients.


> I've seen this on the proxy boxes that I've setup. There's a lot of
> data being tied up in socket buffers as well as being routed between
> interfaces (ie, stuff that isn't being intercepted.)  Take a look at
> "netstat -an" when things are locked up; see if there's any sockets
> which have full send/receive queues.

If you're referring to the Send-Q and Recv-Q values, they are zero everywhere, as far as I can tell.
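(For anyone following along, this is roughly how I checked -- an untested sketch; the field positions assume FreeBSD's netstat layout, where Recv-Q and Send-Q are columns 2 and 3 after two header lines:)

```shell
# Keep only socket lines whose Recv-Q (field 2) or Send-Q (field 3)
# is non-zero; the first two lines of netstat output are headers.
filter_full_queues() {
    awk 'NR > 2 && ($2 + 0 > 0 || $3 + 0 > 0)'
}

# Usage while the box is wedged:
#   netstat -an -p tcp | filter_full_queues
```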


> I'm going to take a complete stab in the dark here and say this sounds
> a little like a livelock. Ie, something is queuing data and allocating
> mbufs for TX (and something else is generating mbufs - I dunno, packet
> headers?) far faster than the NIC is able to TX them out, and there's
> not enough backpressure on whatever (say, the stuff filling socket
> buffers) to stop the mbuf exhaustion. Again, I've seen this kind of
> crap on proxy boxes.

Not sure if this applies in our case. See the (very) end of this mail for some debug/stats output from em1 (the interface currently in use; I disabled lagg/lacp to ease debugging).
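(If it helps anyone reproduce this: to see whether mbuf allocations really climb faster than the NIC drains them, something like the following sampling loop over netstat -m should do -- a rough sketch; the grep pattern assumes the "mbufs in use" line of FreeBSD's netstat -m output.)

```shell
# Take N timestamped samples of the "mbufs in use" line from netstat -m,
# so growth in the allocation count is easy to eyeball or graph later.
sample_mbufs() {
    i=0
    while [ "$i" -lt "${1:-6}" ]; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" \
            "$(netstat -m | grep 'mbufs in use')"
        i=$((i + 1))
        sleep "${2:-10}"
    done
}

# Usage: sample_mbufs 360 10    # one hour at 10-second intervals
```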


> See if you have full socket buffers showing up in netstat -an. Have
> you tweaked the socket/TCP send/receive sizes? I typically lock mine
> down to something small (32k-64k for the most part) so I don't hit
> mbuf exhaustion on very busy proxies.

I haven't touched any defaults except the mbuf clusters. What does your sysctl.conf look like?
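(For the archive: I take it he means something like the following in /etc/sysctl.conf -- the values and the exact set of knobs here are my guess at what "locking down to 32k-64k" looks like, not his actual config.)

```
# Illustrative guesses only. Stock FreeBSD knobs for capping the
# default per-socket TCP buffer sizes in the 32k-64k range:
net.inet.tcp.sendspace=32768
net.inet.tcp.recvspace=65536
# Buffer auto-tuning (on by default since 7.x) can grow buffers well
# past these defaults, so presumably it gets disabled as well:
net.inet.tcp.sendbuf_auto=0
net.inet.tcp.recvbuf_auto=0
```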


Thanks,
/Eirik


> 2c,
> 
> 
> 
> Adrian
> 
> 2009/11/30 Eirik Øverby <ltning_at_anduin.net>:
>> On 29. nov. 2009, at 15.29, Robert Watson wrote:
>> 
>>> On Sun, 29 Nov 2009, Eirik Øverby wrote:
>>> 
>>>> I just did that (-rxcsum -txcsum -tso), but the numbers still keep rising. I'll wait and see if it goes down again, then reboot with those values to see how it behaves. But right away it doesn't look too good ..
>>> 
>>> It would be interesting to know if any of the counters in the output of netstat -s grow linearly with the allocation count in netstat -m.  Often times leaks are associated with edge cases in the stack (typically because if they are in common cases the bug is detected really quickly!) -- usually error handling, where in some error case the unwinding fails to free an mbuf that it should free.  These are notoriously hard to track down, unfortunately, but the stats output (especially where delta alloc is linear to delta stat) may inform the situation some more.
>> 
>> From what I can tell, all that goes up with mbuf usage is traffic/packet counts. I can't say I see anything fishy in there.
>> 
>> From the last few samples in
>> http://anduin.net/~ltning/netstat.log
>> you can see the host stops receiving any packets, but does a few retransmits before the session where this script ran timed out.
>> 
>> /Eirik
>> 
>> _______________________________________________
>> freebsd-current_at_freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>> 
> 

em1: link state changed to UP
em1: Adapter hardware address = 0xffffff80003be530 
em1: CTRL = 0x140248 RCTL = 0x8002 
em1: Packet buffer = Tx=20k Rx=12k 
em1: Flow control watermarks high = 10240 low = 8740
em1: tx_int_delay = 66, tx_abs_int_delay = 66
em1: rx_int_delay = 32, rx_abs_int_delay = 66
em1: fifo workaround = 0, fifo_reset_count = 0
em1: hw tdh = 25, hw tdt = 25
em1: hw rdh = 222, hw rdt = 221
em1: Num Tx descriptors avail = 256
em1: Tx Descriptors not avail1 = 0
em1: Tx Descriptors not avail2 = 0
em1: Std mbuf failed = 0
em1: Std mbuf cluster failed = 0
em1: Driver dropped packets = 0
em1: Driver tx dma failure in encap = 0
em1: Excessive collisions = 0
em1: Sequence errors = 0
em1: Defer count = 0
em1: Missed Packets = 0
em1: Receive No Buffers = 0
em1: Receive Length Errors = 0
em1: Receive errors = 0
em1: Crc errors = 0
em1: Alignment errors = 0
em1: Collision/Carrier extension errors = 0
em1: RX overruns = 0
em1: watchdog timeouts = 0
em1: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
em1: XON Rcvd = 0
em1: XON Xmtd = 0
em1: XOFF Rcvd = 0
em1: XOFF Xmtd = 0
em1: Good Packets Rcvd = 5704113
em1: Good Packets Xmtd = 3617612
em1: TSO Contexts Xmtd = 0
em1: TSO Contexts Failed = 0
Received on Mon Nov 30 2009 - 07:26:04 UTC
