Re: dev.bce.X.com_no_buffers increasing and packet loss

From: Ian FREISLICH <ianf_at_clue.co.za>
Date: Fri, 05 Mar 2010 20:16:31 +0200
Pyun YongHyeon wrote:
> On Fri, Mar 05, 2010 at 01:20:57PM +0200, Ian FREISLICH wrote:
> > Hi
> > 
> > I have a system that is experiencing mild to severe packet loss.
> > The interfaces are configured as follows:
> > 
> > lagg0: bce0, bce1, bce2, bce3  laggproto lacp
> > 
> > lagg0 is then used as the hwdev for the vlan interfaces.
> > 
> > I have pf with a few queues for bandwidth management.
> > 
> > There isn't that much traffic on it (200-500Mbit/s).
> > 
> > The only suspect I can see for the packet loss is the following:
> > 
> > dev.bce.0.com_no_buffers: 140151466
> > dev.bce.1.com_no_buffers: 514723247
> > dev.bce.2.com_no_buffers: 10454050
> > dev.bce.3.com_no_buffers: 369371
> > 
> > Most of the time these numbers are static, but every once in a
> > while they jump by several thousand, and only on 2 of the
> > interfaces.  The 1-minute average rate on those interfaces is
> > 266/s and 123/s.
> > 
> > Does anyone think this is related to the packet loss, or are
> > these counters just a red herring?  Is there anything that can
> > be done to reduce this count?
> > 
> 
> I think this sysctl node indicates the number of frames dropped by
> the completion processor of the NetXtreme II. The counter is
> incremented when the processor receives a frame successfully but
> cannot pass it to the system because no RX buffers are available,
> so the completion processor drops the received frame.
> If you see an mbuf shortage from netstat, that would be normal. But
> if the system has plenty of free mbuf resources it may indicate
> another issue: bce(4) may not be able to replenish the controller
> with RX buffers if the system is suffering from high load.

I don't think I've ever seen an mbuf shortage on this host, and the
load isn't that high: typically 12% CPU (88% idle), which is only
about 2 of the 16 cores busy.  There's plenty of free memory (~12G)
if I need to increase the number of buffers available, but I'm not
sure which tunable to use to do that.  The routing table also isn't
large, at about 4000 prefixes.
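
In case it's useful, this is roughly the loop I'd use to watch the
per-interface drop rate next to the mbuf stats (untested sketch, it
just samples the counters once a minute and prints the delta):

#!/bin/sh
# Untested sketch: sample dev.bce.N.com_no_buffers once a minute and
# print how much each counter grew.  The first pass prints the raw
# counter values since there is no previous sample yet.
while true; do
    for i in 0 1 2 3; do
        eval "prev=\${last$i:-0}"
        cur=$(sysctl -n dev.bce.$i.com_no_buffers)
        echo "$(date '+%H:%M:%S') bce$i +$((cur - prev))/min"
        eval "last$i=$cur"
    done
    sleep 60
done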

[firewall1.jnb1] ~ # netstat -m
4118/7147/11265 mbufs in use (current/cache/total)
3092/6850/9942/131072 mbuf clusters in use (current/cache/total/max)
2060/4212 mbuf+clusters out of packet secondary zone in use (current/cache)
0/678/678/65536 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/32768 9k jumbo clusters in use (current/cache/total/max)
0/0/0/16384 16k jumbo clusters in use (current/cache/total/max)
7214K/18198K/25412K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

I currently set the following in loader.conf:

net.isr.maxthreads="8"
net.isr.direct=0
if_igb_load="yes"
kern.ipc.nmbclusters="131072"
kern.maxusers="1024"
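
My best guess at the knob, which I haven't verified against the
driver on this kernel, is the hw.bce.rx_pages loader tunable; if it
exists here, something like this should give bce(4) a deeper RX
chain to post buffers into:

# Assumption on my part: bce(4) honours hw.bce.rx_pages as a loader
# tunable on this release (check the bce(4) man page first).  A
# larger RX chain means more posted RX buffers before the completion
# processor has to start dropping frames.
hw.bce.rx_pages="8"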

Ian

--
Ian Freislich