Re: dev.bce.X.com_no_buffers increasing and packet loss

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Fri, 5 Mar 2010 10:40:46 -0800

On Fri, Mar 05, 2010 at 08:16:31PM +0200, Ian FREISLICH wrote:
> Pyun YongHyeon wrote:
> > On Fri, Mar 05, 2010 at 01:20:57PM +0200, Ian FREISLICH wrote:
> > > Hi
> > > 
> > > I have a system that is experiencing mild to severe packet loss.
> > > The interfaces are configured as follows:
> > > 
> > > lagg0: bce0, bce1, bce2, bce3  lagproto lacp
> > > 
> > > lagg0 then is used as the hwdev for the vlan interfaces.
> > > 
> > > I have pf with a few queues for bandwidth management.
> > > 
> > > There isn't that much traffic on it (200-500Mbit/s).
> > > 
> > > I see only the following suspect for packet loss:
> > > 
> > > dev.bce.0.com_no_buffers: 140151466
> > > dev.bce.1.com_no_buffers: 514723247
> > > dev.bce.2.com_no_buffers: 10454050
> > > dev.bce.3.com_no_buffers: 369371
> > > 
> > > Most of the time, these numbers are static, but every once in a
> > > while they increase massively by several thousand, but only on 2
> > > interfaces.  The 1 minute average rate on those interfaces is 266/s
> > > and 123/s.
> > > 
> > > Does anyone think this is related to the packet loss or are these
> > > counters just a red herring?  Is there anything that can be done
> > > to reduce this count?
> > > 
> > 
> > I think this sysctl node indicates the number of frames dropped by
> > the completion processor of the NetXtreme II. The counter is
> > incremented when the processor received a frame successfully but
> > couldn't pass it to the system because no RX buffers were
> > available, so the completion processor dropped the received frame.
> > If you see an mbuf shortage in netstat, the drops would be expected.
> > But if the system has a lot of free mbuf resources, it may indicate
> > another issue. bce(4) may not be able to replenish the controller
> > with RX buffers if the system is suffering from high load.
> 
> I don't think I've ever seen an mbuf shortage on this host, and
> load isn't that high, typically 12% CPU or 88% idle.  That's just
> 2 (of 16) cores busy.  There's tons of free memory (~12G) if I
> need to increase the number of buffers available, but I'm not sure
> which tunable to use to do that.  The routing table also isn't large
> at about 4000 prefixes.
> 
> [firewall1.jnb1] ~ # netstat -m
> 4118/7147/11265 mbufs in use (current/cache/total)
> 3092/6850/9942/131072 mbuf clusters in use (current/cache/total/max)
> 2060/4212 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/678/678/65536 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/32768 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/16384 16k jumbo clusters in use (current/cache/total/max)
> 7214K/18198K/25412K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
> 
> I currently set the following in loader.conf:
> 
> net.isr.maxthreads="8"
> net.isr.direct=0
> if_igb_load="yes"
> kern.ipc.nmbclusters="131072"
> kern.maxusers="1024"
> 
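
As an aside on the netstat -m output quoted above: the lines that matter
for an mbuf shortage are the "requests for ... denied" ones. A minimal
sketch of pulling those counters out of the text, assuming Python is
available on the box. The function name denied_requests is hypothetical,
and the parsing only assumes the line layout shown in the quoted output.

```python
import re

# Matches e.g. "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"
# and "0 requests for sfbufs denied" as they appear in the quoted output.
_DENIED = re.compile(r"^\s*([\d/]+) requests for (.+?) denied(?: \(([^)]*)\))?\s*$")


def denied_requests(netstat_m_output):
    """Return {label: count} for every 'denied' line in `netstat -m` text."""
    denied = {}
    for line in netstat_m_output.splitlines():
        match = _DENIED.match(line)
        if match is None:
            continue  # not a "denied" line (e.g. "delayed", "in use")
        values = [int(v) for v in match.group(1).split("/")]
        # Use the parenthesised labels when present, else the resource name.
        labels = match.group(3).split("/") if match.group(3) else [match.group(2)]
        denied.update(zip(labels, values))
    return denied


if __name__ == "__main__":
    sample = (
        "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)\n"
        "0/0/0 requests for jumbo clusters denied (4k/9k/16k)\n"
        "0 requests for sfbufs denied\n"
    )
    counts = denied_requests(sample)
    print("mbuf shortage seen" if any(counts.values()) else "no denied requests")
```

All-zero counters, as in the quoted output, point away from a plain mbuf
shortage.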

Would you show me the output of dmesg (bce(4)/brgphy(4) lines only) and
the output of "pciconf -lcbv" for the controller?
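
To see which interfaces are actually dropping and how fast, it may help
to compare two snapshots of the com_no_buffers counters. A rough sketch
in Python, assuming the snapshots are saved text in the
"dev.bce.N.com_no_buffers: VALUE" form shown earlier in the thread (for
example from "sysctl dev.bce"). The function names parse_counters and
drop_rates are hypothetical, and counter wrap-around is not handled.

```python
def parse_counters(sysctl_output):
    """Extract {sysctl_name: value} for the com_no_buffers nodes only."""
    counters = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name.endswith("com_no_buffers"):
            counters[name] = int(value)
    return counters


def drop_rates(before, after, interval_s):
    """Per-second increase of each counter present in both snapshots."""
    return {
        name: (after[name] - before[name]) / interval_s
        for name in after
        if name in before
    }


if __name__ == "__main__":
    # First snapshot uses values from the thread; the second is made up
    # purely to illustrate the arithmetic.
    first = parse_counters(
        "dev.bce.0.com_no_buffers: 140151466\n"
        "dev.bce.3.com_no_buffers: 369371\n"
    )
    second = parse_counters(
        "dev.bce.0.com_no_buffers: 140167426\n"
        "dev.bce.3.com_no_buffers: 369371\n"
    )
    for name, rate in sorted(drop_rates(first, second, 60).items()):
        print(f"{name}: {rate:.1f} drops/s")
```

Taking the snapshots a minute apart gives a figure comparable to the
1 minute averages mentioned in the original report.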