On Wed, Mar 10, 2010 at 11:11:13AM -0800, David Christensen wrote:
> > > What's the traffic look like? Jumbo, standard, short frames?
> > > Any good ideas on profiling the code? I haven't figured out
> > > how to use the CPU TSC, but there is a free-running timer on
> > > the device that might be usable to calculate where the
> > > driver's time is spent.
> >
> > It looks like the traffic that provoked it was this:
> >
> > 10:18:42.319370 IP X.4569 > X.4569: UDP, length 12
> > 10:18:42.319402 IP X.4569 > X.4569: UDP, length 12
> > 10:18:42.319438 IP X.4569 > X.4569: UDP, length 12
> > 10:18:42.319484 IP X.4569 > X.4569: UDP, length 12
> > 10:18:42.319517 IP X.4569 > X.4569: UDP, length 12
> >
> > A flurry of UDP tinygrams on an IAX2 trunk. The packet rate
> > isn't spectacular at about 30kpps, which on top of the base
> > load of 60kpps still isn't a fantastic packet rate. The
> > interesting thing is that while this storm was in progress,
> > it almost entirely excluded other traffic on the network.
>
> Ok, small-packet performance is involved; this narrows down
> the range of problems. The current design of bce_rx_intr()
> attempts to process all RX frames in the receive ring. Only
> after all available frames have been processed does the
> function attempt to refill the ring with new buffers. It's
> likely that there's a long gap between the time the last
> receive buffer is consumed and the time the RX ring is
> refilled and the buffers are posted to the hardware, causing
> a burst of dropped frames and the com_no_buffers firmware
> counter to increment.

I successfully reproduced the issue with netperf on a BCM5709.
You can use a UDP frame size of 1 to trigger it.

> Changing the high-level design of bce_rx_intr() and
> bce_rx_fill_chain() slightly to post a new buffer as each
> frame is passed to the OS would likely avoid these gaps
> during bursts of small frames, but I'm not sure whether
> that would have a negative impact on the more common case
> of streams of MTU-sized frames. I've considered this in the
> past but never coded the change and tested the resulting
> performance.

I guess this may slightly increase bus_dma(9) overhead, but I
think one reason for dropped frames under a heavy load of small
UDP frames may be a lack of free RX descriptors. Because bce(4)
uses just a single RX ring, the number of available RX buffers
is 512, and it doesn't seem possible to increase the number of
buffers per ring. So the next possible approach would be to
switch to multiple RX rings with RSS. Even though FreeBSD does
not dynamically adjust load among CPUs, I guess using RSS would
be the way to go.

> Does anyone have some experience with one method over
> the other?
>
> Dave
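To make the refill gap concrete, here is a toy userspace model of
the two schemes being discussed: a 512-entry ring facing a burst of
2048 frames, refilled either in one batch after the burst or one
buffer per frame. This is only a sketch; the function names are
invented, none of this is code from if_bce.c, and the assumption
that the entire burst arrives before the batch refill runs
exaggerates the real timing window, though it shows why
com_no_buffers increments in bursts.

#include <stdio.h>

#define RX_RING_SIZE	512	/* bce(4) posts 512 buffers on its single ring */

/*
 * Toy model of the current scheme: consume every posted buffer first,
 * refill the whole ring afterwards.  A frame that arrives while no
 * buffer is posted is dropped (what com_no_buffers records).
 */
static int
batch_refill(int burst, int *posted)
{
	int dropped, i;

	dropped = 0;
	for (i = 0; i < burst; i++) {
		if (*posted == 0) {
			dropped++;	/* ring is empty until the refill */
			continue;
		}
		(*posted)--;		/* frame consumed a buffer */
	}
	*posted = RX_RING_SIZE;		/* refill happens only here */
	return (dropped);
}

/*
 * Toy model of the proposed scheme: post a replacement buffer as each
 * frame is handed to the stack, so the ring never runs dry.
 */
static int
per_frame_refill(int burst, int *posted)
{
	int dropped, i;

	dropped = 0;
	for (i = 0; i < burst; i++) {
		if (*posted == 0) {
			dropped++;
			continue;
		}
		(*posted)--;		/* frame consumed a buffer */
		(*posted)++;		/* replacement posted immediately */
	}
	return (dropped);
}

int
main(void)
{
	int posted;

	posted = RX_RING_SIZE;
	printf("batch refill:     %d of 2048 frames dropped\n",
	    batch_refill(2048, &posted));
	posted = RX_RING_SIZE;
	printf("per-frame refill: %d of 2048 frames dropped\n",
	    per_frame_refill(2048, &posted));
	return (0);
}

In this model the batch variant drops the 1536 frames beyond the 512
posted buffers while the per-frame variant drops none; the real driver
would refill sooner than this, but the direction of the effect is the
point.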
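On the RSS suggestion: the point is that a hash over each flow's
address and port tuple selects one of several RX rings, each with its
own 512-buffer pool and its own CPU, multiplying the buffering
available under load. The sketch below only illustrates that ring
selection; it uses FNV-1a as a stand-in for the Toeplitz hash that
RSS-capable hardware actually computes, and the flows and ring count
are made up.

#include <stdint.h>
#include <stdio.h>

#define NRINGS	4		/* hypothetical number of RX rings */

/* FNV-1a, standing in here for the Toeplitz hash used by real RSS. */
static uint32_t
hash32(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t h = 2166136261u;

	while (len-- > 0) {
		h ^= *p++;
		h *= 16777619u;
	}
	return (h);
}

struct flow {			/* 12 bytes, no padding on common ABIs */
	uint32_t saddr, daddr;	/* IPv4 source/destination address */
	uint16_t sport, dport;	/* UDP source/destination port */
};

int
main(void)
{
	/* Made-up flows; the IAX2 trunk above is a single 4569/4569 flow. */
	struct flow flows[] = {
		{ 0x0a000001, 0x0a000002, 4569, 4569 },
		{ 0x0a000003, 0x0a000002, 4569, 4569 },
		{ 0x0a000004, 0x0a000002, 5060, 5060 },
		{ 0x0a000005, 0x0a000002, 33000, 53 },
	};
	size_t i;

	for (i = 0; i < sizeof(flows) / sizeof(flows[0]); i++) {
		uint32_t h = hash32(&flows[i], sizeof(flows[i]));

		/* Every packet of a flow lands on the same ring and CPU. */
		printf("flow %zu -> ring %u\n", i, (unsigned)(h % NRINGS));
	}
	return (0);
}

One caveat follows directly from the hashing: a storm on a single
flow, like the 4569 -> 4569 tinygrams in the trace above, still
hashes to one ring, so RSS adds headroom across flows rather than
within a single flow.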