3 show-stopper issues with 9-BETA3

From: Ian FREISLICH <ianf_at_clue.co.za>
Date: Wed, 05 Oct 2011 16:30:23 +0200
Hi

In no particular order:

1. bce(4) transmit and receive ring buffer overruns
	On a moderately busy router with a full BGP table and
	aggregate throughput of between 200Mbps and 800Mbps, I get
	these buffer overruns at an average rate of 28 per second
	on the busiest interface.

	[firewall1.jnb1] ~ # sysctl dev.bce |grep com_no_buffers
	dev.bce.0.com_no_buffers: 101
	dev.bce.1.com_no_buffers: 0
	dev.bce.2.com_no_buffers: 32547
	dev.bce.3.com_no_buffers: 444
	
	I've tried increasing TX_PAGES and RX_PAGES in
	sys/dev/bce/if_bcereg.h to 64, which is what resolved this
	problem on 8.2-STABLE, but to no avail.  It appears that
	bce_set_tunables() in if_bce.c enforces a hard limit of 8,
	and no value of hw.bce.tx_pages or hw.bce.rx_pages makes
	the slightest difference.

2. carp(4) on my backup router randomly claims MASTER.  When
	ifconfig reports the carp interface as master, tcpdump shows
	that it is not broadcasting advertisements.  The actual master
	still broadcasts, and no setting of advskew or advbase changes
	the 9-BETA host's idea of who is actually master.  I have to
	reboot the host to reset the carp interfaces.  Destroying and
	re-creating them just brings them up as backup for about a
	second, and then they revert to master.

3. pf(4) doesn't expire states.  The state table on my older host
	(pre-OpenBSD-4.5 pf) has the following stats:

	Status: Enabled for 0 days 00:37:17           Debug: Urgent
	State Table                          Total             Rate
	  current entries                   169546               
	  searches                        94387451        42193.8/s
	  inserts                          4012389         1793.6/s
	  removals                         3842843         1717.9/s

	The 9-BETA3 host's current entries exactly match the number
	of inserts until it hits the hard limit of 1.5M entries and
	can add no more.  It takes about 10 minutes to fill up and
	then no new flows are routed.
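
As a rough sanity check of the figures in item 3, the sketch below
reproduces the "current entries" value from the insert and removal
counters and estimates how long a table with no removals would take to
reach the limit.  The counters are copied from the pfctl output above;
applying the older host's insert rate to the 9-BETA3 host is an
assumption made purely for illustration.

```python
# Figures copied from the pfctl output of the older host quoted above.
inserts = 4012389
removals = 3842843
insert_rate = 1793.6           # inserts per second

# With expiry working, live states = inserts - removals.
current_entries = inserts - removals
print(current_entries)         # 169546, matching "current entries"

# ASSUMPTION: the 9-BETA3 host sees a similar insert rate and removes nothing.
state_limit = 1_500_000        # hard limit quoted in item 3
seconds_to_fill = state_limit / insert_rate
minutes_to_fill = seconds_to_fill / 60
print(round(minutes_to_fill, 1))
```

At that rate the table would fill in roughly 14 minutes, the same
order of magnitude as the ten minutes or so observed.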

We're in a quiet period at the moment, so I can keep a 9-X host
around for a few days.  I'll be able to try things until I have to
downgrade the other host at the end of the week.  Incompatibility
between pf on 8.2-STABLE and 9-X after 2011-06-28 makes testing a
little difficult though because I'm not able to synchronise state.
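
For anyone trying to reproduce the overrun rate from item 1, a minimal
sketch of turning two samples of the sysctl output into a per-second
rate follows.  The function names are hypothetical, the first sample is
the output quoted above, and the second sample is invented for
illustration (it is not a real measurement).

```python
import re

def parse_no_buffers(sysctl_text):
    """Map bce unit number -> com_no_buffers counter value."""
    pattern = re.compile(r"dev\.bce\.(\d+)\.com_no_buffers:\s*(\d+)")
    return {int(unit): int(value) for unit, value in pattern.findall(sysctl_text)}

def overrun_rates(before, after, interval_s):
    """Per-second counter increase between two samples taken interval_s apart."""
    first = parse_no_buffers(before)
    second = parse_no_buffers(after)
    return {unit: (second[unit] - first[unit]) / interval_s
            for unit in first if unit in second}

# First sample: the sysctl output quoted in item 1.
sample_1 = """\
dev.bce.0.com_no_buffers: 101
dev.bce.1.com_no_buffers: 0
dev.bce.2.com_no_buffers: 32547
dev.bce.3.com_no_buffers: 444
"""

# HYPOTHETICAL second sample, imagined as taken 10 seconds later.
sample_2 = """\
dev.bce.0.com_no_buffers: 101
dev.bce.1.com_no_buffers: 0
dev.bce.2.com_no_buffers: 32827
dev.bce.3.com_no_buffers: 444
"""

rates = overrun_rates(sample_1, sample_2, interval_s=10)
print(rates)
```

In practice the two samples would come from running
`sysctl dev.bce | grep com_no_buffers` twice with a known delay between
the runs.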

FWIW, this is the tuning that eliminates the issue on 8.2-STABLE:
[firewall1.jnb1] ~ # cat /boot/loader.conf 
net.isr.maxthreads="8"
net.isr.defaultqlimit="4096"
net.isr.maxqlimit="81920"
net.isr.direct="1"
kern.ipc.nmbclusters="262144"
kern.maxusers="1024"

[firewall1.jnb1] ~ # cat /etc/sysctl.conf 
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.ip.fastforwarding=1
net.inet.carp.preempt=1
net.inet.icmp.icmplim_output=0
net.inet.icmp.icmplim=0
kern.random.sys.harvest.interrupt=0
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
net.route.netisr_maxqlen=8192

diff -u -d -r1.26.2.7 if_bcereg.h
--- if_bcereg.h 15 Aug 2010 23:56:57 -0000      1.26.2.7
+++ if_bcereg.h 5 Oct 2011 14:29:15 -0000
@@ -6150,7 +6150,7 @@
  * Page count must remain a power of 2 for all
  * of the math to work correctly.
  */
-#define TX_PAGES       2
+#define TX_PAGES       64
 #define TOTAL_TX_BD_PER_PAGE  (BCM_PAGE_SIZE / sizeof(struct tx_bd))
 #define USABLE_TX_BD_PER_PAGE (TOTAL_TX_BD_PER_PAGE - 1)
 #define TOTAL_TX_BD (TOTAL_TX_BD_PER_PAGE * TX_PAGES)
@@ -6170,7 +6170,7 @@
  * Page count must remain a power of 2 for all
  * of the math to work correctly.
  */
-#define RX_PAGES       2
+#define RX_PAGES       64
 #define TOTAL_RX_BD_PER_PAGE  (BCM_PAGE_SIZE / sizeof(struct rx_bd))
 #define USABLE_RX_BD_PER_PAGE (TOTAL_RX_BD_PER_PAGE - 1)
 #define TOTAL_RX_BD (TOTAL_RX_BD_PER_PAGE * RX_PAGES)
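
To show what the page count in the patch above means in descriptors,
here is a small sketch of the macro arithmetic.  The 4096-byte page
size and 16-byte descriptor size are assumptions (neither is stated in
this post), and `ring_size` is a hypothetical helper, not a driver
function.

```python
# ASSUMPTIONS (not stated in the post): 4 KiB pages, 16-byte descriptors.
BCM_PAGE_SIZE = 4096
BD_SIZE = 16                    # stand-in for sizeof(struct tx_bd / rx_bd)

def ring_size(pages):
    """Return (total, usable) buffer descriptors for a ring of `pages` pages."""
    # The header comment requires the page count to be a power of 2.
    if pages <= 0 or pages & (pages - 1):
        raise ValueError("page count must be a power of 2")
    total_per_page = BCM_PAGE_SIZE // BD_SIZE    # TOTAL_*_BD_PER_PAGE
    usable_per_page = total_per_page - 1         # USABLE_*_BD_PER_PAGE
    # ASSUMPTION: usable descriptors scale with the page count as the total does.
    return total_per_page * pages, usable_per_page * pages

# Stock value, the apparent tunable limit, and the patched value.
for pages in (2, 8, 64):
    print(pages, ring_size(pages))
```

Under those assumptions the stock two pages give 512 descriptors per
ring, the limit of 8 gives 2048, and the patched 64 gives 16384.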

Ian

-- 
Ian Freislich
Received on Wed Oct 05 2011 - 13:11:56 UTC
