Re: if_ral regression

From: Weongyo Jeong <weongyo.jeong_at_gmail.com>
Date: Wed, 2 Jan 2008 11:38:31 +0900
On Tue, Jan 01, 2008 at 02:27:47PM +0800, Sepherosa Ziehau wrote:
> On Dec 29, 2007 8:33 PM, Dag-Erling Smørgrav <des_at_des.no> wrote:
> > I upgraded my router cum firewall cum access point (soekris net4801 with
> > a cheap third-party ralink-based wlan adapter) from RELENG_6 to HEAD and
> > noticed what seems to be a regression in if_ral.  After a certain amount
> > of use (i.e. actually having a client connected to it and transferring
> > data), the connection falters, and eventually the client can no longer
> > even see the access point in a scan.  Restarting the interface on
> > the router (/etc/rc.d/netif restart ral0) fixes it.  I now have a cron
> > job that does this every five minutes.  I still get occasional outages,
> > but all I have to do is wait a few minutes for the cron job to kick in.
> >
> > Outages are clearly related to traffic; a sure-fire way to trigger one
> > is to start a backup job on my laptop (rsync to my file server).  I will
> > lose the wlan connection repeatedly until I either stop trying or run
> > the script with a bandwidth limit.
> >
> > des_at_soe ~% uname -a
> > FreeBSD soe.des.no 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sat Dec 15 20:46:29 UTC 2007     des_at_pwd.des.no:/usr/obj/usr/src/sys/soe  i386
> > des_at_soe ~% kldstat -v
> > Id Refs Address    Size     Name
> >  1   18 0xc0400000 33fdfc   kernel (/boot/soe/kernel)
> >  2    1 0xc0740000 7690     if_sis.ko (/boot/soe/if_sis.ko)
> >  3    2 0xc0748000 1dbe0    miibus.ko (/boot/soe/miibus.ko)
> >  4    1 0xc0766000 18e28    if_ral.ko (/boot/soe/if_ral.ko)
> >  5    4 0xc077f000 2a95c    wlan.ko (/boot/soe/wlan.ko)
> >  6    1 0xc07aa000 2cb0     wlan_acl.ko (/boot/soe/wlan_acl.ko)
> >  7    1 0xc07ad000 1924     wlan_scan_ap.ko (/boot/soe/wlan_scan_ap.ko)
> >  8    1 0xc107f000 6000     geom_md.ko (/boot/soe/geom_md.ko)
> >  9    1 0xc10f9000 2000     pflog.ko (/boot/soe/pflog.ko)
> > 10    1 0xc10fb000 2f000    pf.ko (/boot/soe/pf.ko)
> > 11    4 0xc118d000 a000     netgraph.ko (/boot/soe/netgraph.ko)
> > 12    1 0xc119c000 3000     ng_ether.ko (/boot/soe/ng_ether.ko)
> > 13    1 0xc11a8000 5000     ng_pppoe.ko (/boot/soe/ng_pppoe.ko)
> > 14    1 0xc11ad000 4000     ng_socket.ko (/boot/soe/ng_socket.ko)
> > des_at_soe ~% grep ral0 /var/run/dmesg.boot
> > ral0: <Ralink Technology RT2560> mem 0xa0004000-0xa0005fff irq 11 at device 10.0 on pci0
> 
> I don't know whether the following changes will fix your problem:
> 
> 1)
> rt2560.c: rt2560_setup_tx_desc()
> Set RT2560_{TX,TX_CIPHER}_BUSY desc flag at the end of this function,
> instead of at the beginning of this function.  The original way _may_
> confuse hardware encryption/tx engine.
> 
> 2)
> Also, rt2560_bbp_read() is not correct; it should look like the following:
> static uint8_t
> rt2560_bbp_read(struct rt2560_softc *sc, uint8_t reg)
> {
> 	uint32_t val;
> 	int ntries;
> 
> 	for (ntries = 0; ntries < 100; ntries++) {
> 		if (!(RAL_READ(sc, RT2560_BBPCSR) & RT2560_BBP_BUSY))
> 			break;
> 		DELAY(1);
> 	}
> 	if (ntries == 100) {
> 		device_printf(sc->sc_dev, "could not read from BBP\n");
> 		return 0;
> 	}
> 
> 	val = RT2560_BBP_BUSY | reg << 8;
> 	RAL_WRITE(sc, RT2560_BBPCSR, val);
> 
> 	for (ntries = 0; ntries < 100; ntries++) {
> 		val = RAL_READ(sc, RT2560_BBPCSR);
> 		if (!(val & RT2560_BBP_BUSY))
> 			return val & 0xff;
> 		DELAY(1);
> 	}
> 
> 	device_printf(sc->sc_dev, "could not read from BBP\n");
> 	return 0;
> }
> 
> 3)
> After the above fix,
> rt2560_set_txantenna() and rt2560_set_rxantenna() should be called
> after rt2560_bbp_init(), since those two functions touch the BBP.  NOTE:
> without the above fix, you may burn your card.
> 
> Even with these in place in dfly, I still see a strange TX performance
> regression in sta mode (a drop from 20Mb/s to 3Mb/s under very good
> conditions) on certain hardware after 20sec~30sec of TCP_STREAM netperf
> testing.  I didn't have enough time to dig; however, all of the tested
> hardware stayed connected during testing (I usually run the netperf
> stream test for 12 hours or more).
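
To illustrate point 1) above: the idea is to fill in every descriptor
field first and hand the descriptor to the hardware only as the final
step.  The following is a minimal user-space sketch under stated
assumptions, not the actual rt2560.c code; struct tx_desc, the DESC_*
flag values and setup_tx_desc() are hypothetical stand-ins, and the C11
fence stands in for whatever barrier or DMA sync the real driver needs.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bits; NOT the real RT2560 descriptor layout. */
#define DESC_BUSY        (1u << 0)  /* descriptor owned by the TX engine */
#define DESC_CIPHER_BUSY (1u << 1)  /* descriptor owned by the cipher engine */
#define DESC_OWN_MASK    (DESC_BUSY | DESC_CIPHER_BUSY)

/* Hypothetical, simplified TX descriptor shared with the device. */
struct tx_desc {
	volatile uint32_t flags;
	uint32_t physaddr;
	uint16_t len;
	uint8_t  rate;
};

static void
setup_tx_desc(struct tx_desc *desc, uint32_t flags, uint32_t physaddr,
    uint16_t len, uint8_t rate, int encrypt)
{
	/* Fill in everything while the descriptor is still owned by the host. */
	desc->flags = flags & ~DESC_OWN_MASK;
	desc->physaddr = physaddr;
	desc->len = len;
	desc->rate = rate;

	/* Make the field writes visible before the ownership bit. */
	atomic_thread_fence(memory_order_release);

	/* Last step: hand the descriptor over to the hardware. */
	desc->flags |= encrypt ? DESC_CIPHER_BUSY : DESC_BUSY;
}
```

In a real driver the fence would presumably be replaced by the
platform's DMA synchronization (bus_dmamap_sync() on FreeBSD) before
the device is told to transmit.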

I also saw a TX performance regression while porting malo(4).
The problem went away after removing the following lines in *_start:

			/*
			 * Cancel any background scan.
			 */
			if (ic->ic_flags & IEEE80211_F_SCAN)
				ieee80211_cancel_scan(ic);

and (optionally)

		if (m->m_flags & M_TXCB)
			...
			ieee80211_process_callback(ni, m, 0);	/* XXX status? */
			...

I tested this only with malo(4), not with other devices, so I can't be
sure that it would fix this regression, but it worked well after patching.

However, I know that this workaround isn't a fundamental solution to
this problem.

regards,
Weongyo Jeong
Received on Wed Jan 02 2008 - 02:07:38 UTC
