bge driver autoneg failure and system-wide stalls

From: Emil Mikulic <emil_at_cs.rmit.edu.au>
Date: Fri, 25 Nov 2005 13:20:41 +1100

I have a network port with bad wiring in the walls - a cable tester
shows only wires 1, 2, 3 and 6 are actually connected.

My solution is to patch directly into the switch, in which case the bge
driver works just fine.  However, the bad wiring exposes two problems
with the bge driver in 7-CURRENT.  From memory, I think these turned up
in the 5.x line, because I wasn't seeing either issue in 4.x.

The first problem is that, once the interface is ifconfig'd to an IP
address, there are periodic system-wide stalls.  They generally last a
little under a second, are incredibly annoying, and can cause keypresses
to be lost at the console.

I instrumented the kernel and, as far as I can tell, once ifconfig'd,
the following will happen in brgphy (mii module):

Every second there is a call to brgphy_service() with cmd=MII_TICK.
Every five seconds, this function will call brgphy_mii_phy_auto().
This function calls brgphy_loop().
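
The schedule above can be sketched as a small user-space model.  This is
not the brgphy code: phy_model, phy_tick() and restarts_after() are
hypothetical names, and the rule that a live link resets the counter is
an assumption rather than something I verified in the driver.

```c
/* Hypothetical model of the tick schedule, not the real brgphy code. */
#define AUTONEG_INTERVAL 5      /* seconds between restarts, as observed */

struct phy_model {
    int ticks_without_link;     /* seconds since the last restart */
    int autoneg_restarts;       /* times autonegotiation was restarted */
};

/* Called once per simulated second, like cmd=MII_TICK. */
static void phy_tick(struct phy_model *p, int link_up)
{
    if (link_up) {              /* assumption: a live link resets the count */
        p->ticks_without_link = 0;
        return;
    }
    if (++p->ticks_without_link < AUTONEG_INTERVAL)
        return;
    p->ticks_without_link = 0;
    p->autoneg_restarts++;      /* stands in for brgphy_mii_phy_auto() */
}

/* Run the model for a number of seconds and count the restarts. */
static int restarts_after(int seconds, int link_up)
{
    struct phy_model p = {0, 0};

    for (int s = 0; s < seconds; s++)
        phy_tick(&p, link_up);
    return p.autoneg_restarts;
}
```

With the link down for a minute, restarts_after(60, 0) counts one
restart per five ticks, which matches the five-second spacing of the
stalls.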

In brgphy_loop(), there is a #if 0'd bit of code that device_printf()'s
how many times it looped.  I enabled it.

Sometimes it reports zero loops - when this happens there is no stall.
On a very pronounced stall, there will be between 3000 and 7000 loops.

(i.e. the stalls appear a bit random because they only get a chance to
happen once every five seconds, and sometimes brgphy_loop() doesn't
result in a noticeable stall.)
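
To illustrate why the loop count tracks the length of the stall, here is
a rough user-space sketch of a bounded busy-wait.  It is not the driver
source: fake_phy, poll_until_done() and stall_us() are hypothetical, the
upper bound is simply whatever the caller passes in, and the
per-iteration cost is an assumed figure, not a measurement.

```c
#include <stdbool.h>

/* Hypothetical status source: reports "busy" for the first busy_polls
 * reads, then "done".  Stands in for reading a PHY status register. */
struct fake_phy {
    int busy_polls;
    int reads;
};

static bool fake_phy_busy(struct fake_phy *phy)
{
    return phy->reads++ < phy->busy_polls;
}

/* Busy-wait until the fake PHY is done or max_polls is reached.
 * Returns the number of completed iterations, i.e. the kind of count
 * the debug printf reports. */
static int poll_until_done(struct fake_phy *phy, int max_polls)
{
    int i;

    for (i = 0; i < max_polls; i++) {
        if (!fake_phy_busy(phy))
            break;
        /* a real driver would delay here; nothing else runs meanwhile */
    }
    return i;
}

/* Estimated stall in microseconds for a loop count and an assumed
 * per-iteration cost. */
static long stall_us(int loops, long us_per_loop)
{
    return (long)loops * us_per_loop;
}
```

If each iteration cost on the order of 100 microseconds (an assumption),
7000 loops would come to about 0.7 seconds, which is in the same range
as the stalls I see.  Zero loops costs essentially nothing, which fits
the runs where there is no stall.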

The other problem is that bge will never negotiate a working link speed.
ifconfig will always return "status: no carrier".

If I force the media to 10baseT/UTP or 100baseTX (with or without
mediaopt full-duplex), it will issue a couple more MII_TICKs and then
stop, ifconfig will return "status: active", there will be no more
stalls, and, most importantly, the network connection will actually work.

Is this fixable and actually worth fixing?

--Emil
Received on Fri Nov 25 2005 - 01:21:04 UTC
