Re: Call for testers: re(4) and RTL8168C/RTL8168CP/RTL8111C/RTL8111CP

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Mon, 30 Jun 2008 13:31:56 +0900
On Sat, Jun 28, 2008 at 06:54:47PM +0200, Dimitry Andric wrote:
 > On 2008-06-11 02:58, Pyun YongHyeon wrote:
 > >  > This seems to work better, although it still takes quite some time
 > >  > (~10s) for the interfaces to go up at boot time.  I haven't yet been
 > >  > able to get them "stuck", however, so that's good. :)
 > > Hmm, that's interesting. Can you spot where re(4) spends its time?
 > > Did RELENG_7 also have this issue?
 > 
 > Apparently it's experiencing timeouts, I usually get these:
 > 
 > re0: link state changed to DOWN
 > re0: watchdog timeout
        ^^^^^^^^^^^^^^^^
Once the link state changes to DOWN, re(4) should not queue any more
packets for transmission until it gets a valid link again. Trying to
send further packets causes watchdog timeouts like the one above.
This indicates that re(4) failed to detect the link loss event.
What makes me wonder is why the link state changed to DOWN at all.
Do you have a clue (e.g. the switching hub going down)?
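
A minimal sketch of the gating described above, written as a
stand-alone simulation rather than actual re(4) code. Every name here
(sketch_softc, sketch_start, sketch_txeof, sketch_watchdog) and the
5 second timeout are assumptions made for illustration only:

```c
#include <stdbool.h>

/* Hypothetical, simplified driver state; not the real re(4) softc. */
struct sketch_softc {
	bool link_up;		/* last known link state */
	int  watchdog_timer;	/* seconds until timeout, 0 = disarmed */
	int  tx_queued;		/* frames handed to the imaginary hardware */
	int  watchdog_fired;	/* number of watchdog timeouts seen */
};

/* Try to queue one frame for transmission; returns true if queued. */
static bool
sketch_start(struct sketch_softc *sc)
{
	if (!sc->link_up)
		return (false);	/* no link: do not queue, do not arm */
	sc->tx_queued++;
	sc->watchdog_timer = 5;	/* arm the watchdog for the pending frame */
	return (true);
}

/* Transmit completion: pending frames are done, disarm the watchdog. */
static void
sketch_txeof(struct sketch_softc *sc)
{
	sc->tx_queued = 0;
	sc->watchdog_timer = 0;
}

/* Called once per second by a periodic timer. */
static void
sketch_watchdog(struct sketch_softc *sc)
{
	if (sc->watchdog_timer == 0 || --sc->watchdog_timer > 0)
		return;
	/* Timer expired with frames still pending. */
	sc->watchdog_fired++;	/* a real driver would log and reinitialize */
	sc->tx_queued = 0;
}
```

With link_up false, sketch_start() never arms the timer, so
sketch_watchdog() can not fire. A timeout is only possible when frames
were queued while the driver believed the link was up, which is the
situation a missed link loss event produces.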

 > re0: 3 link states coalesced
        ^^^^^^^^^^^^^^^^^^^^^^^

Hmm, I guess you've encountered another bug. The link state
coalescing message indicates a bug in the PHY driver and in the link
state handling of re(4). ATM the link state handling of re(4) is in
very bad shape and it doesn't drive MII_TICK correctly. re(4) just
relies on the link status change interrupt of the controller, but it
fails to determine what the current link event is for (the event
could be link up, link down, auto-negotiation complete, etc.). In
addition, all RealTek controllers lack a proper programming interface
to tell the MAC the negotiated speed/duplex/flow-control, which makes
taking proper action on the event very hard.

I guess re(4) should not rely on the link status change interrupt;
instead it should fall back to the traditional polling mechanism,
which would enable correct tracking of link establishment. Also, the
link up/down handling should be changed to process events posted by
mii(4). All of this requires a lot of code changes and needs more
testing. I think I may have to commit the accumulated patches for the
newer RTL8168 family before going in that direction. The patch does
not address all issues for the RTL8168 family, but it allows
recognition of the new hardware and makes it usable in most cases.
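
The polling approach could look roughly like the stand-alone sketch
below. It is not the planned patch. The names (poll_softc, poll_tick,
poll_statchg) are hypothetical, the status bits are only modeled on
IFM_AVALID and IFM_ACTIVE, and the polled PHY status is passed in as a
plain argument instead of being read from hardware:

```c
#include <stdbool.h>

/* Hypothetical status bits, modeled loosely on IFM_AVALID/IFM_ACTIVE. */
#define POLL_AVALID	0x1	/* polled status is valid */
#define POLL_ACTIVE	0x2	/* link is established */

/* Hypothetical, simplified driver state. */
struct poll_softc {
	int  media_status;	/* last status reported by the PHY layer */
	bool link_up;		/* link state the driver acts on */
	int  link_changes;	/* number of up/down transitions seen */
};

/* Handle a status change posted by the PHY layer. */
static void
poll_statchg(struct poll_softc *sc)
{
	int want = POLL_AVALID | POLL_ACTIVE;
	bool up = (sc->media_status & want) == want;

	if (up != sc->link_up) {
		sc->link_up = up;
		sc->link_changes++;
	}
}

/* Periodic tick: record the polled status and post changes only. */
static void
poll_tick(struct poll_softc *sc, int polled_status)
{
	if (polled_status == sc->media_status)
		return;		/* nothing changed since the last poll */
	sc->media_status = polled_status;
	poll_statchg(sc);
}
```

Because the link flag is derived from the polled status itself, the
driver does not have to guess what a link status change interrupt
meant: each tick yields an unambiguous up or down answer.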

 > re0: link state changed to UP
 > re1: link state changed to DOWN
 > 
 > I've been running all tests under RELENG_7, btw.  Note also, these
 > delays don't always happen, in some cases the interfaces react very
 > quickly.  In rare cases, they don't work at all, until you manually
 > ifconfig down and up them a few times.
 > 
 > What's funny though, is that the interfaces seem to start in DOWN mode:
 > 
 > [...booting...]
 > Mounting local file systems:.
 > Setting hostname: tensor.andric.com.
 > re0: link state changed to DOWN
 > re1: link state changed to DOWN
 > lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
 >         inet6 ::1 prefixlen 128
 >         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
 >         inet 127.0.0.1 netmask 0xff000000
 > re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 >         options=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
 >         ether 00:30:18:a6:f1:a8
 >         inet6 fe80::230:18ff:fea6:f1a8%re0 prefixlen 64 tentative scopeid 0x1
 >         inet 87.251.56.140 netmask 0xffffffc0 broadcast 87.251.56.191
 >         media: Ethernet autoselect (none)
 >         status: no carrier
 > re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 >         options=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
 >         ether 00:30:18:a6:f1:a9
 >         inet6 fe80::230:18ff:fea6:f1a9%re1 prefixlen 64 tentative scopeid 0x2
 >         inet 192.168.0.1 netmask 0xffffff00 broadcast 192.168.0.255
 >         media: Ethernet autoselect (none)
 >         status: no carrier
 > [...more initialization...]
 > net.inet6.ip6.forwarding: 0 -> 1
 > net.inet6.ip6.accept_rtadv: 0 -> 0
 > re0: link state changed to UP
 > re1: link state changed to UP
 > 
 > and only then do they "really" go up... :)
 > 

I can't be sure due to the bugs in the driver's link state handling,
but generally this is normal. Establishing a link with the link
partner takes time, and sometimes it can take 10 seconds or more.

 > Do you have any good suggestions on where I could put some debug
 > printfs in re to find out what it's timing out on?
 > 

Before doing that it would be more appropriate to fix the link state
handling in the driver. I'll let you know when I have a patch for the
link handling clean-up.

 > 
 > > Plugging/unplugging UTP cable to ethernet controller during boot
 > > change the long delay? How about disabling WOL before system
 > > shutdown?(e.g. ifconfig re0 -wol)
 > 
 > Plugging/unplugging the cable doesn't seem to make much difference, and
 > neither does disabling WOL before shutdown (or altogether)...
 > 

Ok.

Thanks for reporting.
-- 
Regards,
Pyun YongHyeon
Received on Mon Jun 30 2008 - 02:34:09 UTC
