Re: Call for testers: re(4) and RTL8168C/RTL8168CP/RTL8111C/RTL8111CP

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Mon, 14 Jul 2008 10:35:19 +0900
On Mon, Jun 30, 2008 at 01:31:56PM +0900, To Dimitry Andric wrote:
 > On Sat, Jun 28, 2008 at 06:54:47PM +0200, Dimitry Andric wrote:
 >  > On 2008-06-11 02:58, Pyun YongHyeon wrote:
 >  > >  > This seems to work better, although it still takes quite some time
 >  > >  > (~10s) for the interfaces to go up at boot time.  I haven't yet been
 >  > >  > able to get them "stuck", however, so that's good. :)
 >  > > Hmm, that's interesting. Can you spot where re(4) spends its time?
 >  > > Did RELENG_7 also have this issue?
 >  > 
 >  > Apparently it's experiencing timeouts, I usually get these:
 >  > 
 >  > re0: link state changed to DOWN
 >  > re0: watchdog timeout
 >         ^^^^^^^^^^^^^^^^
 > Because the link state changed to DOWN, re(4) should not queue
 > any more packets for transmission until it gets a valid link.
 > Trying to send further packets would cause watchdog timeouts as
 > above. This indicates re(4) failed to detect the link loss event.
 > What makes me wonder is why the link state changed to DOWN.
 > Do you have a clue (e.g. the switching hub going down)?
 > 
 >  > re0: 3 link states coalesced
 >         ^^^^^^^^^^^^^^^^^^^^^^^
 > 
 > Hmm, I guess you've encountered another bug. The link state
 > coalescing message indicates a bug in the PHY driver and in the
 > link state handling of re(4). ATM the link state handling of re(4)
 > is in a very bad state and it doesn't correctly drive MII_TICK.
 > re(4) just relies on the link status change interrupt of the
 > controller, but it fails to determine what the current link event
 > is (the event could be link up, link down, auto-negotiation
 > complete, etc.). In addition, all RealTek controllers lack a proper
 > programming interface to tell the MAC the negotiated
 > speed/duplex/flow-control, which in turn makes taking proper
 > action on the event very hard.
 > 
 > I guess re(4) should not rely on the link status change interrupt
 > but should fall back to the traditional polling mechanism, which
 > will enable correct tracking of link establishment. Also, the link
 > up/down handling should be changed to process mii(4) posted events.
 > All these changes require a lot of code changes and need more
 > testing. I think I may have to commit the accumulated patches for
 > the newer RTL8168 family before going in that direction. The patch
 > does not address all issues for the RTL8168 family, but it allows
 > recognition of the new hardware and makes it usable in most cases.
 > 
 >  > re0: link state changed to UP
 >  > re1: link state changed to DOWN
 >  > 
 >  > I've been running all tests under RELENG_7, btw.  Note also, these
 >  > delays don't always happen, in some cases the interfaces react very
 >  > quickly.  In rare cases, they don't work at all, until you manually
 >  > ifconfig down and up them a few times.
 >  > 
 >  > What's funny though, is that the interfaces seem to start in DOWN mode:
 >  > 
 >  > [...booting...]
 >  > Mounting local file systems:.
 >  > Setting hostname: tensor.andric.com.
 >  > re0: link state changed to DOWN
 >  > re1: link state changed to DOWN
 >  > lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
 >  >         inet6 ::1 prefixlen 128
 >  >         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
 >  >         inet 127.0.0.1 netmask 0xff000000
 >  > re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 >  >         options=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
 >  >         ether 00:30:18:a6:f1:a8
 >  >         inet6 fe80::230:18ff:fea6:f1a8%re0 prefixlen 64 tentative scopeid 0x1
 >  >         inet 87.251.56.140 netmask 0xffffffc0 broadcast 87.251.56.191
 >  >         media: Ethernet autoselect (none)
 >  >         status: no carrier
 >  > re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 >  >         options=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
 >  >         ether 00:30:18:a6:f1:a9
 >  >         inet6 fe80::230:18ff:fea6:f1a9%re1 prefixlen 64 tentative scopeid 0x2
 >  >         inet 192.168.0.1 netmask 0xffffff00 broadcast 192.168.0.255
 >  >         media: Ethernet autoselect (none)
 >  >         status: no carrier
 >  > [...more initialization...]
 >  > net.inet6.ip6.forwarding: 0 -> 1
 >  > net.inet6.ip6.accept_rtadv: 0 -> 0
 >  > re0: link state changed to UP
 >  > re1: link state changed to UP
 >  > 
 >  > and only then do they "really" go up... :)
 >  > 
 > 
 > I can't be sure due to bugs in the driver's link state handling,
 > but generally it's normal. Establishing a link with the link
 > partner takes time, and sometimes it can even take 10 seconds or
 > more.
 > 
 >  > Do you have any good suggestions on where I could put some debug
 >  > printfs in re to find out what it's timing out on?
 >  > 
 > 
 > Before doing that it would be more appropriate to fix the link
 > state handling in the driver. I'll let you know when I have a patch
 > for the link handling clean-up.
 > 

Here is a patch for re(4) link handling.
Copy if_re.c and if_rlreg.h from HEAD to RELENG_7 and apply the
attached one. If you still see watchdog timeouts, please turn off
TSO and let me know how it goes.
One user reported TSO issues on 8169 family controllers, but I
can't reproduce this on my 8169 hardware, so it could be related
to a silicon bug in a specific revision of the hardware.
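
For anyone following the thread, the idea discussed in the quoted text
(poll the link state periodically, and do not queue packets while there
is no link) can be sketched roughly as below. This is only a
self-contained illustration and not the attached patch: struct
sketch_softc, sketch_tick(), sketch_start(), sketch_txeof() and
SKETCH_WD_TICKS are hypothetical names invented for the example, and
the PHY and hardware are simulated by plain variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_WD_TICKS 5 /* assumed watchdog period, in ticks */

/* Hypothetical, simplified soft state; not the real re(4) softc. */
struct sketch_softc {
	bool link_up;     /* link state as of the last poll */
	int  tx_queued;   /* frames handed to the simulated hardware */
	int  wd_timer;    /* watchdog countdown in ticks; 0 means disarmed */
	int  wd_timeouts; /* number of watchdog timeouts seen so far */
};

/* Periodic tick: record the polled link state and run the watchdog. */
void
sketch_tick(struct sketch_softc *sc, bool phy_link)
{
	assert(sc != NULL);
	sc->link_up = phy_link;
	if (sc->wd_timer > 0 && --sc->wd_timer == 0)
		sc->wd_timeouts++;
}

/* Transmit start: refuse to queue anything while there is no link. */
bool
sketch_start(struct sketch_softc *sc)
{
	assert(sc != NULL);
	if (!sc->link_up)
		return (false);
	sc->tx_queued++;
	sc->wd_timer = SKETCH_WD_TICKS; /* arm the watchdog */
	return (true);
}

/* Transmit completion: frames only complete while the link is up. */
void
sketch_txeof(struct sketch_softc *sc)
{
	assert(sc != NULL);
	if (sc->link_up && sc->tx_queued > 0) {
		sc->tx_queued = 0;
		sc->wd_timer = 0; /* nothing pending, disarm the watchdog */
	}
}
```

In this sketch, calling sketch_start() while the polled link is down
queues nothing and never arms the watchdog, so no timeout can follow.
A frame queued while the link is up and never completed would still
time out after SKETCH_WD_TICKS ticks.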

 >  > 
 >  > > Does plugging/unplugging the UTP cable to the ethernet
 >  > > controller during boot change the long delay? How about
 >  > > disabling WOL before system shutdown (e.g. ifconfig re0 -wol)?
 >  > 
 >  > Plugging/unplugging the cable doesn't seem to make much difference, and
 >  > neither does disabling WOL before shutdown (or altogether)...
 >  > 
 > 
 > Ok.
 > 
 > Thanks for reporting.

-- 
Regards,
Pyun YongHyeon
Received on Sun Jul 13 2008 - 23:37:32 UTC
