Re: Call for testers: re(4) and RTL8168C/RTL8168CP/RTL8111C/RTL8111CP

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Mon, 14 Jul 2008 10:36:42 +0900
On Mon, Jul 14, 2008 at 10:35:19AM +0900, To Dimitry Andric wrote:
 > On Mon, Jun 30, 2008 at 01:31:56PM +0900, To Dimitry Andric wrote:
 >  > On Sat, Jun 28, 2008 at 06:54:47PM +0200, Dimitry Andric wrote:
 >  >  > On 2008-06-11 02:58, Pyun YongHyeon wrote:
 >  >  > >  > This seems to work better, although it still takes quite some time
 >  >  > >  > (~10s) for the interfaces to go up at boot time.  I haven't yet been
 >  >  > >  > able to get them "stuck", however, so that's good. :)
 >  >  > > Hmm, that's interesting. Can you spot where re(4) spends its time?
 >  >  > > Did RELENG_7 also have this issue?
 >  >  > 
 >  >  > Apparently it's experiencing timeouts, I usually get these:
 >  >  > 
 >  >  > re0: link state changed to DOWN
 >  >  > re0: watchdog timeout
 >  >         ^^^^^^^^^^^^^^^^
 >  > Because the link state changed to DOWN, re(4) should not queue
 >  > any more packets for transmission until it gets a valid link.
 >  > Trying to send further packets would cause watchdog timeouts as
 >  > above. This indicates re(4) failed to detect the link loss event.
 >  > What makes me wonder is why the link state changed to DOWN.
 >  > Do you have a clue (e.g. the switching hub going down)?
 >  > 
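
A minimal sketch of the rule described above, to make the failure mode
concrete. This is not re(4) code: the struct, the function names and the
5 second timeout are all hypothetical. The start routine refuses to queue
frames without a valid link, and detecting link loss disarms the watchdog
so that it cannot fire for frames that could never be sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified driver state; not the real re(4) softc. */
struct sketch_softc {
	bool link_up;        /* last known link state */
	int  tx_pending;     /* frames handed to hardware, not yet completed */
	int  watchdog_timer; /* seconds until the watchdog fires, 0 = disarmed */
};

#define SKETCH_WATCHDOG_SECS 5	/* assumed value, for illustration only */

/* Try to queue one frame; returns true if it was queued. */
static bool
sketch_start(struct sketch_softc *sc)
{
	assert(sc != NULL);
	if (!sc->link_up)
		return false;	/* no valid link: do not queue, do not arm */
	sc->tx_pending++;
	sc->watchdog_timer = SKETCH_WATCHDOG_SECS;
	return true;
}

/* Called when link loss is detected: forget pending frames and disarm. */
static void
sketch_link_down(struct sketch_softc *sc)
{
	assert(sc != NULL);
	sc->link_up = false;
	sc->tx_pending = 0;
	sc->watchdog_timer = 0;
}

/* Called once per second; returns true when the watchdog times out. */
static bool
sketch_watchdog(struct sketch_softc *sc)
{
	assert(sc != NULL);
	if (sc->watchdog_timer == 0 || --sc->watchdog_timer > 0)
		return false;
	return true;
}
```

If the link loss goes undetected, sketch_link_down() is never called,
frames keep being queued, and the watchdog eventually fires, which
matches the "watchdog timeout" message in the log.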
 >  >  > re0: 3 link states coalesced
 >  >         ^^^^^^^^^^^^^^^^^^^^^^^
 >  > 
 >  > Hmm, I guess you've encountered another bug. The link state
 >  > coalescing message indicates a bug in the PHY driver and in the
 >  > link state handling of re(4). ATM the link state handling of re(4)
 >  > is in a very bad state and it doesn't drive MII_TICK correctly.
 >  > re(4) just relies on the link status change interrupt of the
 >  > controller, but it fails to determine what the current link event
 >  > is (the event could be link up, link down, auto-negotiation
 >  > complete, etc.). In addition, all RealTek controllers lack a proper
 >  > programming interface to tell the MAC the negotiated
 >  > speed/duplex/flow-control, which in turn makes taking proper
 >  > action on the event very hard.
 >  > 
 >  > I guess re(4) should not rely on the link status change interrupt
 >  > but should fall back to the traditional polling mechanism, which
 >  > will enable correct tracking of link establishment. Also the link
 >  > up/down handling should be changed to process mii(4) posted events.
 >  > All of this requires a lot of code changes and needs more
 >  > testing. I think I may have to commit the accumulated patches for
 >  > the newer RTL8168 family before going in that direction. The patch
 >  > does not address all issues for the RTL8168 family, but it allows
 >  > recognition of the new hardware and makes it usable in most cases.
 >  > 
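
A minimal sketch of the polling idea, under the assumption that a
periodic tick can read the current PHY link bit. It is not the mii(4)
API and not the attached patch; every name here is hypothetical. The
tick compares the polled state with the last reported one and produces
an event only on a real transition, so the driver always knows whether
the event was link up or link down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event type returned by the periodic tick. */
enum sketch_link_event {
	SKETCH_LINK_NONE,	/* state unchanged since the last tick */
	SKETCH_LINK_UP,		/* link was down, is now up */
	SKETCH_LINK_DOWN	/* link was up, is now down */
};

/* Hypothetical per-interface tracker; not a real kernel structure. */
struct sketch_link {
	bool up;	/* link state as last reported to the stack */
};

/*
 * Periodic tick (e.g. once per second). The caller passes what the
 * PHY reports right now; an event is returned only on a transition.
 */
static enum sketch_link_event
sketch_tick(struct sketch_link *l, bool phy_link_now)
{
	assert(l != NULL);
	if (phy_link_now == l->up)
		return SKETCH_LINK_NONE;
	l->up = phy_link_now;
	return phy_link_now ? SKETCH_LINK_UP : SKETCH_LINK_DOWN;
}
```

Because state is sampled rather than inferred from an ambiguous
interrupt, repeated identical samples produce no events at all.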
 >  >  > re0: link state changed to UP
 >  >  > re1: link state changed to DOWN
 >  >  > 
 >  >  > I've been running all tests under RELENG_7, btw.  Note also, these
 >  >  > delays don't always happen, in some cases the interfaces react very
 >  >  > quickly.  In rare cases, they don't work at all, until you manually
 >  >  > ifconfig down and up them a few times.
 >  >  > 
 >  >  > What's funny though, is that the interfaces seem to start in DOWN mode:
 >  >  > 
 >  >  > [...booting...]
 >  >  > Mounting local file systems:.
 >  >  > Setting hostname: tensor.andric.com.
 >  >  > re0: link state changed to DOWN
 >  >  > re1: link state changed to DOWN
 >  >  > lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
 >  >  >         inet6 ::1 prefixlen 128
 >  >  >         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
 >  >  >         inet 127.0.0.1 netmask 0xff000000
 >  >  > re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 >  >  >         options=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
 >  >  >         ether 00:30:18:a6:f1:a8
 >  >  >         inet6 fe80::230:18ff:fea6:f1a8%re0 prefixlen 64 tentative scopeid 0x1
 >  >  >         inet 87.251.56.140 netmask 0xffffffc0 broadcast 87.251.56.191
 >  >  >         media: Ethernet autoselect (none)
 >  >  >         status: no carrier
 >  >  > re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 >  >  >         options=399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
 >  >  >         ether 00:30:18:a6:f1:a9
 >  >  >         inet6 fe80::230:18ff:fea6:f1a9%re1 prefixlen 64 tentative scopeid 0x2
 >  >  >         inet 192.168.0.1 netmask 0xffffff00 broadcast 192.168.0.255
 >  >  >         media: Ethernet autoselect (none)
 >  >  >         status: no carrier
 >  >  > [...more initialization...]
 >  >  > net.inet6.ip6.forwarding: 0 -> 1
 >  >  > net.inet6.ip6.accept_rtadv: 0 -> 0
 >  >  > re0: link state changed to UP
 >  >  > re1: link state changed to UP
 >  >  > 
 >  >  > and only then do they "really" go up... :)
 >  >  > 
 >  > 
 >  > I can't be sure due to the bugs in the driver's link state handling,
 >  > but generally it's normal. Establishing a link with the link partner
 >  > takes time, and sometimes it can even take 10 seconds or more.
 >  > 
 >  >  > Do you have any good suggestions on where I could put some debug
 >  >  > printfs in re to find out what it's timing out on?
 >  >  > 
 >  > 
 >  > Before doing that it would be more appropriate to fix the link state
 >  > handling in the driver. I'll let you know when I have a patch for the
 >  > link handling clean-up.
 >  > 
 > 
 > Here is a patch for re(4) link handling.
 > Copy if_re.c and if_rlreg.h from HEAD to RELENG_7 and apply the
 > attached one. If you still see watchdog timeouts, please turn off
 > TSO and let me know how it goes.
 > One user reported TSO issues on 8169 family controllers, but I
 > can't reproduce this on my 8169 hardware, so it could be related
 > to a silicon bug in a specific revision of the hardware.

I forgot to attach the patch.
Here we go.

 > 
 >  >  > 
 >  >  > > Does plugging/unplugging the UTP cable to the Ethernet controller
 >  >  > > during boot change the long delay? How about disabling WOL before
 >  >  > > system shutdown (e.g. ifconfig re0 -wol)?
 >  >  > 
 >  >  > Plugging/unplugging the cable doesn't seem to make much difference, and
 >  >  > neither does disabling WOL before shutdown (or altogether)...
 >  >  > 
 >  > 
 >  > Ok.
 >  > 
 >  > Thanks for reporting.

-- 
Regards,
Pyun YongHyeon

Received on Sun Jul 13 2008 - 23:38:54 UTC
