Re: suspect bug in vge(4)

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Mon, 15 Jun 2009 10:55:05 +0900

On Fri, Jun 12, 2009 at 10:30:31PM +0200, Thomas Lotterer wrote:
> Pyun YongHyeon wrote:
> >On Thu, Jun 11, 2009 at 05:39:03PM +0200, Thomas Lotterer wrote:
> >>Pyun YongHyeon wrote:
> >>>Could you show me dmesg output(only vge(4) related one)? 
> >>>
> >># dmesg | grep vge
> >>vge0: <VIA Networking Gigabit Ethernet> port 0xec00-0xecff mem 
> >>0xdf7ff000-0xdf7ff0ff irq 28 at device 0.0 on pci2
> >>vge0: MSIX count : 0
> >>vge0: MSI count : 1
> >
> >I wonder why "Using 1 MSI messages" message is missing.
> >
> Never seen that message. Maybe more verbose/debug needed?
> 
> OK, next round. Here are today's findings. I switched from statically 
> linked to dynamically loaded drivers to accelerate the build+test 
> process. Finally, the results with both vge(4) drivers dynamically 
> loaded and statically linked were the same.
> 
> The good news is that the "yongari" driver actually works in one of 
> three or four cases. The situation with the driver when auto detecting 
> GigE is as already described:
> 
> >># ifconfig vge0
> >>vge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> >>options=389b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
> >>        ether 00:40:63:xx:xx:xx
> >>        inet [...]
> >>        media: Ethernet autoselect (1000baseT <full-duplex,flag0,flag1>)
> >>        status: active
> >>
> >>Unfortunately, no traffic could be sent and tcpdump(1) does not show any 
> >>incoming packets either, not even broadcasts.
> 
> However, sometimes the driver (incorrectly) auto selects 100BaseTX
> 
>         media: Ethernet autoselect (100baseTX <full-duplex,flag0,flag1>)
> 
> in which case it works well. I was able to copy 1500MB of data from the 
> server and back in three parallel running CIFS connections. The 
> "original problem" driver always broke upload before 100MB barrier.
> 
> >>Interesting side effect is that after that test the kernel with my 
> >>previous "original problem" vge(4) driver rebooted when initializing the 
> >>network card. No logs at this stage, sorry. Reboot did not help. Hard 
> >>reset did not help. Power cycle did help. Behavior was reproducible on a 
> >>second attempt.
> 
> My experience after countless reboots is that both drivers always show 
> this problem after the "yongari" driver was loaded previously. However, 
> enabling "boot from VIA Ethernet" in BIOS has been found to be a better 
> and more reliable workaround than power cycling. Not that I want to boot 
> from the network, it just seems the BIOS is resetting the NIC properly.
> 
> Also I was able to capture the error log from the screen:
> 
> vge0: <VIA Networking Gigabit Ethernet> port 0xec00-0xecff mem 
> 0xdf7ff000-0xdf7ff0ff irq 11 at device 0.0 on pci2
> vge0: MII read timed out
> vge0: failed to start MII autopoll
> vge0: MII without any phy!

This message comes from the stock vge(4). It indicates that the
driver failed to disable the MII autopolling feature. MII
autopolling can be used to detect link state changes, so the stock
vge(4) turned the feature on. Correct link state tracking is very
important for knowing when the link was lost, which kind of link
was established, etc. The problem with MII autopolling is that the
driver has to disable it whenever it wants to access one of the MII
registers. So vge(4) disabled autopolling before accessing MII
registers and re-enabled it after the register access. To drive the
auto-negotiation timer and to detect link loss/establishment,
mii(4) requires periodic access to the MII registers, so the
time-consuming enabling/disabling of MII autopolling was one of the
big issues for me.
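
To illustrate the stop/access/restart pattern described above, here
is a minimal, self-contained sketch. It is NOT the actual vge(4)
code: struct fake_nic, the simulated register file, and all function
names are made up for illustration, and the "hardware" is just a
struct in memory. It only shows why every register access costs two
autopoll state changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the NIC; not the real vge(4) softc. */
struct fake_nic {
	bool		autopoll_on;	/* hardware polls the PHY by itself */
	uint16_t	phy_regs[32];	/* simulated MII register file */
	unsigned	toggles;	/* autopoll on/off transitions so far */
};

/* Turn autopolling off so software may use the MII interface. */
static void
autopoll_stop(struct fake_nic *nic)
{
	if (nic->autopoll_on) {
		nic->autopoll_on = false;
		nic->toggles++;
	}
}

/* Turn autopolling back on after the software access. */
static void
autopoll_start(struct fake_nic *nic)
{
	if (!nic->autopoll_on) {
		nic->autopoll_on = true;
		nic->toggles++;
	}
}

/*
 * Raw register read. Assumption of this sketch: the access fails
 * (returns -1) while autopolling still owns the MII interface.
 */
static int
mii_read_raw(struct fake_nic *nic, unsigned reg, uint16_t *val)
{
	assert(reg < 32);
	if (nic->autopoll_on)
		return (-1);
	*val = nic->phy_regs[reg];
	return (0);
}

/* Read one MII register the way an autopolling driver has to. */
static int
mii_read(struct fake_nic *nic, unsigned reg, uint16_t *val)
{
	int error;

	autopoll_stop(nic);
	error = mii_read_raw(nic, reg, val);
	autopoll_start(nic);
	return (error);
}
```

In this model each mii_read() call adds two to the toggles counter.
On real hardware every one of those transitions would mean waiting
for the chip, which is where the time goes.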

In my patched vge(4), I completely removed the autopolling feature
and implemented link state tracking with mii(4). This could be one
of the root causes of why you can't establish a gigabit link. The
VIA datasheet is not clear about MII autopolling, so I need more
experimentation on real hardware.
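
As a rough picture of what link state tracking without autopolling
means, here is a small sketch of a periodic tick that reads the PHY
status register and reports changes. It is not the patched driver
and not the mii(4) API: struct fake_phy, struct link_tracker,
phy_read() and link_tick() are hypothetical names. Only the register
number (1, the basic mode status register) and the link status bit
(0x0004) come from the IEEE 802.3 MII definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	MII_BMSR	0x01	/* basic mode status register */
#define	BMSR_LINK	0x0004	/* link status bit */

/* Hypothetical simulated PHY; only the status register is modeled. */
struct fake_phy {
	uint16_t	bmsr;
};

/* Hypothetical driver-side bookkeeping of the last known link state. */
struct link_tracker {
	bool		link_up;
	unsigned	changes;	/* number of link transitions seen */
};

/* Simulated MII register read; registers other than BMSR read as 0. */
static uint16_t
phy_read(const struct fake_phy *phy, unsigned reg)
{
	return (reg == MII_BMSR ? phy->bmsr : 0);
}

/*
 * Called periodically (for example once per second from a timer).
 * Returns true when the link state changed since the previous tick.
 */
static bool
link_tick(struct link_tracker *lt, const struct fake_phy *phy)
{
	bool up = (phy_read(phy, MII_BMSR) & BMSR_LINK) != 0;

	if (up == lt->link_up)
		return (false);
	lt->link_up = up;
	lt->changes++;
	return (true);
}
```

The trade-off in this approach is that a link change is noticed only
at the next tick, but no autopoll start/stop is needed around the
register read.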

> panic: Assertion mtx_unowned(m) failed at /usr/src/sys/kern/kern_mutex.c:827

This looks like a locking bug in the driver. Please show me the
backtrace.

> Uptime: 1s
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting ...
> 
> -- 
> http://thomas.lotterer.net
Received on Sun Jun 14 2009 - 23:51:46 UTC