Re: Frequent network access freeze (in 7.0)

From: Unga <unga888_at_yahoo.com>
Date: Tue, 26 Feb 2008 01:57:53 -0800 (PST)
--- Robert Watson <rwatson_at_FreeBSD.org> wrote:

> 
> On Wed, 20 Feb 2008, Unga wrote:
> 
> > I'm running 7.0-PRERELEASE (RC2, dated 15/02/2008), compiled from sources on
> > i386 machine (512MB RAM, 3.0GHz, tx0: <SMC EtherPower II 10/100>).
> >
> > Network access freezes very frequently. Cannot ping to any ip address. The
> > only way to get networking working again is reboot.
> >
> > I'm having this problem on 7.0 ever since I tried it from BETA4. I have
> > reported also to this list before but sadly nobody was interested on it.
> >
> > If somebody is interested to look into this problem, I could furnish with
> > more detail and participate in testing.
> 
> This sort of problem frequently turns out to be a bug in a device driver or a
> problem with interrupt probing/configuration, so my first guess would be a
> problem with the if_tx driver.  The usual starting diagnostics when ping fails
> are to try to use tcpdump to determine whether it's receive or transmit
> failing (or both).  Quiet the network between two endpoints as much as you can
> so you can avoid noise from making the dumps more complex, and dump arp and
> icmp at both endpoints.  Now try to ping from each end point to the other.
> One potential source of confusion is that ping requires ARP to work, and ARP
> can be a slightly confusing protocol as it usually resolves actively (query,
> response) but sometimes it receives passive updates or extends existing
> entries.
> 
> What you want to look for is a packet sent by one side that isn't received by
> the other.  You might find, for example, that your host receives packets fine,
> but the packets it transmits are never received.  This would be indicative of a
> driver bug in which it fails to properly handle (for example) transmit queues
> filling, and might only trigger under very high load.  Or, you might find that
> your host never receives anything the other side transmits, but can send fine.
> This might be indicative of a driver bug involving the receive code, or a
> problem with how interrupts are being handled more generally.
> 
> It looks like the last non-routine maintenance to the driver was done by
> Maxime in about 2003; the more recent changes have all been updates to
> newbus/busdma infrastructure, ifnet changes, locking changes, etc.  I've CC'd
> him as it sounds like he may have hardware...  My advice would be to do the
> above tests and see if you can narrow down whether it's transmit, receive, or
> both failing.
> 

Here are the details when network access is working and when it is
not working:

When net access is working
--------------------------

$ ifconfig
tx0: flags=108843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NEEDSGIANT> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 00:e1:20:34:bb:36
        inet 192.168.1.20 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect (10baseT/UTP)
        status: active
plip0: flags=108810<POINTOPOINT,SIMPLEX,MULTICAST,NEEDSGIANT> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000


$ netstat -r
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            192.168.1.1        UGS         0     1090    tx0
localhost          localhost          UH          0      186    lo0
192.168.1.0        link#1             UC          0        0    tx0
192.168.1.1        00:91:d2:4c:54:f8  UHLW        2        0    tx0    892

Internet6:
Destination        Gateway            Flags      Netif Expire
localhost          localhost          UHL         lo0
fe80::%lo0         fe80::1%lo0        U           lo0
fe80::1%lo0        link#3             UHL         lo0
ff01:3::           fe80::1%lo0        UC          lo0
ff02::%lo0         fe80::1%lo0        UC          lo0


When net access is NOT working
------------------------------

$ ifconfig
tx0: flags=108843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NEEDSGIANT> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 00:e1:20:34:bb:36
        inet 192.168.1.20 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect (10baseT/UTP)
        status: active
plip0: flags=108810<POINTOPOINT,SIMPLEX,MULTICAST,NEEDSGIANT> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000


$ netstat -r
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            192.168.1.1        UGS         0     3338    tx0
localhost          localhost          UH          0      204    lo0
192.168.1.0        link#1             UC          0        0    tx0
192.168.1.1        00:91:d2:4c:54:f8  UHLW        2       28    tx0    997
192.168.1.2        link#1             UHLW        1        1    tx0

Internet6:
Destination        Gateway            Flags      Netif Expire
localhost          localhost          UHL         lo0
fe80::%lo0         fe80::1%lo0        U           lo0
fe80::1%lo0        link#3             UHL         lo0
ff01:3::           fe80::1%lo0        UC          lo0
ff02::%lo0         fe80::1%lo0        UC          lo0


$ tcpdump -i tx0 -v

NOTE: While pinging 192.168.1.1, tcpdump prints no output.


$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
^C
--- 192.168.1.1 ping statistics ---
58 packets transmitted, 0 packets received, 100.0% packet loss
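
For reference, the tcpdump run above was unfiltered. A narrower capture
limited to ARP and ICMP, along the lines Robert suggested, might look like
the sketch below. The function name capture_arp_icmp is made up for
illustration, tx0 is the interface from this report, and tcpdump needs to
be run as root on both endpoints.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): capture only ARP and
# ICMP traffic on one interface so transmit and receive can be compared.
#   -n  do not resolve addresses to names
#   -e  print the link-level (MAC) header of each packet
#   -i  interface to listen on (defaults to tx0, the NIC in this report)
capture_arp_icmp() {
    tcpdump -n -e -i "${1:-tx0}" 'arp or icmp'
}

# Usage (as root, on each endpoint, then ping in both directions):
#   capture_arp_icmp tx0
```

A packet that shows up in the sender's capture but not in the receiver's
points at the failing direction.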


/var/log/messages:
Feb 26 15:26:14 blacktower kernel: tx0: ERROR! Can't stop Rx DMA
Feb 26 15:26:14 blacktower kernel: tx0: promiscuous mode enabled

Note: These two messages keep repeating in /var/log/messages.

/var/log/messages at the time of sending this email:
Feb 26 17:32:17 blacktower kernel: tx0: link state changed to DOWN
Feb 26 17:36:25 blacktower kernel: tx0: link state changed to UP
Feb 26 17:36:30 blacktower kernel: tx0: link state changed to DOWN
Feb 26 17:37:07 blacktower kernel: tx0: link state changed to UP
Feb 26 17:37:14 blacktower kernel: tx0: link state changed to DOWN
Feb 26 17:37:22 blacktower kernel: tx0: link state changed to UP
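
To put a number on how often the link flaps, the DOWN transitions can be
counted from the log. This is only a minimal sketch, assuming each kernel
message occupies one line in the log file; the function name
count_link_down is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "link state changed to DOWN" kernel messages
# for the interface named in $1, reading log text from standard input.
count_link_down() {
    grep -c "kernel: $1: link state changed to DOWN"
}

# Usage:
#   count_link_down tx0 < /var/log/messages
```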


After a reboot, network access starts working again.

Please let me know what other information is required.

Kind regards
Unga


Received on Tue Feb 26 2008 - 10:52:53 UTC
