Re: Frequent network access freeze (in 7.0)

From: Unga <unga888_at_yahoo.com>
Date: Tue, 26 Feb 2008 02:06:54 -0800 (PST)
--- Robert Watson <rwatson_at_FreeBSD.org> wrote:

> 
> On Wed, 20 Feb 2008, Unga wrote:
>
> > I'm running 7.0-PRERELEASE (RC2, dated 15/02/2008), compiled from
> > sources on an i386 machine (512MB RAM, 3.0GHz, tx0: <SMC EtherPower
> > II 10/100>).
> >
> > Network access freezes very frequently. I cannot ping any IP
> > address. The only way to get networking working again is to reboot.
> >
> > I have had this problem on 7.0 ever since I first tried it at BETA4.
> > I have also reported it to this list before, but sadly nobody was
> > interested in it.
> >
> > If somebody is interested in looking into this problem, I can
> > provide more details and participate in testing.
> 
> This sort of problem frequently turns out to be a bug in a device
> driver or a problem with interrupt probing/configuration, so my first
> guess would be a problem with the if_tx driver.  The usual starting
> diagnostics when ping fails are to use tcpdump to determine whether
> receive or transmit is failing (or both).  Quiet the network between
> two endpoints as much as you can so that noise doesn't make the dumps
> more complex, and dump arp and icmp at both endpoints.  Now try to
> ping from each endpoint to the other.  One potential source of
> confusion is that ping requires ARP to work, and ARP can be a slightly
> confusing protocol as it usually resolves actively (query, response)
> but sometimes it receives passive updates or extends existing entries.
> 
> What you want to look for is a packet sent by one side that isn't
> received by the other.  You might find, for example, that your host
> receives packets fine, but the packets it transmits are never
> received.  This would be indicative of a driver bug in which it fails
> to properly handle (for example) transmit queues filling, and might
> only trigger under very high load.  Or, you might find that your host
> never receives anything the other side transmits, but can send fine.
> This might be indicative of a driver bug involving the receive code,
> or a problem with how interrupts are being handled more generally.
> 
> It looks like the last non-routine maintenance to the driver was done
> by Maxime in about 2003; the more recent changes have all been updates
> to newbus/busdma infrastructure, ifnet changes, locking changes, etc.
> I've CC'd him as it sounds like he may have hardware...  My advice
> would be to do the above tests and see if you can narrow down whether
> it's transmit, receive, or both failing.
> 
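The comparison Robert describes (find a packet that one side sent but the other never received) can be scripted once both endpoints have been captured as text, for example with `tcpdump -n -i tx0 'arp or icmp'` on this host and the equivalent command (with that machine's interface name) on the other endpoint. Below is a minimal Python sketch that only looks at ICMP echo packets, not ARP. It assumes the captures were saved as text and that ICMP lines follow the usual `IP src > dst: ICMP echo request, id N, seq N` layout; the function names are hypothetical, and the regular expression may need adjusting for other tcpdump versions.

```python
import re

# Assumed layout of a `tcpdump -n` ICMP echo line, e.g.
# "15:26:14.123456 IP 192.168.1.20 > 192.168.1.1: ICMP echo request, id 7, seq 0, length 64"
# The exact wording varies between tcpdump versions; adjust if nothing matches.
ECHO_RE = re.compile(
    r"IP (?P<src>[\d.]+) > (?P<dst>[\d.]+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def echo_packets(lines):
    """Return the set of (src, dst, kind, id, seq) tuples found in tcpdump text lines."""
    found = set()
    for line in lines:
        m = ECHO_RE.search(line)
        if m:
            found.add((m["src"], m["dst"], m["kind"], int(m["id"]), int(m["seq"])))
    return found


def lost_in_transit(sender_dump, receiver_dump, sender_ip):
    """Echo packets that sender_ip put on the wire (per its own capture)
    but that never appear in the receiver's capture.

    Timestamps are ignored because the two hosts' clocks will differ.
    """
    sent = {p for p in echo_packets(sender_dump) if p[0] == sender_ip}
    seen = echo_packets(receiver_dump)
    return sorted(sent - seen)
```

Running it in both directions separates the two cases Robert mentions: a non-empty result for this host's own requests points at the transmit path, while replies sent by the other endpoint that never show up locally point at the receive path (or interrupt handling).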

Here are the details when network access is working and when it is not:

When net access is working
--------------------------

$ ifconfig
tx0: flags=108843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NEEDSGIANT> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 00:e1:20:34:bb:36
        inet 192.168.1.20 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect (10baseT/UTP)
        status: active
plip0: flags=108810<POINTOPOINT,SIMPLEX,MULTICAST,NEEDSGIANT> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000


$ netstat -r
Routing tables

Internet:
Destination        Gateway            Flags    Refs     Use  Netif Expire
default            192.168.1.1        UGS         0    1090    tx0
localhost          localhost          UH          0     186    lo0
192.168.1.0        link#1             UC          0       0    tx0
192.168.1.1        00:91:d2:4c:54:f8  UHLW        2       0    tx0    892

Internet6:
Destination        Gateway            Flags      Netif Expire
localhost          localhost          UHL         lo0
fe80::%lo0         fe80::1%lo0        U           lo0
fe80::1%lo0        link#3             UHL         lo0
ff01:3::           fe80::1%lo0        UC          lo0
ff02::%lo0         fe80::1%lo0        UC          lo0


When net access is NOT working
------------------------------

$ ifconfig
tx0: flags=108843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NEEDSGIANT> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 00:e1:20:34:bb:36
        inet 192.168.1.20 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect (10baseT/UTP)
        status: active
plip0: flags=108810<POINTOPOINT,SIMPLEX,MULTICAST,NEEDSGIANT> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000


$ netstat -r
Routing tables

Internet:
Destination        Gateway            Flags    Refs     Use  Netif Expire
default            192.168.1.1        UGS         0    3338    tx0
localhost          localhost          UH          0     204    lo0
192.168.1.0        link#1             UC          0       0    tx0
192.168.1.1        00:91:d2:4c:54:f8  UHLW        2      28    tx0    997
192.168.1.2        link#1             UHLW        1       1    tx0

Internet6:
Destination        Gateway            Flags      Netif Expire
localhost          localhost          UHL         lo0
fe80::%lo0         fe80::1%lo0        U           lo0
fe80::1%lo0        link#3             UHL         lo0
ff01:3::           fe80::1%lo0        UC          lo0
ff02::%lo0         fe80::1%lo0        UC          lo0
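Diffing the two `netstat -r` snapshots by eye is error-prone. The minimal Python sketch below lists Internet destinations that appear only in the failing snapshot and whose gateway is still a `link#N` placeholder with the `L` flag set. On FreeBSD 7 such a row typically indicates an ARP entry that was never resolved to a MAC address. The helper names are hypothetical, and the parser assumes netstat's unwrapped column layout (Destination, Gateway, Flags, Refs, Use, Netif, Expire). Run against the two snapshots as netstat prints them, it reports only 192.168.1.2.

```python
def host_routes(netstat_text):
    """Map destination -> (gateway, flags) for rows in the Internet: section of `netstat -r`."""
    routes = {}
    in_inet = False
    for line in netstat_text.splitlines():
        if line.startswith("Internet:"):
            in_inet = True
            continue
        if line.startswith("Internet6:"):
            break  # IPv6 rows are not relevant to ARP
        fields = line.split()
        # Data rows have at least Destination, Gateway, Flags, Refs, Use, Netif.
        if in_inet and len(fields) >= 6 and fields[0] != "Destination":
            routes[fields[0]] = (fields[1], fields[2])
    return routes


def new_unresolved(before, after):
    """Destinations present only in `after` whose gateway is still link#N with the L flag."""
    old = host_routes(before)
    return sorted(
        dst
        for dst, (gw, flags) in host_routes(after).items()
        if dst not in old and gw.startswith("link#") and "L" in flags
    )
```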


tcpdump -i tx0 -v

NOTE: While pinging 192.168.1.1, tcpdump shows no output at all.


ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
^C
--- 192.168.1.1 ping statistics ---
58 packets transmitted, 0 packets received, 100.0% packet loss


/var/log/messages:
Feb 26 15:26:14 blacktower kernel: tx0: ERROR! Can't stop Rx DMA
Feb 26 15:26:14 blacktower kernel: tx0: promiscuous mode enabled

Note: These two messages keep repeating in /var/log/messages.

/var/log/messages at the time of sending this email:
Feb 26 17:32:17 blacktower kernel: tx0: link state changed to DOWN
Feb 26 17:36:25 blacktower kernel: tx0: link state changed to UP
Feb 26 17:36:30 blacktower kernel: tx0: link state changed to DOWN
Feb 26 17:37:07 blacktower kernel: tx0: link state changed to UP
Feb 26 17:37:14 blacktower kernel: tx0: link state changed to DOWN
Feb 26 17:37:22 blacktower kernel: tx0: link state changed to UP

Note: I have only now noticed this link state UP/DOWN behaviour; it
does not appear in my earlier logs.

After a reboot, network access starts working again.

Please let me know what other information is required.

Kind regards
Unga


Received on Tue Feb 26 2008 - 10:55:14 UTC
