atheros driver under high load: panics and, even more often, freezes

From: Daniel Dvořák <dandee_at_hellteam.net>
Date: Sat, 2 Sep 2006 03:23:27 +0200
Hi all,
 
First of all, I'm sorry for my bad English.
 
I maintain two routers in our community mesh wireless network.
 
Router 1 (R1) has two Atheros adapters, ath0 = Wistron CM9 and ath1 = Wistron
CM10, plus some sisX, fxpX and other interfaces.
Router 2 (R2) has one Atheros adapter, ath1 = Wistron CM10.
 
R1 panics, and it freezes even more often. The panics and the freezes may or
may not have the same cause.
 
Recently I started using "option SW_WATCHDOG" in my custom kernels on both R1
and R2 (only after vmcore.5, so vmcore.6 was produced with this option), in the
hope that it would be a workaround for the freezes at least, if not for the
panics.
 
R1 was installed on 1 April 2006.
 
Statistics:
 
9 panics, with 8 kernel dumps (1 dump missed)
 
10 freezes
 
I think all the panics are somehow connected to the athX taskq process, to the
"page fault while in kernel mode" panic, and to sbflush_locked.
 
I guess the panic happens when the router transmits and receives data at the
maximum throughput for the configured nominal media rate, which is exactly
24 Mbps. I do not use a higher rate because there are problems with quagga
ospfd packets; it is a known issue.
 
Today I did a small throughput test.
 
On R1 I ran this command:
 
# ping -i 0.001 -c 100000 -s 1472 ANY IP
 
As you can see, it is not a real flood ping (-f), only something close to one.
 
bmon showed a throughput of about 1.13-1.2 MB/s. Note that there is no QoS, and
the ICMP limit is set very high: net.inet.icmp.icmplim: 2147483647,
net.inet.icmp.icmplim_output: 0.

 
For the first 5 s the latency was about 1.1-1.7 ms.
After that it climbed to 10-30, 50-70, 110-130 and 270-300 ms, then above
300 ms, with packet loss.
 
.... some seconds ....
 
panic
 
 
Here it is:
 
# kgdb kernel.debug /var/crash/vmcore.6
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so:
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
 
Unread portion of the kernel message buffer:
 

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc05a47bb
stack pointer           = 0x28:0xd447db18
frame pointer           = 0x28:0xd447db3c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 24 (ath1 taskq)
trap number             = 12
panic: page fault
Uptime: 1d7h56m6s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335
319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15
 
#0  doadump () at pcpu.h:165
165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:165
#1  0xc056da25 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:402
#2  0xc056dcbc in panic (fmt=0xc07a4185 "%s") at
/usr/src/sys/kern/kern_shutdown.c:558
#3  0xc07658c8 in trap_fatal (frame=0xd447dad8, eva=12) at
/usr/src/sys/i386/i386/trap.c:836
#4  0xc076562f in trap_pfault (frame=0xd447dad8, usermode=0, eva=12) at
/usr/src/sys/i386/i386/trap.c:744
#5  0xc076528d in trap (frame=
      {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = -979275436, tf_esi = 370,
tf_ebp = -733488324, tf_isp = -733488380, tf_ebx = -979275520, tf_edx = 0,
tf_ecx = -1012252656, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip =
-1067825221, tf_cs = 32, tf_eflags = 590342, tf_esp = 0, tf_ss =
-733488320})
    at /usr/src/sys/i386/i386/trap.c:434
#6  0xc0754a6a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc05a47bb in m_copym (m=0x0, off0=1500, len=1480, wait=1) at
/usr/src/sys/kern/uipc_mbuf.c:400
#8  0xc06204b8 in ip_fragment (ip=0xc3aa4010, m_frag=0xd447dbec,
mtu=-979275520, if_hwassist_flags=0, sw_csum=1) at
/usr/src/sys/netinet/ip_output.c:975
#9  0xc0610c4e in ip_fastforward (m=0xc3a7ba00) at
/usr/src/sys/netinet/ip_fastfwd.c:561
#10 0xc05de953 in ether_demux (ifp=0xc33ac400, m=0xc3a7ba00) at
/usr/src/sys/net/if_ethersubr.c:766
#11 0xc05de715 in ether_input (ifp=0xc33ac400, m=0xc3a7ba00) at
/usr/src/sys/net/if_ethersubr.c:620
#12 0xc05f5604 in ieee80211_deliver_data (ic=0xc33ad230, ni=0xc5602000,
m=0xc3a7ba00) at /usr/src/sys/net80211/ieee80211_input.c:717
#13 0xc05f507d in ieee80211_input (ic=0xc33ad230, m=0xc3a7ba00,
ni=0xc5602000, rssi=30, rstamp=27616) at
/usr/src/sys/net80211/ieee80211_input.c:481
#14 0xc04afd66 in ath_rx_proc (arg=0xc33ad000, npending=1) at
/usr/src/sys/dev/ath/if_ath.c:2977
#15 0xc058de89 in taskqueue_run (queue=0xc32f4780) at
/usr/src/sys/kern/subr_taskqueue.c:217
#16 0xc058e03a in taskqueue_thread_loop (arg=0x0) at
/usr/src/sys/kern/subr_taskqueue.c:276
#17 0xc0558068 in fork_exit (callout=0xc058dff8 <taskqueue_thread_loop>,
arg=0xc33adee0, frame=0xd447dd38) at /usr/src/sys/kern/kern_fork.c:805
#18 0xc0754acc in fork_trampoline () at
/usr/src/sys/i386/i386/exception.s:208
(kgdb) quit
 
 
Note that the ping packets were sent out through ath0, so I would have expected
ath0 taskq here. I was logged in to the box through ath1.
 
I can reproduce the panic with the ping command above, i.e. under high load.
 
 
There are earlier kernel dumps that are also somehow connected to high load,
but kgdb cannot read them:
 
# kgdb kernel.debug /var/crash/vmcore.0
kgdb: kvm_read: invalid address (0x18)
kgdb: kvm_read: invalid address (0x18)
kgdb: kvm_read: invalid address (0x18)
^C
 
# kgdb kernel.debug /var/crash/vmcore.1
kgdb: cannot read PTD
# kgdb kernel.debug /var/crash/vmcore.1
kgdb: cannot read PTD
# kgdb kernel.debug /var/crash/vmcore.1
kgdb: cannot read PTD
# kgdb kernel.debug /var/crash/vmcore.2
kgdb: cannot read PTD
# kgdb kernel.debug /var/crash/vmcore.3
kgdb: cannot read PTD
# kgdb kernel.debug /var/crash/vmcore.4
kgdb: cannot read PTD
# kgdb kernel.debug /var/crash/vmcore.5
kgdb: cannot read PTD
 
I have the info files for these dumps; they are attached.
 
 
My guess is that it involves mbufs, ath.c and the ath taskq, something
memory-related, under HIGH LOAD.
 
ANY HELP WOULD BE VERY MUCH APPRECIATED.
 
Daniel
 
P.S.: I am not currently subscribed to the freebsd-stable mailing list, so
please reply to my e-mail address as well. Replies through the freebsd-current
mailing list are fine.

Received on Fri Sep 01 2006 - 23:23:54 UTC
