Re: FreeBSD 7.0 Beta, RC, RELEASE (amd64) freezes with dummynet enabled

From: matthew <matthew_at_matthew.sk>
Date: Fri, 07 Mar 2008 12:34:27 +0100
matthew wrote:

> Barney Cordoba wrote:
>
>> --- matthew <matthew_at_matthew.sk> wrote:
>>
>>> Kris Kennaway wrote:
>>>
>>>> matthew wrote:
>>>>
>>>>> I have posted before that I have a stability issue with the 7.0
>>>>> branch on my servers. Tested on BETA2, BETA4, RC1, RC2, RELEASE.
>>>>> The original thread and my post with details is at:
>>>>> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-12/msg00674.html
>>>>>
>>>>> I was waiting for 7.0-RELEASE, updated the whole servers, and
>>>>> enabled dummynet again, but it always freezes after some minutes,
>>>>> 100% reproducible.
>>>>>
>>>>> I tested it also on a HP 140 G3 1U server, where 6.3 has absolutely
>>>>> no problems, but the 7.0 branch keeps freezing.
>>>>>
>>>>> Again, if it helps to solve this bug, I can rebuild the kernel with
>>>>> debug symbols and take some screenshots :)
>>>>>
>>>>> _______________________________________________
>>>>> freebsd-current_at_freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>>>>> To unsubscribe, send any mail to
>>>>> "freebsd-current-unsubscribe_at_freebsd.org"
>>>>
>>>> Please follow the steps at
>>>>
>>>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>>>>
>>>> Kris
>>>>       
>>> I have some screenshots from the debug console after the
>>> freeze. I had to replace the keyboard with one that has a working
>>> ESC key to launch ctrl+alt+esc :)
>>>
>>> You can find it on http://dummynetcrash.matthew.sk/
>>>
>>> Sorry for the bad quality of some pictures.
>>>
>>> I have also a dump, after running panic in the debug
>>> console:
>>>
>>> (gdb)
>>> root_at_hanka:/usr/src/sys/amd64/compile/HANKA-debug#
>>> kgdb kernel.debug /var/crash/vmcore.1
>>> [GDB will not be able to debug user-mode threads: 
>>> /usr/lib/libthread_db.so: Undefined symbol
>>> "ps_pglobal_lookup"]
>>> GNU gdb 6.1.1 [FreeBSD]
>>> Copyright 2004 Free Software Foundation, Inc.
>>> GDB is free software, covered by the GNU General
>>> Public License, and you are
>>> welcome to change it and/or distribute copies of it
>>> under certain conditions.
>>> Type "show copying" to see the conditions.
>>> There is absolutely no warranty for GDB.  Type "show
>>> warranty" for details.
>>> This GDB was configured as "amd64-marcel-freebsd".
>>>
>>> Unread portion of the kernel message buffer:
>>> KDB: enter: manual escape to debugger
>>> panic: from debugger
>>> cpuid = 0
>>> Uptime: 19h35m58s
>>> Physical memory: 993 MB
>>> Dumping 392 MB: 377 361 345 329 313 297 281 265 249
>>> 233 217 201 185 169 153 137 121 105 89 73 57 41 25 9
>>>
>>> #0  doadump () at pcpu.h:194
>>> 194             __asm __volatile("movq %%gs:0,%0" :
>>> "=r" (td));
>>> (kgdb) backtrace
>>> #0  doadump () at pcpu.h:194
>>> #1  0xffffffff80480f05 in boot (howto=260) at 
>>> ../../../kern/kern_shutdown.c:409
>>> #2  0xffffffff804813a7 in panic (fmt=Variable "fmt"
>>> is not available.
>>> ) at ../../../kern/kern_shutdown.c:563
>>> #3  0xffffffff801bad37 in db_panic (addr=Variable
>>> "addr" is not available.
>>> ) at ../../../ddb/db_command.c:433
>>> #4  0xffffffff801bb61c in db_command_loop () at 
>>> ../../../ddb/db_command.c:401
>>> #5  0xffffffff801bd07f in db_trap (type=Variable
>>> "type" is not available.
>>> ) at ../../../ddb/db_main.c:222
>>> #6  0xffffffff804a89c5 in kdb_trap (type=3, code=0, 
>>> tf=0xffffffff9ff2a9e0) at
>>> ../../../kern/subr_kdb.c:502
>>> #7  0xffffffff8073c4c5 in trap
>>> (frame=0xffffffff9ff2a9e0) at ../../../amd64/amd64/trap.c:499
>>> #8  0xffffffff80721dfe in calltrap () at 
>>> ../../../amd64/amd64/exception.S:169
>>> #9  0xffffffff804a8b91 in kdb_enter
>>> (msg=0xffffffff80e20fe0 "") at cpufunc.h:63
>>> #10 0xffffffff803ae691 in scgetc
>>> (sc=0xffffffff80b3c5a0, flags=Variable "flags" is not available.
>>> ) at ../../../dev/syscons/syscons.c:3378
>>> #11 0xffffffff803b17e4 in sckbdevent
>>> (thiskbd=0xffffff0001154a00, event=Variable "event" is not available.
>>> ) at ../../../dev/syscons/syscons.c:627
>>> #12 0xffffffff8031be23 in kbdmux_intr
>>> (kbd=0xffffff0001154a00, arg=Variable "arg" is not available.
>>> ) at ../../../dev/kbdmux/kbdmux.c:549
>>> #13 0xffffffff8031c3a0 in kbdmux_kbd_intr
>>> (xkbd=Variable "xkbd" is not available.
>>> ) at ../../../dev/kbdmux/kbdmux.c:200
>>> #14 0xffffffff804b2844 in taskqueue_run
>>> (queue=0xffffff000117a300) at ../../../kern/subr_taskqueue.c:255
>>> #15 0xffffffff80465c9a in ithread_loop
>>> (arg=0xffffff0001104180) at ../../../kern/kern_intr.c:1036
>>> #16 0xffffffff8046348a in fork_exit
>>> (callout=0xffffffff80465bc0 <ithread_loop>, arg=0xffffff0001104180,
>>> frame=0xffffffff9ff2ac80)
>>>     at ../../../kern/kern_fork.c:781
>>> #17 0xffffffff807221ce in fork_trampoline () at 
>>> ../../../amd64/amd64/exception.S:415
>>> #18 0x0000000000000000 in ?? ()
>>> #19 0x0000000000000000 in ?? ()
>>> #20 0x0000000000000001 in ?? ()
>>> #21 0x0000000000000000 in ?? ()
>>> #22 0x0000000000000000 in ?? ()
>>> #23 0x0000000000000000 in ?? ()
>>> #24 0x0000000000000000 in ?? ()
>>> #25 0x0000000000000000 in ?? ()
>>> #26 0x0000000000000000 in ?? ()
>>> #27 0x0000000000000000 in ?? ()
>>> #28 0x0000000000000000 in ?? ()
>>> #29 0x0000000000000000 in ?? ()
>>> #30 0x0000000000000000 in ?? ()
>>> #31 0x0000000000000000 in ?? ()
>>> #32 0x0000000000000000 in ?? ()
>>> #33 0x0000000000000000 in ?? ()
>>> #34 0x0000000000000000 in ?? ()
>>> #35 0x0000000000000000 in ?? ()
>>> #36 0x0000000000000000 in ?? ()
>>> #37 0x0000000000000000 in ?? ()
>>> #38 0x0000000000000000 in ?? ()
>>> #39 0x0000000000000000 in ?? ()
>>> #40 0x0000000000000000 in ?? ()
>>> #41 0x0000000000000000 in ?? ()
>>> #42 0x0000000000d7f000 in ?? ()
>>> #43 0xffffff00011c18d0 in ?? ()
>>> #44 0xffffffff80a846e0 in facility_initialized ()
>>> #45 0xffffff00011c18d0 in ?? ()
>>> #46 0xffffff0001084340 in ?? ()
>>> #47 0xffffffff9ff2ab70 in ?? ()
>>> #48 0xffffffff9ff2ab38 in ?? ()
>>> #49 0xffffff00011bc000 in ?? ()
>>> #50 0xffffffff8049e826 in sched_switch
>>> (td=0xffffff0001104180, newtd=0xffffffff80465bc0, flags=Variable 
>>> "flags" is
>>> not available.
>>> ) at ../../../kern/sched_4bsd.c:905
>>> Previous frame inner to this frame (corrupt stack?)
>>>
>>> On http://dummynetcrash.matthew.sk you can also find
>>> the kernel.debug and the crash files, for more debugging.
>>>     
>>
>> Have you tried your setup without polling?  It really
>> doesn't make any sense to poll when using controllers
>> that have interrupt hold offs that can be precisely
>> programmed like the em controllers. But it will
>> certainly give insight into your problem if one works and
>> the other doesn't.
>>
>> I'd also try it on a 32bit compile. Otherwise you have
>> too many variables.
>>
>> Barney
>>
>>
>>       
>>   
> I disabled polling. To my surprise, the server didn't crash after
> some minutes but after 1 hour. That may be only a coincidence, or
> maybe not. The result is the same: it crashed. It also crashed on the
> HP 140 G3 server with a bge NIC, without polling enabled, on the RC2
> release.

I am reviving this thread. Please help me resolve this bug, or give me
some clue where to look. I think this is a critical bug that prevents
using 7.X on routers that do traffic shaping.

I have disabled polling, but the box still freezes after some minutes
under heavy traffic (~150-200 Mbit/s).
I have also tested the box for network performance with iperf, at a load
of ~150 Mbit/s with 10 parallel streams, but the box didn't freeze.

The box is a fileserver, so disk access may be a clue.

I will also try it with ULE instead of 4BSD, but I think that in the past,
with BETA2, I already had the ULE scheduler.

The dump, taken by typing panic on the debug console after the freeze,
always points to:

root_at_hanka:/usr/src/sys/amd64/compile/HANKA-debug# kgdb kernel.debug 
/var/crash/vmcore.2
[GDB will not be able to debug user-mode threads: 
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
KDB: enter: manual escape to debugger
panic: from debugger
cpuid = 0
Uptime: 17m55s
Physical memory: 993 MB
Dumping 169 MB: 154 138 122 106 90 74 58 42 26 10

#0  doadump () at pcpu.h:194
194             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:194
#1  0xffffffff80480f05 in boot (howto=260) at 
../../../kern/kern_shutdown.c:409
#2  0xffffffff804813a7 in panic (fmt=Variable "fmt" is not available.
) at ../../../kern/kern_shutdown.c:563
#3  0xffffffff801bad37 in db_panic (addr=Variable "addr" is not available.
) at ../../../ddb/db_command.c:433
#4  0xffffffff801bb61c in db_command_loop () at 
../../../ddb/db_command.c:401
#5  0xffffffff801bd07f in db_trap (type=Variable "type" is not available.
) at ../../../ddb/db_main.c:222
#6  0xffffffff804a89c5 in kdb_trap (type=3, code=0, 
tf=0xffffffff9ff219e0) at ../../../kern/subr_kdb.c:502
#7  0xffffffff8073c535 in trap (frame=0xffffffff9ff219e0) at 
../../../amd64/amd64/trap.c:499
#8  0xffffffff80721e6e in calltrap () at 
../../../amd64/amd64/exception.S:169
#9  0xffffffff804a8b91 in kdb_enter (msg=0xffffffff80e20fe0 "") at 
cpufunc.h:63
#10 0xffffffff803ae691 in scgetc (sc=0xffffffff80b3c600, flags=Variable 
"flags" is not available.
) at ../../../dev/syscons/syscons.c:3378
#11 0xffffffff803b17e4 in sckbdevent (thiskbd=0xffffff0001154a00, 
event=Variable "event" is not available.
) at ../../../dev/syscons/syscons.c:627
#12 0xffffffff8031be23 in kbdmux_intr (kbd=0xffffff0001154a00, 
arg=Variable "arg" is not available.
) at ../../../dev/kbdmux/kbdmux.c:549
#13 0xffffffff8031c3a0 in kbdmux_kbd_intr (xkbd=Variable "xkbd" is not 
available.
) at ../../../dev/kbdmux/kbdmux.c:200
#14 0xffffffff804b2844 in taskqueue_run (queue=0xffffff000117a300) at 
../../../kern/subr_taskqueue.c:255
#15 0xffffffff80465c9a in ithread_loop (arg=0xffffff0001104180) at 
../../../kern/kern_intr.c:1036
#16 0xffffffff8046348a in fork_exit (callout=0xffffffff80465bc0 
<ithread_loop>, arg=0xffffff0001104180, frame=0xffffffff9ff21c80)
    at ../../../kern/kern_fork.c:781
#17 0xffffffff8072223e in fork_trampoline () at 
../../../amd64/amd64/exception.S:415
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000d7f000 in ?? ()
#43 0xffffff00011c18d0 in ?? ()
#44 0xffffffff80a84740 in facility_initialized ()
#45 0xffffff00011c18d0 in ?? ()
#46 0xffffff0001084340 in ?? ()
#47 0xffffffff9ff21b70 in ?? ()
#48 0xffffffff9ff21b38 in ?? ()
#49 0xffffff00011bc000 in ?? ()
#50 0xffffffff8049e826 in sched_switch (td=0xffffff0001104180, 
newtd=0xffffffff80465bc0, flags=Variable "flags" is not available.
) at ../../../kern/sched_4bsd.c:905
Previous frame inner to this frame (corrupt stack?)
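
Most of the backtrace above consists of unresolved "?? ()" frames. As a
reading aid only, here is a minimal Python sketch that keeps just the frames
with a resolved function name. It is not part of the original report: the
name condense_backtrace and the shortened SAMPLE text are illustrative
assumptions, and it assumes the function name and its opening parenthesis
sit on the same line as the frame number.

```python
import re

# Matches the first line of a kgdb frame, for example
#   "#7  0xffffffff8073c535 in trap (frame=...) at"
#   "#0  doadump () at pcpu.h:194"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+\s+in\s+)?(\S+)\s*\(")


def condense_backtrace(text):
    """Return (frame_number, function) pairs, skipping unresolved '??' frames."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match and match.group(2) != "??":
            frames.append((int(match.group(1)), match.group(2)))
    return frames


# Shortened excerpt of the backtrace, used only to exercise the function.
SAMPLE = """\
#0  doadump () at pcpu.h:194
#9  0xffffffff804a8b91 in kdb_enter (msg=0xffffffff80e20fe0 "") at
cpufunc.h:63
#18 0x0000000000000000 in ?? ()
#44 0xffffffff80a84740 in facility_initialized ()
#50 0xffffffff8049e826 in sched_switch (td=0xffffff0001104180,
"""

if __name__ == "__main__":
    for number, function in condense_backtrace(SAMPLE):
        print(f"#{number:<3} {function}")
```

Run over the full trace, this leaves frames #0 to #17 (the keyboard
interrupt path into the debugger) plus facility_initialized and
sched_switch.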
Received on Fri Mar 07 2008 - 10:34:29 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:28 UTC