Re: [follow-up] FreeBSD/amd64 r195146 to r195848, fatal trap 12 under network load

From: Kamigishi Rei <spambox_at_haruhiism.net>
Date: Fri, 31 Jul 2009 05:40:49 +0400
Kamigishi Rei wrote:
> Revisions mentioned are those which were tested by me; r195849+ has 
> the corruption padded somewhere else so it might produce a panic with 
> a different set of options. For reference, my test kernel uses a 
> GENERIC config from May 09 snapshot without WITNESS and with 
> IPFIREWALL, IPFIREWALL_DEFAULT_TO_ACCEPT and DEVICE_POLLING enabled.
r195981 (latest checkout) traps with the *GENERIC* kernel (with WITNESS 
enabled). Same backtrace, same cause, and again UP systems are not 
affected. Apparently, the diagnostics patch from my previous message pads 
the corruption somewhere else, so I can't use it to check lo_witness or 
other fields of nws_mtx at the moment mtx_lock gets corrupted.

The trap can be triggered with "ping -f -s 65507 localhost", with iperf 
(just "iperf -c localhost" works for me), or by generating any high-speed 
network throughput (even a MySQL query over localhost will do, since this 
is a race). Running ping mostly triggers the trap inside swi_net(); 
iperf triggers it inside netisr_queue_internal().
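
Where neither flood ping nor iperf is available, a loopback load can be 
sketched in Python. This is only a minimal sketch: it has not been 
confirmed to trigger the trap, and the function name loopback_flood, the 
default packet count and the datagram size are all assumptions chosen for 
illustration.

```python
import socket
import threading


def loopback_flood(packets=2000, size=8192):
    """Send UDP datagrams to a receiver on 127.0.0.1; return (sent, received)."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
    rx.settimeout(0.5)             # receiver gives up after 0.5 s of silence
    target = rx.getsockname()
    counts = {"received": 0}

    def drain():
        # Pull datagrams off the socket until the timeout fires.
        while True:
            try:
                rx.recv(65535)
            except socket.timeout:
                return
            counts["received"] += 1

    receiver = threading.Thread(target=drain)
    receiver.start()

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\0" * size
    sent = 0
    for _ in range(packets):
        try:
            tx.sendto(payload, target)
            sent += 1
        except OSError:
            pass                   # e.g. ENOBUFS when the send queue is full
    receiver.join()
    tx.close()
    rx.close()
    return sent, counts["received"]


if __name__ == "__main__":
    print(loopback_flood())
```

UDP may drop datagrams under load, so the received count can be lower than 
the sent count; that is expected and does not matter for generating load.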

I would be grateful if someone could tell me how to debug this further. 
Currently, I suspect a problem in the handling of modspace (an incorrect 
dereference somewhere, or something similar).

Crash info:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x4c89d38
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8056ffca
stack pointer           = 0x28:0xffffff800003eae0
frame pointer           = 0x28:0xffffff800003eb10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi1: netisr 0)
Physical memory: 998 MB
Dumping 1137 MB: 1122 1106 1090 1074 1058 1042 1026 1010 994 978 962 946 
930 914 898 882 866 850 834 818 802 786 770 754 738 722 706 690 674 658 
642 626 610 594 578 562 546 530 514 498 482 466 450 434 418 402 386 370 
354 338 322 306 290 274 258 242 226 210 194 178 162 146 130 114 98 82 66 
50 34 18 2

Reading symbols from /boot/kernel/ahci.ko...Reading symbols from 
/boot/kernel/ahci.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ahci.ko
#0  doadump () at pcpu.h:223
223     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump () at pcpu.h:223
#1  0xffffffff801d8a9c in db_fncall (dummy1=Variable "dummy1" is not 
available.
)
    at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801d8dd1 in db_command (last_cmdp=0xffffffff80be2720, 
cmd_table=Variable "cmd_table" is not available.

) at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801d9020 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:498
#4  0xffffffff801daff9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff805adf65 in kdb_trap (type=12, code=0, tf=0xffffff800003ea30)
    at /usr/src/sys/kern/subr_kdb.c:534
#6  0xffffffff8085e7bd in trap_fatal (frame=0xffffff800003ea30, 
eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:847
#7  0xffffffff8085eb2d in trap_pfault (frame=0xffffff800003ea30, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:768
#8  0xffffffff8085f523 in trap (frame=0xffffff800003ea30)
    at /usr/src/sys/amd64/amd64/trap.c:494
#9  0xffffffff80844fe3 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:224
#10 0xffffffff8056ffca in _mtx_lock_sleep (m=0xffffffff81006824,
    tid=18446742974233875344, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:369
#11 0xffffffff805701b1 in _mtx_lock_flags (m=0xffffffff81006824, opts=0,
    file=0xffffffff8096c255 "/usr/src/sys/net/netisr.c", line=723)
    at /usr/src/sys/kern/kern_mutex.c:203
#12 0xffffffff8063411c in swi_net (arg=Variable "arg" is not available.
) at /usr/src/sys/net/netisr.c:723

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x45b4288
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8056ffca
stack pointer           = 0x28:0xffffff800003eae0
frame pointer           = 0x28:0xffffff800003eb10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi1: netisr 0)
Physical memory: 998 MB
Dumping 1233 MB: 1218 1202 1186 1170 1154 1138 1122 1106 1090 1074 1058 
1042 1026 1010 994 978 962 946 930 914 898 882 866 850 834 818 802 786 
770 754 738 722 706 690 674 658 642 626 610 594 578 562 546 530 514 498 
482 466 450 434 418 402 386 370 354 338 322 306 290 274 258 242 226 210 
194 178 162 146 130 114 98 82 66 50 34 18 2

Reading symbols from /boot/kernel/ahci.ko...Reading symbols from 
/boot/kernel/ahci.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ahci.ko
#0  doadump () at pcpu.h:223
223     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump () at pcpu.h:223
#1  0xffffffff801d8a9c in db_fncall (dummy1=Variable "dummy1" is not 
available.
)
    at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801d8dd1 in db_command (last_cmdp=0xffffffff80be2720, 
cmd_table=Variable "cmd_table" is not available.

) at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801d9020 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:498
#4  0xffffffff801daff9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff805adf65 in kdb_trap (type=12, code=0, tf=0xffffff800003ea30)
    at /usr/src/sys/kern/subr_kdb.c:534
#6  0xffffffff8085e7bd in trap_fatal (frame=0xffffff800003ea30, 
eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:847
#7  0xffffffff8085eb2d in trap_pfault (frame=0xffffff800003ea30, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:768
#8  0xffffffff8085f523 in trap (frame=0xffffff800003ea30)
    at /usr/src/sys/amd64/amd64/trap.c:494
#9  0xffffffff80844fe3 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:224
#10 0xffffffff8056ffca in _mtx_lock_sleep (m=0xffffffff81006824,
    tid=18446742974233875344, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:369
#11 0xffffffff805701b1 in _mtx_lock_flags (m=0xffffffff81006824, opts=0,
    file=0xffffffff8096c255 "/usr/src/sys/net/netisr.c", line=753)
    at /usr/src/sys/kern/kern_mutex.c:203
#12 0xffffffff80633fc2 in swi_net (arg=Variable "arg" is not available.
) at /usr/src/sys/net/netisr.c:753

The two traces above are from ping -f.
The following one is from iperf:

(kgdb) #0  doadump () at pcpu.h:223
#1  0xffffffff801d8a9c in db_fncall (dummy1=Variable "dummy1" is not 
available.
)
    at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801d8dd1 in db_command (last_cmdp=0xffffffff80be2720, 
cmd_table=Variable "cmd_table" is not available.

) at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801d9020 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:498
#4  0xffffffff801daff9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff805adf65 in kdb_trap (type=12, code=0, tf=0xffffff80238764d0)
    at /usr/src/sys/kern/subr_kdb.c:534
#6  0xffffffff8085e7bd in trap_fatal (frame=0xffffff80238764d0, 
eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:847
#7  0xffffffff8085f48c in trap (frame=0xffffff80238764d0)
    at /usr/src/sys/amd64/amd64/trap.c:345
#8  0xffffffff80844fe3 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:224
#9  0xffffffff8056ffca in _mtx_lock_sleep (m=0xffffffff81006824,
    tid=18446742974277752608, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:369
#10 0xffffffff805701b1 in _mtx_lock_flags (m=0xffffffff81006824, opts=0,
    file=0xffffffff8096c255 "/usr/src/sys/net/netisr.c", line=830)
    at /usr/src/sys/kern/kern_mutex.c:203
#11 0xffffffff806344a5 in netisr_queue_internal (proto=1,
    m=0xffffff0004fa6400, cpuid=Variable "cpuid" is not available.
) at /usr/src/sys/net/netisr.c:830
#12 0xffffffff80634589 in netisr_queue_src (proto=1, source=Variable 
"source" is not available.
)
    at /usr/src/sys/net/netisr.c:860
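
kgdb prints tid in decimal, which makes the traces harder to compare. The 
short Python sketch below only reformats numbers copied from the traces 
above: it prints each tid in hex, the mutex address modulo 8, and whether 
the fault addresses fit in 32 bits. The names MUTEX_ADDR, TIDS, FAULT_ADDRS 
and summarize are made up for illustration. That tid is the curthread 
pointer cast to an integer is my reading of kern_mutex.c, and whether any 
of these observations relates to the corruption is an assumption, not a 
finding.

```python
# Values copied verbatim from the three backtraces above.
MUTEX_ADDR = 0xffffffff81006824            # m= in _mtx_lock_sleep (all traces)
TIDS = {
    "ping -f": 18446742974233875344,       # tid= in both swi_net traces
    "iperf": 18446742974277752608,         # tid= in the netisr_queue_internal trace
}
FAULT_ADDRS = [0x4c89d38, 0x45b4288]       # "fault virtual address" lines


def summarize():
    """Return hex forms of the tids plus simple facts about the addresses."""
    return {
        "tids_hex": {name: hex(tid) for name, tid in TIDS.items()},
        "mutex_mod_8": MUTEX_ADDR % 8,
        "faults_fit_in_32_bits": all(a < 2**32 for a in FAULT_ADDRS),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

The tids come out as 0xffffff0002249390 and 0xffffff0004c21720, the mutex 
address is 4 modulo 8 (4-byte but not 8-byte aligned), and both fault 
addresses fit in 32 bits.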

-- 
Kamigishi Rei
KREI-RIPE
Received on Thu Jul 30 2009 - 23:40:42 UTC
