Re: LOR on RELENG_6, beta 5

From: Mike Tancsa <mike_at_sentex.net>
Date: Wed, 21 Sep 2005 13:45:17 -0400
At 01:15 PM 21/09/2005, Mike Tancsa wrote:

>lock order reversal
>  1st 0xc295a090 inp (divinp) _at_ /usr/src/sys/netinet/ip_divert.c:327
>  2nd 0xc28d9150 ipsec request (ipsec request) _at_ 
> /usr/src/sys/netipsec/ipsec_output.c:354
>KDB: stack backtrace:
>kdb_backtrace(0,ffffffff,c0763e30,c0763e58,c072b204) at kdb_backtrace+0x29
>witness_checkorder(c28d9150,9,c06fc8ea,162) at witness_checkorder+0x564
>_mtx_lock_flags(c28d9150,0,c06fc8ea,162,0) at _mtx_lock_flags+0x5b
>ipsec4_process_packet(c28f9800,c28d9100,22,0,c26c8e00) at 
>ipsec4_process_packet+0x45
>ip_output(c28f9800,0,e740fb28,22,0) at ip_output+0x74f
>div_output(c294b858,c28f9800,c23bd440,0,e740fc08) at div_output+0x185
>div_send(c294b858,0,c28f9800,c23bd440,0) at div_send+0x3f
>sosend(c294b858,c23bd440,e740fc3c,c28f9800,0) at sosend+0x5e3
>kern_sendit(c2711d80,3,e740fcbc,0,0) at kern_sendit+0x104
>sendit(c2711d80,3,e740fcbc,0,bfbdec04) at sendit+0x163
>sendto(c2711d80,e740fd04,6,0,296) at sendto+0x4d
>syscall(3b,3b,3b,2,64) at syscall+0x22f
>Xint0x80_syscall() at Xint0x80_syscall+0x1f
>--- syscall (133, FreeBSD ELF32, sendto), eip = 0x280c5d97, esp = 
>0xbfbdeb0c, ebp = 0xbfbeebb8 ---
>
>I am able to replicate this on the box without too much 
>effort.  Unfortunately, it doesnt want to dump on my twe device for 
>some reason despite doing a
>dumpon -v /dev/twed0s1b

OK, I figured out why no dumps. Its not actually core dumping, its 
getting into some sort of deadlock. The reboot was courtesy of the 
watchdog kicking in.

I recompiled the kernel so I can break to the serial console and 
disabled the watchdog so the box does not reboot.  On the serial 
console, I see the LOR

lock order reversal
  1st 0xc292a090 inp (divinp) _at_ /usr/src/sys/netinet/ip_divert.c:327
  2nd 0xc28bc950 ipsec request (ipsec request) _at_ 
/usr/src/sys/netipsec/ipsec_output.c:354
KDB: stack backtrace:
kdb_backtrace(0,ffffffff,c075ed70,c075ed98,c07261c4) at kdb_backtrace+0x29
witness_checkorder(c28bc950,9,c06f7aa6,162) at witness_checkorder+0x564
_mtx_lock_flags(c28bc950,0,c06f7aa6,162,0) at _mtx_lock_flags+0x5b
ipsec4_process_packet(c2a4a400,c28bc900,22,0,c28b7600) at 
ipsec4_process_packet+0x45
ip_output(c2a4a400,0,e741eb28,22,0) at ip_output+0x74f
div_output(c2926b20,c2a4a400,c255a280,0,e741ec08) at div_output+0x185
div_send(c2926b20,0,c2a4a400,c255a280,0) at div_send+0x3f
sosend(c2926b20,c255a280,e741ec3c,c2a4a400,0) at sosend+0x5e3
kern_sendit(c2731600,3,e741ecbc,0,0) at kern_sendit+0x104
sendit(c2731600,3,e741ecbc,0,bfbdec1c) at sendit+0x163
sendto(c2731600,e741ed04,6,2,296) at sendto+0x4d
syscall(3b,3b,3b,2,7c) at syscall+0x22f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (133, FreeBSD ELF32, sendto), eip = 0x280c5d97, esp = 
0xbfbdeb0c, ebp = 0xbfbeebb8 ---


And here the box is frozen up.


telnet> send break
KDB: enter: Line break on console
[thread pid 12 tid 100004 ]
Stopped at      kdb_enter+0x2b: nop
db>


db> show pcpu
cpuid        = 0
curthread    = 0xc22a9900: pid 12 "idle: cpu0"
curpcb       = 0xe3481d90
fpcurthread  = none
idlethread   = 0xc22a9900: pid 12 "idle: cpu0"
APIC ID      = 0
currentldt   = 0x50
spin locks held:
db> show alllocks
Process 682 (ssh) thread 0xc2aa6000 (100137)
exclusive sleep mutex crypto (crypto op queues) r = 0 (0xc07aa420) 
locked _at_ /usr/src/sys/opencrypto/crypto.c:669
exclusive sleep mutex ipsec request r = 1 (0xc28bc950) locked _at_ 
/usr/src/sys/netipsec/xform_esp.c:872
exclusive sleep mutex inp (tcpinp) r = 0 (0xc28c6a68) locked _at_ 
/usr/src/sys/netinet/tcp_usrreq.c:651
Process 586 (natd) thread 0xc2731600 (100080)
exclusive sleep mutex inp (divinp) r = 0 (0xc292a090) locked _at_ 
/usr/src/sys/netinet/ip_divert.c:327
exclusive sleep mutex div r = 0 (0xc07a0b6c) locked _at_ 
/usr/src/sys/netinet/ip_divert.c:325
Process 37 (swi4: clock sio) thread 0xc2317d80 (100036)
exclusive sleep mutex tcp r = 0 (0xc07a1ecc) locked _at_ 
/usr/src/sys/netinet/tcp_timer.c:457
db> ps
   pid   proc     uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
   682 c2aa4830    0   681   681 0004002 [LOCK    div c29f57c0] ssh
   681 c2aa4a3c    0   679   681 0004002 [SLPQ piperd 0xc272f000][SLP] scp
   679 c2894830    0   678   679 0004002 [SLPQ pause 0xc2894864][SLP] csh
   678 c2894624 1001   675   678 0004102 [SLPQ wait 0xc2894624][SLP] su
   675 c26efa3c 1001   674   675 0004002 [SLPQ pause 0xc26efa70][SLP] csh
   674 c2894a3c 1001   672   672 0000100 [SLPQ select 0xc079d3c4][SLP] sshd
   672 c2734830    0   523   672 0004100 [SLPQ sbwait 0xc2a2920c][SLP] sshd
   663 c29f320c    0   657   663 0004102 [SLPQ select 0xc079d3c4][SLP] openvpn
   657 c2894c48    0   656   657 0004002 [SLPQ pause 0xc2894c7c][SLP] csh
   656 c29f3000 1001   653   656 0004102 [SLPQ wait 0xc29f3000][SLP] su
   653 c29f3624 1001   652   653 0004002 [SLPQ pause 0xc29f3658][SLP] csh
   652 c26ec624 1001   650   650 0000100 [SLPQ select 0xc079d3c4][SLP] sshd
   650 c29f3a3c    0   523   650 0004100 [SLPQ sbwait 0xc2a2a20c][SLP] sshd
   649 c29f3c48    0     1   649 0004002 [SLPQ ttyin 0xc254d410][SLP] getty
   648 c29f7000    0     1   648 0004002 [SLPQ ttyin 0xc254f010][SLP] getty
   647 c29f720c    0     1   647 0004002 [SLPQ ttyin 0xc254c410][SLP] getty
   646 c29f7418    0     1   646 0004002 [SLPQ ttyin 0xc254fc10][SLP] getty
   645 c29f7624    0     1   645 0004002 [SLPQ ttyin 0xc2555010][SLP] getty
   644 c29f7830    0     1   644 0004002 [SLPQ ttyin 0xc254f810][SLP] getty
   643 c2734000    0     1   643 0004002 [SLPQ ttyin 0xc2555810][SLP] getty
   642 c2733000    0     1   642 0004002 [SLPQ ttyin 0xc2557010][SLP] getty
   641 c26efc48    0     1   641 0004002 [SLPQ ttyin 0xc2556010][SLP] getty
   595 c2730a3c    0     1    64 0000002 [SLPQ select 0xc079d3c4][SLP] 3dm2
   586 c2730624    0     1   586 0000000 [LOCK ipsec request c26bacc0] natd
   570 c2734624    0     1   570 0000101 [SLPQ select 0xc079d3c4][SLP] bgpd
   568 c289020c    0     1   568 0000101 [SLPQ select 0xc079d3c4][SLP] zebra
   545 c2734c48    0     1   545 0000000 [SLPQ nanslp 0xc07500ac][SLP] cron
   532 c2733830   25     1   532 0000100 [SLPQ pause 0xc2733864][SLP] sendmail
   528 c2730418    0     1   528 0000100 [SLPQ select 0xc079d3c4][SLP] sendmail
   523 c2733c48    0     1   523 0000100 [SLPQ select 0xc079d3c4][SLP] sshd
   521 c2890c48    0   505   505 0000000 [SLPQ pause 0xc2890c7c][SLP] ntpd
   505 c26eca3c    0     1   505 0000000 [SLPQ select 0xc079d3c4][SLP] ntpd
   407 c2890624   53     1   407 0000100 [SLPQ select 0xc079d3c4][SLP] named
   346 c2890000    0     1   346 0000000 [SLPQ select 0xc079d3c4][SLP] syslogd
   313 c26ec000    0     1   313 0000000 [SLPQ select 0xc079d3c4][SLP] devd
   191 c26ef000    0     1   191 0000000 [SLPQ pause 0xc26ef034][SLP] adjkerntz
    63 c238d20c    0     0     0 0000204 [SLPQ - 0xe4f4fd04][SLP] schedcpu
    62 c238d418    0     0     0 0000204 [SLPQ - 0xc07a502c][SLP] nfsiod 3
    61 c238d624    0     0     0 0000204 [SLPQ - 0xc07a5028][SLP] nfsiod 2
    60 c238d830    0     0     0 0000204 [SLPQ - 0xc07a5024][SLP] nfsiod 1
    59 c238da3c    0     0     0 0000204 [SLPQ - 0xc07a5020][SLP] nfsiod 0
    58 c238dc48    0     0     0 0000204 [SLPQ vlruwt 0xc238dc48][SLP] vnlru
    57 c23f4000    0     0     0 0000204 [SLPQ syncer 0xc074fe1c][SLP] syncer
    56 c23f420c    0     0     0 0000204 [SLPQ psleep 
0xc079d90c][SLP] bufdaemon
    55 c23f4418    0     0     0 000020c [SLPQ pgzero 0xc07ab684][SLP] pagezero
    54 c23f4624    0     0     0 0000204 [SLPQ psleep 0xc07ab1d4][SLP] vmdaemon
    53 c23f4830    0     0     0 0000204 [SLPQ psleep 
0xc07ab190][SLP] pagedaemon
    52 c23f4a3c    0     0     0 0000204 [IWAIT] swi0: sio
    51 c23f4c48    0     0     0 0000204 [SLPQ usbevt 0xc238b210][SLP] usb4
    50 c2316624    0     0     0 0000204 [SLPQ usbevt 0xc23d4210][SLP] usb3
    49 c2316830    0     0     0 0000204 [SLPQ usbevt 0xc23c0210][SLP] usb2
    48 c2316a3c    0     0     0 0000204 [SLPQ usbevt 0xc2394210][SLP] usb1
    47 c2316c48    0     0     0 0000204 [SLPQ usbtsk 0xc074ad64][SLP] usbtask
    46 c238c000    0     0     0 0000204 [SLPQ usbevt 0xc2398210][SLP] usb0
    45 c238c20c    0     0     0 0000204 [IWAIT] swi6: task queue
    44 c238c418    0     0     0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task2
    43 c238c624    0     0     0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task1
     9 c238c830    0     0     0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task0
     8 c238ca3c    0     0     0 0000204 [SLPQ - 0xc233aa00][SLP] kqueue taskq
    42 c238cc48    0     0     0 0000204 [IWAIT] swi2: cambio
    41 c238d000    0     0     0 0000204 [IWAIT] swi5:+
     7 c2309c48    0     0     0 0000204 [SLPQ - 0xc233ac80][SLP] thread taskq
    40 c2315000    0     0     0 0000204 [IWAIT] swi6:+
    39 c231520c    0     0     0 0000204 [SLPQ - 0xc074a5a0][SLP] yarrow
     6 c2315418    0     0     0 0000204 [SLPQ - 0xc074d5a8][SLP] g_down
     5 c2315624    0     0     0 0000204 [SLPQ - 0xc074d5a4][SLP] g_up
     4 c2315830    0     0     0 0000204 [SLPQ - 0xc074d59c][SLP] g_event
     3 c2315a3c    0     0     0 0000204 [SLPQ crypto_ret_wait 
0xc07aa444][SLP] crypto returns
     2 c2315c48    0     0     0 0000204 [SLPQ crypto_wait 
0xc07aa404][SLP] crypto
    38 c2316000    0     0     0 0000204 [IWAIT] swi3: vm
    37 c231620c    0     0     0 000020c [LOCK    inp c22f8dc0] swi4: clock sio
    36 c2316418    0     0     0 0000204 [LOCK    div c29f57c0] swi1: net
    35 c2300624    0     0     0 0000204 [IWAIT] irq23: uhci0 ehci0
    34 c2300830    0     0     0 0000204 [IWAIT] irq22: em0
    33 c2300a3c    0     0     0 0000204 [IWAIT] irq21: twe0
    32 c2300c48    0     0     0 0000204 [IWAIT] irq20: fxp0
    31 c2309000    0     0     0 0000204 [IWAIT] irq19: uhci1++
    30 c230920c    0     0     0 0000204 [IWAIT] irq18: uhci2
    29 c2309418    0     0     0 0000204 [IWAIT] irq17: fwohci0
    28 c2309624    0     0     0 0000204 [IWAIT] irq16: uhci3
    27 c2309830    0     0     0 0000204 [IWAIT] irq15: ata1
    26 c2309a3c    0     0     0 0000204 [IWAIT] irq14: ata0
    25 c22ad20c    0     0     0 0000204 [IWAIT] irq13:
    24 c22ad418    0     0     0 0000204 [IWAIT] irq12:
    23 c22ad624    0     0     0 0000204 [IWAIT] irq11:
    22 c22ad830    0     0     0 0000204 [IWAIT] irq10:
    21 c22ada3c    0     0     0 0000204 [IWAIT] irq9: acpi0
    20 c22adc48    0     0     0 0000204 [IWAIT] irq8:
    19 c2300000    0     0     0 0000204 [IWAIT] irq7: ppc0
    18 c230020c    0     0     0 0000204 [IWAIT] irq6:
    17 c2300418    0     0     0 0000204 [IWAIT] irq5:
    16 c22a8000    0     0     0 0000204 [IWAIT] irq4: sio0
    15 c22a820c    0     0     0 0000204 [IWAIT] irq3:
    14 c22a8418    0     0     0 0000204 [IWAIT] irq0:
    13 c22a8624    0     0     0 0000204 [IWAIT] irq1: atkbd0
    12 c22a8830    0     0     0 000020c [CPU 0] idle: cpu0
    11 c22a8a3c    0     0     0 000020c [CPU 1] idle: cpu1
     1 c22a8c48    0     0     1 0004200 [SLPQ wait 0xc22a8c48][SLP] init
    10 c22ad000    0     0     0 0000204 [SLPQ ktrace 0xc074dff8][SLP] ktrace
     0 c074d6a0    0     0     0 0000200 [IWAIT] swapper
db> show lockedvnods
Locked vnodes
db>  show lockedbufs
db> trace
Tracing pid 12 tid 100004 td 0xc22a9900
kdb_enter(c07056df) at kdb_enter+0x2b
siointr1(c254b000,c07ad5c0,0,c07054eb,56e) at siointr1+0xce
siointr(c254b000) at siointr+0x21
intr_execute_handlers(c229f490,e3481c94,4,e3481cd8,c068c0e3) at 
intr_execute_handlers+0xa5
lapic_handle_intr(34) at lapic_handle_intr+0x2e
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc08a18fd, esp = 0xe3481cd8, ebp = 0xe3481cd8 ---
acpi_cpu_c1(c074f780,1,e3481cf8,1,c22a8830) at acpi_cpu_c1+0x5
acpi_cpu_idle(e3481d0c,c0516a51,c05169f4,e3481d24,c0516834) at 
acpi_cpu_idle+0x13e
cpu_idle(c05169f4,e3481d24,c0516834,0,e3481d38) at cpu_idle+0x28
idle_proc(0,e3481d38,0,c05169f4,0) at idle_proc+0x5d
fork_exit(c05169f4,0,e3481d38) at fork_exit+0xa0
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe3481d6c, ebp = 0 ---
db> show lockedvnods
Locked vnodes
db>

># dumpon -v /dev/twed0s1b
>kernel dumps on /dev/twed0s1b
># pstat -T
>110/12328 files
>0M/2048M swap space
>
>         ---Mike
>
>--------------------------------------------------------------------
>Mike Tancsa,                                      tel +1 519 651 3400
>Sentex Communications,                            mike_at_sentex.net
>Providing Internet since 1994                    www.sentex.net
>Cambridge, Ontario Canada                         www.sentex.net/mike
Received on Wed Sep 21 2005 - 15:44:54 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:43 UTC