Re: kern/68442: panic - acquiring duplicate lock of same type: "sleepq chain"

From: Daniel Lang <dl_at_leo.org>
Date: Thu, 1 Jul 2004 17:32:21 +0200
Hi,

John Baldwin wrote on Tue, Jun 29, 2004 at 02:53:34PM -0400:
[..]
> Well, it may be a hint sadly enough.  The fact that it thinks Giant is a spin 
> lock means that witness is confused, and this panic may be further such 
> confusion.  One possibility is that somehow the sleep queue chain mutexes 
> have been corrupted.
[..]

First: the system locked up also with a UP kernel, so maybe this
is not entirely related to SMP locking? Or maybe locks are now
used in a non-smp environment as well. I guess so since the advent
of KSE? Anyway...

I have applied Bruce Evans' sio.c patch and now I did get a 'ddb'
prompt after the machine paniced. I could call doadump()
and this time there is enough space in /var.
The following applies again to the SMP kernel but no
PAE stuff.

Here is the panic message and ddb trace:
[..]
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x34
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0539096
stack pointer           = 0x10:0xe5502ab0
frame pointer           = 0x10:0xe5502ad4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 9176 (cvs)
kernel: type 12 trap, code=0
Stopped at      0xc0539096 = witness_checkorder+0x176:  movl    0x34(%eax),%edx

db> trace
witness_checkorder(c07100a4,9,c069f530,19b,500) at 0xc0539096 = witness_checkor6
_mtx_lock_spin_flags(c07100a4,0,c069f530,19b,c070a0e0) at 0xc0516e9e = _mtx_loce
turnstile_lookup(c070a0e0,c070a0e0,3bc,c06a390d,e5502b54) at 0xc05382ae = turnse
_mtx_lock_sleep(c070a0e0,0,c06a390d,3bc) at 0xc051701e = _mtx_lock_sleep+0x66
_mtx_lock_flags(c070a0e0,0,c06a390d,3bc,c0711c30) at 0xc0516e0f = _mtx_lock_fla7
kern_open(c403bd20,81ef440,0,1,1b6) at 0xc056873f = kern_open+0xc7
open(c403bd20,e5502d14,3,ab,296) at 0xc0568674 = open+0x18
syscall(2f,2835002f,bfbf002f,4,283502e0) at 0xc0664703 = syscall+0x217
Xint0x80_syscall() at 0xc06539ef = Xint0x80_syscall+0x1f
--- syscall (5, FreeBSD ELF32, open), eip = 0x282c9e1b, esp = 0xbfbfe0fc, ebp =-
[..]

Please note, this time, there was no LOR happening!

Here is a log of my sort of fruitless gdb session.
Since there was no LOR, I am not sure if the sleepq_chain is
related to the panic? 

However, the panic is obviously triggered inside the witness
code, because *lock_list = 0x0 in line 749. Although a few lines
above, the list is checked for beeing empty (line 707), just
Colin has already pointed out from the first trace I could
get. But between line 707 and 749 there is no obvious modification
to this list. I am not sure what 'find_instance()' does?
So maybe another thread on another CPU has modified the locklist
meanwhile? Is this possible?

Anyway, here is my gdb session. I poked around in the mutexes
in the 'turnstile_chain' without finding anything obviously
wrong.

[..]
-rw-r--r--   1 root  wheel           5 Feb 23 20:42 minfree
-rw-------   1 root  wheel  2147352576 Jul  1 16:58 vmcore.1
atleo6:/var/crash#gdb6 -k kernel.1 vmcore.1 
GNU gdb 20040615 [GDB v6.x for FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-portbld-freebsd5.2"...
panic messages:
---
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0x34
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xc0539096
stack pointer	        = 0x10:0xe5502ab0
frame pointer	        = 0x10:0xe5502ad4
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 9176 (cvs)
kernel: type 12 trap, code=0
Dumping 2047 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656 672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008 1024 1040 1056 1072 1088 1104 1120 1136 1152 1168 1184 1200 1216 1232 1248 1264 1280 1296 1312 1328 1344 1360 1376 1392 1408 1424 1440 1456 1472 1488 1504 1520 1536 1552 1568 1584 1600 1616 1632 1648 1664 1680 1696 1712 1728 1744 1760 1776 1792 1808 1824 1840 1856 1872 1888 1904 1920 1936 1952 1968 1984 2000 2016 2032
---
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:236
236		dumping++;
doadump () at /usr/src/sys/kern/kern_shutdown.c:236
236		dumping++;
(kgdb) bt
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:236
#1  0xc0451f2e in db_fncall (dummy1=0, dummy2=0, dummy3=-1066115808, 
    dummy4=0xe55028ec "\b)På\032oQÀ ]tÀF") at /usr/src/sys/ddb/db_command.c:551
#2  0xc0451d3c in db_command (last_cmdp=0xc0702150, cmd_table=0x0, aux_cmd_tablep=0xc06ba454, 
    aux_cmd_tablep_end=0xc06ba46c) at /usr/src/sys/ddb/db_command.c:348
#3  0xc0451e14 in db_command_loop () at /usr/src/sys/ddb/db_command.c:475
#4  0xc0454551 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_trap.c:73
#5  0xc06523fd in kdb_trap (type=12, code=0, regs=0xe5502a70)
    at /usr/src/sys/i386/i386/db_interface.c:159
#6  0xc0664433 in trap_fatal (frame=0xe5502a70, eva=52) at /usr/src/sys/i386/i386/trap.c:810
#7  0xc0664177 in trap_pfault (frame=0xe5502a70, usermode=0, eva=52) at /usr/src/sys/i386/i386/trap.c:733
#8  0xc0663e19 in trap (frame=
      {tf_fs = -1066336232, tf_es = -931856368, tf_ds = 16, tf_edi = 9, tf_esi = -1066336092, tf_ebp = -447730988, tf_isp = -447731044, tf_ebx = -4194252, tf_edx = -1, tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1068265322, tf_cs = 8, tf_eflags = 66118, tf_esp = -447731012, tf_ss = -1068404931})
    at /usr/src/sys/i386/i386/trap.c:420
#9  0xc065399a in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#10 0xc0710018 in turnstile_chains ()
#11 0xc8750010 in ?? ()
#12 0x00000010 in ?? ()
#13 0x00000009 in ?? ()
#14 0xc07100a4 in turnstile_chains ()
#15 0xe5502ad4 in ?? ()
#16 0xe5502a9c in ?? ()
#17 0xffc00034 in ?? ()
#18 0xffffffff in ?? ()
#19 0x00000000 in ?? ()
#20 0x00000000 in ?? ()
#21 0x0000000c in ?? ()
#22 0x00000000 in ?? ()
#23 0xc0539096 in witness_checkorder (lock=0xc07100a4, flags=9, 
    file=0xc069f530 "/usr/src/sys/kern/subr_turnstile.c", line=411)
    at /usr/src/sys/kern/subr_witness.c:749
#24 0xc0516e9e in _mtx_lock_spin_flags (m=0xc07100a4, opts=0, 
    file=0xc069f530 "/usr/src/sys/kern/subr_turnstile.c", line=411) at /usr/src/sys/kern/kern_mutex.c:354
#25 0xc05382ae in turnstile_lookup (lock=0xc070a0e0) at /usr/src/sys/kern/subr_turnstile.c:411
#26 0xc051701e in _mtx_lock_sleep (m=0xc070a0e0, opts=0, 
    file=0xc06a390d "/usr/src/sys/kern/vfs_syscalls.c", line=956) at /usr/src/sys/kern/kern_mutex.c:458
#27 0xc0516e0f in _mtx_lock_flags (m=0xc070a0e0, opts=0, 
    file=0xc06a390d "/usr/src/sys/kern/vfs_syscalls.c", line=956) at /usr/src/sys/kern/kern_mutex.c:252
#28 0xc056873f in kern_open (td=0xc403bd20, path=0x0, pathseg=UIO_USERSPACE, flags=1, mode=438)
    at /usr/src/sys/kern/vfs_syscalls.c:956
#29 0xc0568674 in open (td=0xc403bd20, uap=0x0) at /usr/src/sys/kern/vfs_syscalls.c:926
#30 0xc0664703 in syscall (frame=
      {tf_fs = 47, tf_es = 674562095, tf_ds = -1078001617, tf_edi = 4, tf_esi = 674562784, tf_ebp = -1077944024, tf_isp = -447730316, tf_ebx = 674484940, tf_edx = 0, tf_ecx = 0, tf_eax = 5, tf_trapno = 12, tf_err = 2, tf_eip = 674012699, tf_cs = 31, tf_eflags = 662, tf_esp = -1077944068, tf_ss = 47})
    at /usr/src/sys/i386/i386/trap.c:1004
#31 0xc06539ef in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:201
#32 0x0000002f in ?? ()
#33 0x2835002f in ?? ()
#34 0xbfbf002f in ?? ()
#35 0x00000004 in ?? ()
#36 0x283502e0 in ?? ()
#37 0xbfbfe128 in ?? ()
#38 0xe5502d74 in ?? ()
#39 0x2833d2cc in ?? ()
#40 0x00000000 in ?? ()
#41 0x00000000 in ?? ()
#42 0x00000005 in ?? ()
#43 0x0000000c in ?? ()
#44 0x00000002 in ?? ()
---Type <return> to continue, or q <return> to quit---up 23
#45 0x282c9e1b in ?? ()
#46 0x0000001f in ?? ()
#47 0x00000296 in ?? ()
#48 0xbfbfe0fc in ?? ()
#49 0x0000002f in ?? ()
#50 0x00000000 in ?? ()
#51 0x00000000 in ?? ()
#52 0x0804ad90 in ?? ()
#53 0x00000000 in ?? ()
#54 0x67ff8000 in ?? ()
#55 0xc8755dc0 in ?? ()
#56 0xc8755e6c in ?? ()
#57 0xe55029c4 in ?? ()
#58 0xe55029b4 in ?? ()
#59 0xc403bd20 in ?? ()
#60 0xc052bd30 in sched_switch (td=0x2833d2cc) at /usr/src/sys/kern/sched_4bsd.c:676
Previous frame inner to this frame (corrupt stack?)
(kgdb) up 23
#23 0xc0539096 in witness_checkorder (lock=0xc07100a4, flags=9, 
    file=0xc069f530 "/usr/src/sys/kern/subr_turnstile.c", line=411)
    at /usr/src/sys/kern/subr_witness.c:749
749		lock1 = &(*lock_list)->ll_children[(*lock_list)->ll_count - 1];
(kgdb) l
744		/*
745		 * Check for duplicate locks of the same type.  Note that we only
746		 * have to check for this on the last lock we just acquired.  Any
747		 * other cases will be caught as lock order violations.
748		 */
749		lock1 = &(*lock_list)->ll_children[(*lock_list)->ll_count - 1];
750		w1 = lock1->li_lock->lo_witness;
751		if (w1 == w) {
752			if (w->w_same_squawked || (lock->lo_flags & LO_DUPOK))
753				return;
(kgdb) p lock_list
$2 = (struct lock_list_entry *) 0x0
(kgdb) up
#24 0xc0516e9e in _mtx_lock_spin_flags (m=0xc07100a4, opts=0, 
    file=0xc069f530 "/usr/src/sys/kern/subr_turnstile.c", line=411) at /usr/src/sys/kern/kern_mutex.c:354
354		WITNESS_CHECKORDER(&m->mtx_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE,
(kgdb) l
349	
350		MPASS(curthread != NULL);
351		KASSERT(m->mtx_object.lo_class == &lock_class_mtx_spin,
352		    ("mtx_lock_spin() of sleep mutex %s _at_ %s:%d",
353		    m->mtx_object.lo_name, file, line));
354		WITNESS_CHECKORDER(&m->mtx_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE,
355		    file, line);
356	#if defined(SMP) || LOCK_DEBUG > 0 || 1
357		_get_spin_lock(m, curthread, opts, file, line);
358	#else
(kgdb) p m
$3 = (struct mtx *) 0xc07100a4
(kgdb) p *m
$4 = {mtx_object = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", 
    lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc07100cc, 
      tqe_prev = 0xc071008c}, lo_witness = 0xc0712900}, mtx_lock = 4, mtx_recurse = 0}
(kgdb) p m->mtx_object
$5 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", 
  lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc07100cc, 
    tqe_prev = 0xc071008c}, lo_witness = 0xc0712900}
(kgdb) p m->mtx_object->lo_class
$6 = (struct lock_class *) 0xc06e25a4
(kgdb) p m->mtx_object->lo_class
$7 = {lc_name = 0xc069cce2 "spin mutex", lc_flags = 10}
(kgdb) p m->mtx_object->lo_list
$12 = {tqe_next = 0xc07100cc, tqe_prev = 0xc071008c}
(kgdb) p m->mtx_object->lo_list->tqe_next
$13 = (struct lock_object *) 0xc07100cc
(kgdb) p m->mtx_object->lo_list->tqe_next
$14 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", 
  lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc07100f4, 
    tqe_prev = 0xc07100b4}, lo_witness = 0xc0712900}
(kgdb) p *m->mtx_object->lo_list->tqe_next->lo_list->tqe_next
$15 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", 
  lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc071011c, 
    tqe_prev = 0xc07100dc}, lo_witness = 0xc0712900}
(kgdb) p opts
$16 = 0
(kgdb) p file
$17 = 0xc069f530 "/usr/src/sys/kern/subr_turnstile.c"
(kgdb) p line
$18 = 411
(kgdb) p *m->mtx_object->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next
$22 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", 
  lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc0710144, 
    tqe_prev = 0xc0710104}, lo_witness = 0xc0712900}
(kgdb) p *m->mtx_object->lo_list->tqe_next->lo_list->tqe_next
$23 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", 
  lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc071011c, 
    tqe_prev = 0xc07100dc}, lo_witness = 0xc0712900}
(kgdb) p *m->mtx_object->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next
$24 = {lo_class = 0xc06e25a4, lo_name = 0xc069f553 "turnstile chain", 
  lo_type = 0xc069f553 "turnstile chain", lo_flags = 196608, lo_list = {tqe_next = 0xc071016c, 
    tqe_prev = 0xc071012c}, lo_witness = 0xc0712900}
(kgdb) p *m->mtx_object->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next->lo_cl ass
$25 = {lc_name = 0xc069cce2 "spin mutex", lc_flags = 10}
(kgdb) p m->mtx_object->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next->lo_list->tqe_next->lo_listt->tqe_next->lo_list
$27 = {tqe_next = 0xc0710194, tqe_prev = 0xc0710154}
(kgdb) quit
Script done on Thu Jul  1 17:19:36 2004
[..]

Cheers,
 Daniel
-- 
IRCnet: Mr-Spock         - ceterum censeo Microsoftinem esse delendam -  
 Daniel Lang * dl_at_leo.org * +49 89 289 18532 * http://www.leo.org/~dl/
Received on Thu Jul 01 2004 - 13:33:16 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:59 UTC