Re: FreeBSD8-RELENG (sources from this morning) deadlock

From: Ted Faber <faber_at_isi.edu>
Date: Fri, 2 Oct 2009 14:31:16 -0700
On Mon, Sep 28, 2009 at 09:14:07PM -0700, Ted Faber wrote:
> I've been having a hard time getting this motherboard to work under FreeBSD
> 8.0.  It comes up fairly fine (though I didn't get the PATA disk
> identified  under 7-STABLE so it may also be shaky under 8), but after
> between 3 and 10 minutes of running multi user it locks up hard, and
> won't respond to input at all.  No keyboard response, couldn't get into a
> kernel debugger, nothing.  I couldn't get a kernel dump for that reason.
> 
> However, when I bring the node up single user I'm able to do a full fsck
> of the file systems - which is a 10 minute experience - without
> deadlock.  It doesn't ever seem to lock up single-user.
> 
> I booted a WITNESS/INVARIANTS/DIAGNOSTIC kernel on it today but didn't
> get a whole lot of diagnostic info.  I see one LOR as the system comes
> up, a conflict between vfs_bio.c line 2559 and ufs_dirhash.c line 285,
> which I've seen in other traces.
> 
> After a clean boot and as I'm shutting down, I see 2 LORs when unmounting file
> systems.  The first is (ufs) vfs_mount.c 1200 and (devfs) ffs_vfsops.c
> 1194 and the second is (ufs) vfs_mount.c 1200 and (syncer) vfs_subr.c
> 2188.  As I say it's difficult to capture the screen at this point, but
> I'm happy to take pictures if the detailed traces would be a help.
> 
> That second reversal with the syncer holding one of the locks seems
> suspicious to me.  If the syncer is the source of the deadlock it would
> explain the nondeterministic lock up times, and the lack of lockup
> single user.
> 
> I've attached the dmesg from a verbose boot with the debug kernel.  Any
> ideas about how to get more information or what the problems might be
> would be greatly appreciated.

I'm still deadlocked, but trying to get more info.  I tried a firewire
debugging session using dcons, but that failed for reasons unclear.
dconschat seems to do the right thing:

---
musicbox:~$ sudo dconschat -v -br -G 12345 -t 00-00-4c-01-00-00-3b-c7
dconschat: kvm_nlist: No such file or directory
dcons_buf: 0xc0d74740
port 0   size offset   gen   pos
output:  9184     56     1  1550
input :  3062   9240     0   106
port 1   size offset   gen   pos
output:  3061  12302     0     0
input :  1021  15363     0     0
[dcons connected]
(1)(1)(1)(1)(1)
---

But kgdb fails to connect:

----
musicbox:~$ kgdb kernel
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...(no debugging symbols found)...
(kgdb) target remote :12345
:12345: Connection refused.
----
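
A "Connection refused" here means nothing accepted the TCP connection on
port 12345 at the address kgdb tried.  One way to narrow that down is to
probe the port on each loopback address while dconschat is still running.
The following is only a sketch: it assumes dconschat was started with
-G 12345 on the same host, and the helper name port_open is made up for
illustration.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) is accepted."""
    try:
        # create_connection resolves the host and attempts a connect;
        # a refused, unreachable, or unresolvable target raises OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 12345 is the gdb port given to dconschat with -G above.
    for host in ("127.0.0.1", "::1", "localhost"):
        state = "open" if port_open(host, 12345) else "closed"
        print(host, state)
```

If the port shows as open on one loopback address but not another, giving
kgdb that address explicitly in "target remote host:port" would be the next
thing to try.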

Any help with this or the original query (dmesg attached) would be much
appreciated.

-- 
Ted Faber
http://www.isi.edu/~faber           PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG

Received on Fri Oct 02 2009 - 19:31:17 UTC
