On Fri, May 27, 2005 at 06:37:34PM +1000, Peter Jeremy wrote:
> On Thu, 2005-May-26 13:32:43 -0700, Ted Faber wrote:
> >On Thu, May 26, 2005 at 09:08:46AM -0700, Ted Faber wrote:
> >Next lock up is now. Same kernel, pics are at
> >
> >http://www.isi.edu/~faber/tmp/deadlock/DSCN048{83,84,85,86,87,88,89,90,91}.JPG
>
> After comparing it with the last URL, I worked out it was actually
> http://www.isi.edu/~faber/tmp/deadlock/DSCN04{83,84,85,86,87,88,89,90,91}.JPG

Sorry. Typo.

>
> >My inexpert reading is that one of the threads of the psi jabber client
> >is locked on something. "Something" why I need help. :-)
>
> There are two filesystem locks:
> - The psi process (pid 6936) is holding a lock on ad0s1a (probably /)
>   The thread in question is waiting on a nfs lock.
> - A bash process (pid 6598) is holding an NFS lock and waiting on nfsreq
>
> According to the vnode locks, there's one process waiting on the NFS
> lock held by bash and 7 processes waiting on the ufs lock held by psi.
> Without access to the actual process and lock structures, I can't be
> certain but it looks very much like psi is waiting on the NFS lock held
> by bash (there are no other processes waiting on nfs).
>
> It's looking more like an NFS problem. I'm not sure where to go next
> but I'd more strongly suggest that you try to get the system running
> without NFS.

For debugging or for my own sanity? :-)

It's going to be fairly problematic to move off NFS and keep things
running reasonably here. If it's a step we need to take to debug, I can
work something out, but I do have a laptop running in the same
environment (and with a kernel built from the same source) that does not
exhibit this problem.

>
> It might be useful to know some more details about that NFS mount
> (fsid 0x0600ff07). Can you tell us the mount parameters and what the
> server is (OS type).

Most of the NFS filesystems are automounted.
I'm on the machine now, so I can't look at debugger output, but I can
tell you that most of the NFS mounts that I can imagine either psi or
bash looking at are automounted. The mount parameters are:

timeo=8,retrans=9,intr

Ummmmm. As I look this up, I realize that the amd config file in which
this stuff resides is itself on an NFS file system. Not an automounted
one, but an NFS filesystem nonetheless. I've got a very bad feeling
about that all of a sudden. I can picture the automounter being asked to
mount a filesystem that it has to look up in this config file while the
file is temporarily unavailable due to a network glitch (or some NFS
race, or someone locking the file to edit it). That seems bad. I'm going
to move that configuration file. How does that possibility sound to you?

For completeness, the server is a Solaris box. Don't laugh:

boreas:~$ uname -a
SunOS boreas.isi.edu 5.9 Generic_117171-12 sun4u sparc

I'll move that configuration. With any luck this will solve my problem,
though if you see something else more promising, don't hesitate to speak
up. If moving the config does not solve it, is there some output from
the debugger I should get about the file system?

Thanks again for all your help. It really helps to talk these things out
with someone knowledgeable. I hope it does turn out to be this NFS double
jeopardy/pilot error, but if not I'll speak up again.

-- 
Ted Faber
http://www.isi.edu/~faber
PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG