On Fri, 2005-May-27 08:27:52 -0700, Ted Faber wrote:
>work something out, but I do have a laptop running in the same
>environment (and with a kernel from the same source) that does not
>exhibit this problem.

That's a useful snippet. I missed the bit about the same source before.
What are the differences between the systems (including kernel
compilation options)? That might provide a clue as to the underlying
problem. Have you tried running the same sort of workload on your
laptop? Is it feasible to run one of the kernels on both systems?

>> It might be useful to know some more details about that NFS mount
>> (fsid 0x0600ff07). Can you tell us the mount parameters and what the
>> server is (OS type).
>
>Most of the nfs filesystems are automounted. I'm on the machine now, so
>I can't look at debugger output, but I can tell you that most of the NFS
>mounts that I can imagine either psi or bash looking at are automounted.
>The mount parameters are: timeo=8,retrans=9,intr

I didn't notice amd before. If you can't avoid NFS, any chance of (at
least temporarily) hard-mounting all the relevant filesystems and
disabling amd?

amd acts as an NFS server to detect activity on the automount
filesystems. Both the backtraces you posted show that one process is
blocked on an NFS request and amd is blocked on ufs. The locks on the
second backtrace show that the bash waiting on an NFS request is a root
of the deadlock tree. If that NFS request is supposed to be handled by
amd, you close the deadlock cycle.

Also, if your mounts are interruptible, that nfsreq sleep is
interruptible, so you could try dropping into DDB, finding the process
sleeping on nfsreq and killing it ("kill signal_number pid" in ddb, no
'-' on the signal number), then using "cont" to recover. That might
break the deadlock.

>For completeness, the server is a Solaris box. Don't laugh:
>boreas:~$ uname -a
>SunOS boreas.isi.edu 5.9 Generic_117171-12 sun4u sparc

Sun's NFS implementations should be trustable :-).
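
To make the deadlock reasoning earlier in this message concrete, here is
a rough sketch of it as a wait-for graph. The function name find_cycle
and the graph layout are made up for illustration. The "amd waits on
bash" edge stands for amd being blocked on a ufs lock that traces back
to bash, and the "bash waits on amd" edge is only the hypothesis that
amd is the server for that NFS request. Neither edge is taken from the
actual DDB output.

```python
def find_cycle(waits_for):
    """Return a list of nodes forming a cycle in the wait-for graph, or None."""
    visiting = []   # current depth-first path
    done = set()    # nodes already proven not to lead into a cycle

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # We came back to a node on the current path: that is a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# What the backtraces are read as showing: amd is stuck behind a ufs lock
# whose chain of holders leads back to bash (illustrative edge only).
observed = {"amd": ["bash"]}

# Hypothesis: the NFS request bash sleeps on is served by amd itself.
hypothesis = {"amd": ["bash"], "bash": ["amd"]}

print(find_cycle(observed))
print(find_cycle(hypothesis))
```

With only the observed edge there is no cycle; adding the hypothetical
edge closes it, which is why taking amd out of the picture is worth
trying.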
> If moving the config does not solve it, is there some output from
>the debugger I should get about the file system?

I can't see any DDB command to dump the mount table and doing it
manually would be painful. Have you managed to get a crash dump? (If
not, what does "call doadump" do?) Alternatively, have you ever tried
running remote GDB?

> It really helps to talk these things
>out with someone knowledgeable.

Unfortunately, no-one knowledgeable has shown up :-).

--
Peter Jeremy
Received on Fri May 27 2005 - 18:43:06 UTC