On Tue, 4 Nov 2003, Nils Andreas Hakansson wrote:

> I've disabled softupdates because of
> a panic("softdep_move_dependencies: need merge code");

Can't comment on this bit. Might want to send e-mail to Kirk directly.

> Could someone take a look at this?
>
> pst: timeout mfa=0x0032d5d0 cmd=0x02
> pst: timeout mfa=0x00336390 cmd=0x02
> pst: timeout mfa=0x0034cdd0 cmd=0x02
> <cut>
> pst: timeout mfa=0x003b7ab0 cmd=0x02
> pst: timeout mfa=0x00396db0 cmd=0x02
> pst: timeout mfa=0x003a3530 cmd=0x02
> pst: timeout mfa=0x00376890 cmd=0x02

This is your storage device getting unhappy, but I'm not really informed
enough on pst to say how or why. I don't know if it is because the
requests are bad, or because the controller/chain/device is unable to
service the request.

> ufs_access(): Error retrieving ACL on object (5).
> <cut>
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).

This is the UFS ACL code failing closed: it's unable to read the ACLs from
disk due to EIO (I/O failure). This is a correct response to that
scenario.
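
For anyone unfamiliar with the term, "failing closed" can be sketched in
userland C roughly as follows. This is an illustration only, not the
ufs_access() code: check_access_fail_closed(), acl_reader_fn and the
reader_* stubs are hypothetical names made up for the example. The idea is
simply that when the ACL cannot be read, the error is passed up and access
is refused, rather than falling back to some weaker check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for "read the ACL from disk". Returns 0 and sets
 * *allows on success, or an errno value (such as EIO) on failure.
 */
typedef int (*acl_reader_fn)(void *object, int *allows);

/* Returns 0 if access is granted, otherwise an errno value. */
int
check_access_fail_closed(acl_reader_fn read_acl, void *object)
{
	int allows = 0;
	int error;

	assert(read_acl != NULL);
	error = read_acl(object, &allows);
	if (error != 0)
		return (error);	/* ACL unreadable: deny, do not guess */
	return (allows ? 0 : EACCES);
}

/* Simulated readers for the three cases of interest (all hypothetical). */
int
reader_eio(void *object, int *allows)
{
	(void)object;
	(void)allows;
	return (EIO);		/* the disk read failed */
}

int
reader_allow(void *object, int *allows)
{
	(void)object;
	*allows = 1;
	return (0);
}

int
reader_deny(void *object, int *allows)
{
	(void)object;
	*allows = 0;
	return (0);
}
```

With reader_eio the caller gets EIO back and the open is refused, which
matches the "Error retrieving ACL on object (5)" messages quoted above.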

> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; lapic.id = 00000000
> fault virtual address = 0xae18c0de
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xc066a566
> stack pointer = 0x10:0xea3a78cc
> frame pointer = 0x10:0xea3a7900
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 76932 (smbd)
> kernel: type 12 trap, code=0
> Stopped at generic_bcopy+0x1a: repe movsl (%esi),%es:(%edi)
> db> trace
> generic_bcopy(cf6b0000,1a8,2,c06bd12c,0) at generic_bcopy+0x1a
> ffs_getextattr(ea3a7960,ea3a795c,c05159ad,d0346200,184) at
> ffs_getextattr+0xe0

This appears to be a bug in UFS2's handling of corrupted EA data on disk.
We have some changes in the TrustedBSD development trees to improve
resilience to on-disk corruption, but haven't merged them yet. Just to
confirm, could you use "gdb -k" on a copy of your kernel with debugging
symbols to see where *ffs_getextattr+0xe0 is? For me, it turns up in
ffs_vnops.c:1616, which is a variable assignment. There's a bcopy not far
above there, which seems the likely candidate.
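
To make the general failure mode concrete, here is a minimal userland
sketch of validating a length read from storage before copying with it.
It is not the ffs_getextattr() code and not the TrustedBSD changes
mentioned above; copy_record_checked() and the record layout (a 4-byte
host-order length followed by the payload) are assumptions invented for
the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy one record out of an in-memory "on-disk" area.
 * Hypothetical layout: uint32_t length (host byte order), then payload.
 * Returns 0 on success and an errno value otherwise.
 */
int
copy_record_checked(const unsigned char *area, size_t area_len,
    unsigned char *dst, size_t dst_len, size_t *copied)
{
	uint32_t len;

	assert(area != NULL && dst != NULL && copied != NULL);

	/* The area must at least hold the length field itself. */
	if (area_len < sizeof(len))
		return (EIO);
	memcpy(&len, area, sizeof(len));

	/* A corrupted length must not be allowed to run past the area. */
	if (len > area_len - sizeof(len))
		return (EIO);

	/* Nor past the end of the caller's buffer. */
	if (len > dst_len)
		return (ERANGE);

	memcpy(dst, area + sizeof(len), len);
	*copied = len;
	return (0);
}
```

Without the two length checks, a garbage length would send the copy off
the end of one of the buffers, which is the kind of thing that ends in a
page fault inside a bcopy.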

> vn_extattr_get(cb1a8c8c,8,2,c06bd12c,ea3a79d0) at vn_extattr_get+0xaa
> ufs_getacl(ea3a7a14,ea3a7a40,c061560b,ea3a7a14,c06df280) at
> ufs_getacl+0x99
> ufs_vnoperate(ea3a7a14,c06df280,2,a6,c853cd10) at ufs_vnoperate+0x18
> ufs_access(ea3a7a6c,ea3a7b28,c057dcc9,ea3a7a6c,c0716cc8) at
> ufs_access+0xca
> ufs_vnoperate(ea3a7a6c,c0716cc8,c0716cc8,c853cd10,cb1a8c8c) at
> ufs_vnoperate+0x18
> vn_open_cred(ea3a7bdc,ea3a7cdc,1a4,d0bb7800,22) at vn_open_cred+0x359
> vn_open(ea3a7bdc,ea3a7cdc,1a4,22,c3ee0fb4) at vn_open+0x30
> kern_open(c853cd10,bfbff130,0,1,1a4) at kern_open+0x143
> open(c853cd10,ea3a7d14,c06c44d0,3ed,3) at open+0x30
> syscall(bfbf002f,82b002f,bfbf002f,bfbffd70,82b3724) at syscall+0x28f
> Xint0x80_syscall() at Xint0x80_syscall+0x1d
> --- syscall (5, FreeBSD ELF32, open), eip = 0x662b5233, esp = 0xbfbff07c,
> ebp = 0xbfbff098 ---
> db> show locks
> exclusive sleep mutex Giant r = 0 (0xc07115c0) locked @
> /usr/src/sys/vm/vm_fault.c:223

Holding Giant here is good.

So to summarize: This could be the result of a disk read failure. The UFS
code appears to be intolerant of said failure. The ACL code failed closed
properly, although perhaps not so usefully.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Network Associates Laboratories

Received on Tue Nov 04 2003 - 10:20:19 UTC