Re: nullfs broken on powerpc

From: Kostik Belousov <kostikbel_at_gmail.com>
Date: Wed, 25 Jan 2012 22:08:17 +0200

On Wed, Jan 25, 2012 at 08:50:41PM +0100, Milan Obuch wrote:
> On Wed, 25 Jan 2012 14:21:23 +0200
> Kostik Belousov <kostikbel_at_gmail.com> wrote:
> 
> > On Tue, Jan 24, 2012 at 06:31:52PM +0100, Milan Obuch wrote:
> > > Hi,
> > > 
> 
> [ snip ]
> 
> > > This does not work with powerpc for me. With sources csup'ped this
> > > morning, full system rebuild with GENERIC kernel, it is enough for
> > > me to issue
> > > 
> > > mount_nullfs /data/src10 /usr/src
> > > csup /usr/share/examples/cvsup/standard-supfile
> > > 
> > > and system panic occurs, with following on system console:
> > > 
> > > panic: mtx_lock() of spin mutex (null)
> > > _at_ /usr/src/sys/kern/vfs_subr.c:2670 cpuid = 0
> > > KDB: enter: panic
> > > [ thread pid 1442 tid 100095 ]
> > > Stopped at 0x40f734: addi r0, r0, 0x0
> > > db>
> > > 
> > > At this point, I am able to interact with system, the question for
> > > me is what I want to get from it :) I tried bt with following
> > > result:
> > > 
> > > Tracing pid 1442 tid 100095 td 0x2d6b000
> > > 0xe22c26d0: at panic+0x274
> > > 0xe22c2730: at _mtx_lock_flags+0xc4
> > > 0xe22c2760: at vgonel+0x330
> > > 0xe22c27b0: at vrecycle+0x54
> > > 0xe22c27d0: at null_inactive+0x30
> > > 0xe22c27f0: at VOP_INACTIVE_APV+0xdc
> > > 0xe22c2810: at vinactive+0x98
> > > 0xe22c2850: at vputx+0x344
> > > 0xe22c28a0: at vput+0x18
> > > 0xe22c28c0: at kern_statat_vnhook+0x108
> > > 0xe22c29d0: at kern_statat+0x18
> > > 0xe22c29f0: at kern_lstat+0x2c
> > > 0xe22c2a10: at sys_lstat+0x30
> > > 0xe22c2a90: at trap+0x388
> > > 0xe22c2b60: at powerpc_interrupt+0x108
> > > 0xe22c2b90: user SC trap by _end+0x40d88c70: srr1=0xd032
> > >             r1=0xffaf9a70 cr=0x28004044 xer=0x20000000
> > > ctr=0x41a0ac40
> > > db>
> > > 
> > > Does this shed any light for someone with more knowledge here? My
> > > gut feeling is there is some endianness issue at play, the same
> > > nullfs usage works for me flawlessly on both i386 and amd64
> > > systems, so it could not be 32 vs 64 bit issue at least.
> > > 
> > > At line 2670 of /usr/src/sys/kern/vfs_subr.c I see end of function
> > > void vgonel(struct vnode *vp)
> > > 
> > >         VI_LOCK(vp);
> > >         vp->v_vnlock = &vp->v_lock;
> > >         vp->v_op = &dead_vnodeops;
> > >         vp->v_tag = "none";
> > >         vp->v_type = VBAD;
> > > }
> > > 
> > > so the question seems to be reduced to 'why is vp null?' or is my
> > > small attempt at analysis flawed...
> 
> > I do not think that vp is null. It looks more like the *vp memory
> > was zeroed. This has very low chances of being related to endianness,
> > and is more likely kernel memory corruption.
> > 
> > Take a dump and print the content of *vp.
> 
> How could I look into memory? I found page
> http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-online-ddb.html
> and I can see registers (show reg), use x with absolute addresses, but
> something like 'x vp' tells just 'Symbol not known' - should I somehow
> load symbol table into memory? But backtrace shows function names... or
> should I somehow modify GENERIC kernel to include more debugging info?
> Kernel debugging is a bit new for me, even if I can write simple
> modification into kernel, but only in some special (and narrow) area of
> code...

You could take a dump and then use kgdb to look at *vp.
If you are doing this from ddb, you need to look at the disassembly of
the function to understand where to find vp (probably in some register).