Re: nullfs broken on powerpc

From: Milan Obuch <freebsd-current_at_dino.sk>
Date: Thu, 26 Jan 2012 10:12:53 +0100

On Wed, 25 Jan 2012 22:00:26 +0100
Andreas Tobler <andreast_at_FreeBSD.org> wrote:

> On 25.01.12 21:29, Eitan Adler wrote:
> > On Wed, Jan 25, 2012 at 2:50 PM, Milan
> > Obuch<freebsd-current_at_dino.sk>  wrote:
> >> On Wed, 25 Jan 2012 14:21:23 +0200
> >> Kostik Belousov<kostikbel_at_gmail.com>  wrote:
> >>

[ snip ]

> >>>> Tracing pid 1442 tid 100095 td 0x2d6b000
> >>>> 0xe22c26d0: at panic+0x274
> >>>> 0xe22c2730: at _mtx_lock_flags+0xc4
> >>>> 0xe22c2760: at vgonel+0x330
> >>>> 0xe22c27b0: at vrecycle+0x54
> >>>> 0xe22c27d0: at null_inactive+0x30
> >>>> 0xe22c27f0: at VOP_INACTIVE_APV+0xdc
> >>>> 0xe22c2810: at vinactive+0x98
> >>>> 0xe22c2850: at vputx+0x344
> >>>> 0xe22c28a0: at vput+0x18
> >>>> 0xe22c28c0: at kern_statat_vnhook+0x108
> >>>> 0xe22c29d0: at kern_statat+0x18
> >>>> 0xe22c29f0: at kern_lstat+0x2c
> >>>> 0xe22c2a10: at sys_lstat+0x30
> >>>> 0xe22c2a90: at trap+0x388
> >>>> 0xe22c2b60: at powerpc_interrupt+0x108
> >>>> 0xe22c2b90: user SC trap by _end+0x40d88c70: srr1=0xd032
> >>>>              r1=0xffaf9a70 cr=0x28004044 xer=0x20000000
> >>>> ctr=0x41a0ac40
> >>>> db>
> >>>>
> >>>> Does this shed any light for someone with more knowledge here? My
> >>>> gut feeling is there is some endianness issue at play, the same
> >>>> nullfs usage works for me flawlessly on both i386 and amd64
> >>>> systems, so it could not be 32 vs 64 bit issue at least.
> >>>>
> >>>> At line 2670 of /usr/src/sys/kern/vfs_subr.c I see end of
> >>>> function void vgonel(struct vnode *vp)
> >>>>
> >>>>          VI_LOCK(vp);
> >>>>          vp->v_vnlock = &vp->v_lock;
> >>>>          vp->v_op = &dead_vnodeops;
> >>>>          vp->v_tag = "none";
> >>>>          vp->v_type = VBAD;
> >>>> }
> >>>>
> >>>> so the question seems to reduce to 'why is vp null?', or is my
> >>>> small attempt at analysis flawed...
> >>
> >>> I do not think that the vp is null. It looks more like the *vp
> >>> memory was zeroed. This has a very low chance of being related to
> >>> endianness, and looks more like kernel memory corruption.
> >>>
> >>> Take a dump and print the content of *vp.
> >>
> >> How could I look into memory? I found page
> >> http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-online-ddb.html
> >> and I can see registers (show reg), use x with absolute addresses,
> >> but something like 'x vp' tells just 'Symbol not known' - should I
> >> somehow load symbol table into memory? But backtrace shows
> >> function names... or should I somehow modify GENERIC kernel to
> >> include more debugging info? Kernel debugging is a bit new for me,
> >> even if I can write simple modification into kernel, but only in
> >> some special (and narrow) area of code...
> >
> > From ddb write 'call doadump'. Provided you have a proper dump
> > device set up in rc.conf it should work. You could then use kgdb
> > from a running computer to analyze the dump in more detail.
> 
> This only works if your target is booke; AIM (Apple-based machines)
> does not have 'call doadump' implemented yet. It is somewhere on my
> long todo list.
> 

So I looked for ideas, found
http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
and tried it, but with no good result: both dump and call doadump
dumped 0 MB of memory. The only method available now seems to be live
debugging until dump is implemented for AIM. If you have anything to
test, just write to me.

In the meantime, I will try to get something, though I have no real
idea yet how. Anyway, the more I test, the more it looks like a memory
corruption issue, which takes a bit more to investigate: the real
problem could well be in some area other than the one where it
manifests itself.
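
To make the distinction Kostik draws above concrete (a null vp versus a
valid vp whose *vp memory was zeroed), here is a minimal userspace
sketch. It is only an illustration: struct fake_vnode, wipe_vp() and
classify_vp() are hypothetical names made up for this mail, and the
struct is not the kernel's real struct vnode layout. The backtrace shows
a panic raised from inside _mtx_lock_flags rather than a fault on the
first access, which seems more consistent with the "zeroed" case than
with the "null" case, but that is an assumption until *vp can actually
be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a few struct vnode fields set at the end of
 * vgonel(). This is NOT the kernel's real layout.
 */
struct fake_vnode {
	const void	*v_op;
	const char	*v_tag;
	int		 v_type;
	unsigned long	 v_interlock;	/* stand-in for the interlock word */
};

enum vp_state { VP_NULL, VP_ZEROED, VP_POPULATED };

/* Simulate the suspected corruption: overwrite the whole object with zeros. */
static void
wipe_vp(struct fake_vnode *vp)
{
	assert(vp != NULL);
	memset(vp, 0, sizeof(*vp));
}

/*
 * Classify a pointer: NULL, valid but pointing at all-zero bytes, or
 * pointing at an object with at least one non-zero byte.
 */
static enum vp_state
classify_vp(const struct fake_vnode *vp)
{
	const unsigned char *p = (const unsigned char *)vp;
	size_t i;

	if (vp == NULL)
		return (VP_NULL);
	for (i = 0; i < sizeof(*vp); i++) {
		if (p[i] != 0)
			return (VP_POPULATED);
	}
	return (VP_ZEROED);
}
```

In the kernel the same check would be done by hand on the live system,
by dumping the bytes at the address held in vp and seeing whether they
are all zero.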

So if anybody has any advice on what to look for, I can try it.

Regards,
Milan