Re: sysinstall spec_getpages panic (with VM overtones)

From: Gavin Atkinson <gavin_at_ury.york.ac.uk>
Date: Mon, 25 Aug 2003 01:34:28 +0100 (BST)
On Wed, 20 Aug 2003, Robert Watson wrote:
> On Wed, 20 Aug 2003, Gavin Atkinson wrote:
> > _mtx_lock_flags(0,0,c0529513,300,ffffffff) at _mtx_lock_flags+0x43
> > spec_getpages(cce33b3c,54,0,cce33b2c,0) at spec_getpages+0x26c
> > ffs_getpages(cce33b80,0,c05459de,274,c05c63e0) at ffs_getpages+0x5f6
> > vnode_pager_getpages(c0bebafc,cce33c70,1,0,cce33c20) at vnode_pager_getpages+0x73
> > vm_fault(c1259900,819b000,1,0,c12534c0) at vm_fault+0x8e2
> > trap_pfault(cce33d48,1,819b004,200,819b004) at trap_pfault+0x109
> > trap(2f,2f,2f,82e533c,0) at trap+0x1fc
> > calltrap() at calltrap+0x5
> >
> > *c0529513 = "/usr/src/sys/fs/specfs/spec_vnops.c", line 0x300 is line 768:
> >
> > 766     gotreqpage = 0;
> > 767     VM_OBJECT_LOCK(vp->v_object);
> > 768     vm_page_lock_queues();
> > 769     for (i = 0, toff = 0; i < pcount; i++, toff = nextoff) {
>
> Is it ap->a_vp that's NULL, or vp->v_object that's NULL?  vp is
> dereferenced several times before that in the code, so if vp is really
> NULL at line 767, we're probably talking about memory corruption.  But if
> vp->v_object is NULL, then it could be we're not creating a VM object
> along some code path.

Although this panic is 100% reproducible during the initial install
through sysinstall, I have tried hard but cannot reproduce it once the
system is installed and running multiuser, even by performing the same
actions within sysinstall. I have also tried without success to get a
crash dump of the panic. However, after a fair bit of head scratching, it
looks from a grep of the source code as though the "dumpdev" loader
variable documented in loader(8) is not yet implemented, and as far as I
can tell there is no other way to get the installer on the CD to generate
a dump.

I'm trying to make a release with extra debugging info, but won't be able
to test this until at least Wednesday or Thursday. What extra debugging
info would be useful? Who would be the best person to discuss this with?
From what kuriyama said, it appears that it is indeed vp->v_object that is
NULL, so I have added the following to spec_vnops.c just before the lock
that fails:

  if (vp->v_object == NULL)
    panic("vp->v_object is null in %s, rdev=%s", __func__, devtoname(vp->v_rdev));

Hopefully that will help diagnose the cause a little further, but I'm
really working blind here - this is not an area of the kernel I understand
at all. If there is any other debugging info I can provide that may be
useful, I'm happy to have a go. Kuriyama, if you have any spare time
before I am able to do it, maybe you could add the above code and find out
what message it panics with?
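
The distinction Robert drew (vp itself NULL versus vp->v_object NULL) can
be sketched in user space as below. This is only an illustration under
stated assumptions: mock_vnode, mock_object, classify_vnode and
describe_vnode are hypothetical names, not kernel structures or APIs, and
the devname field merely stands in for devtoname(vp->v_rdev).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space stand-ins; NOT the kernel's struct vnode. */
struct mock_object {
	int unused;
};

struct mock_vnode {
	struct mock_object *v_object;	/* mirrors vp->v_object */
	const char *devname;		/* stands in for devtoname(vp->v_rdev) */
};

enum vp_state { VP_OK, VP_NULL, VP_OBJECT_NULL };

/* Decide which pointer is missing, checking vp before dereferencing it. */
static enum vp_state
classify_vnode(const struct mock_vnode *vp)
{
	if (vp == NULL)
		return (VP_NULL);
	if (vp->v_object == NULL)
		return (VP_OBJECT_NULL);
	return (VP_OK);
}

/* Format a diagnostic similar in spirit to the panic message above. */
static int
describe_vnode(const struct mock_vnode *vp, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	switch (classify_vnode(vp)) {
	case VP_NULL:
		return (snprintf(buf, len, "vp is null"));
	case VP_OBJECT_NULL:
		return (snprintf(buf, len, "vp->v_object is null, rdev=%s",
		    vp->devname != NULL ? vp->devname : "?"));
	default:
		return (snprintf(buf, len, "ok"));
	}
}
```

In the kernel the check has to run before VM_OBJECT_LOCK() is reached,
since that is where the NULL pointer is first handed to the mutex code.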

Gavin
Received on Sun Aug 24 2003 - 15:34:33 UTC
