Re: sysinstall spec_getpages panic (with VM overtones)

From: Robert Watson <rwatson_at_freebsd.org>
Date: Sun, 24 Aug 2003 21:04:08 -0400 (EDT)
On Mon, 25 Aug 2003, Gavin Atkinson wrote:

> On Wed, 20 Aug 2003, Robert Watson wrote:
> > On Wed, 20 Aug 2003, Gavin Atkinson wrote:
> > > _mtx_lock_flags(0,0,c0529513,300,ffffffff) at _mtx_lock_flags+0x43
> > > spec_getpages(cce33b3c,54,0,cce33b2c,0) at spec_getpages+0x26c
> > > ffs_getpages(cce33b80,0,c05459de,274,c05c63e0) at ffs_getpages+0x5f6
> > > vnode_pager_getpages(c0bebafc,cce33c70,1,0,cce33c20) at vnode_pager_getpages+0x73
> > > vm_fault(c1259900,819b000,1,0,c12534c0) at vm_fault+0x8e2
> > > trap_pfault(cce33d48,1,819b004,200,819b004) at trap_pfault+0x109
> > > trap(2f,2f,2f,82e533c,0) at trap+0x1fc
> > > calltrap() at calltrap+0x5
> > >
> > > *c0529513 = "/usr/src/sys/fs/specfs/spec_vnops.c", line 0x300 is line 768:
> > >
> > > 766     gotreqpage = 0;
> > > 767     VM_OBJECT_LOCK(vp->v_object);
> > > 768     vm_page_lock_queues();
> > > 769     for (i = 0, toff = 0; i < pcount; i++, toff = nextoff) {
> >
> > Is it ap->a_vp that's NULL, or vp->v_object that's NULL?  vp is
> > dereferenced several times before that in the code, so if vp is really
> > NULL at line 767, we're probably talking about memory corruption.  But if
> > vp->v_object is NULL, then it could be we're not creating a VM object
> > along some code path.
> 
> Although this panic is 100% reproducible during the initial install
> through sysinstall, I have tried hard but can not reproduce this once
> the system is installed and running multiuser, even by performing the
> same actions within sysinstall. I have also tried without success
> to get a crash dump of the panic, however after a fair bit of head
> scratching it looks from a grep of the source code like the "dumpdev"
> loader variable documented in loader(8) is not yet implemented... and as
> far as I can tell there is no other way I can get the installer off CD
> to generate a dump. 
> 
> I'm trying to make a release with extra debugging info, but won't be
> able to test this until at least Wednesday or Thursday. What extra
> debugging info would be useful? Who would be the best person to discuss
> this with?  From what kuriyama said, it appears that it is indeed
> vp->v_object that is null, so I have added the following to
> spec_vnops.c just before the lock that fails:
> 
>   if (vp->v_object == NULL)
>     panic("vp->v_object is null in %s, rdev=%s", __func__,
>         devtoname(vp->v_rdev));
> 
> Hopefully that will help diagnose the cause a little further, but I'm
> really working blind here - this is not an area of the kernel I
> understand at all. If there is any other debugging info I can provide
> that may be useful, I'm happy to have a go. Kuriyama, if you have any
> spare time before I am able to do it, maybe you could add the above code
> and find out what message it panics with? 

Alan Cox just made a commit a couple of days ago that seems to resolve the
problem for us.  Here's the commit message so you can give it a try. 

alc         2003/08/22 10:50:32 PDT

  FreeBSD src repository

  Modified files:
    sys/fs/specfs        spec_vnops.c 
  Log:
  Use the requested page's object field instead of the vnode's.  In some
  cases, the vnode's object field is not initialized leading to a NULL
  pointer dereference when the object is locked.
  
  Tested by:      rwatson
  
  Revision  Changes    Path
  1.208     +5 -2      src/sys/fs/specfs/spec_vnops.c
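
As a rough illustration of the idea in the log above, here is a user-space sketch. It is not the actual diff from revision 1.208: the struct definitions are simplified stand-ins for the kernel types, and getpages_object() and object_lock() are hypothetical helpers invented for this example. The point is only that the object to lock is taken from the requested page, which always has one, rather than from the vnode, whose v_object may never have been set up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed, not the real ones). */
struct vm_object { bool locked; };
struct vm_page   { struct vm_object *object; };
struct vnode     { struct vm_object *v_object; };

/*
 * Hypothetical helper: pick the object to lock from the requested page
 * instead of from the vnode.
 */
static struct vm_object *
getpages_object(struct vm_page **pages, int reqpage)
{
	return (pages[reqpage]->object);
}

/*
 * Toy lock. In the kernel, locking a NULL object is the NULL pointer
 * dereference seen in the panic; here it just reports failure.
 */
static bool
object_lock(struct vm_object *obj)
{
	if (obj == NULL)
		return (false);
	obj->locked = true;
	return (true);
}
```

With a vnode whose v_object is NULL, object_lock(vp->v_object) fails in this model, while object_lock(getpages_object(pages, reqpage)) succeeds because the page carries its own object pointer.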
Received on Sun Aug 24 2003 - 16:04:26 UTC
