Re: sysinstall spec_getpages panic (with VM overtones)

From: Robert Watson <rwatson_at_freebsd.org>
Date: Wed, 20 Aug 2003 17:31:39 -0400 (EDT)

On Wed, 20 Aug 2003, Gavin Atkinson wrote:

> On the 8th August kuriyama_at_imgsrc.co.jp mentioned he was getting a panic
> with FreeBSD inside VMware where _mtx_lock is being called with a NULL
> mutex from spec_getpages. I'm also seeing this, 100% reproducible, on
> real hardware. (See message ID XFMail.20030808154731.jhb_at_FreeBSD.org for
> the original poster's email and jhb's reply.) For me, sysinstall panics
> during the extraction of the base package:
> 
> (note that I do not get to see a register dump)  kernel: type 12 trap,
> code=0
> 
> _mtx_lock_flags(0,0,c0529513,300,ffffffff) at _mtx_lock_flags+0x43
> spec_getpages(cce33b3c,54,0,cce33b2c,0) at spec_getpages+0x26c
> ffs_getpages(cce33b80,0,c05459de,274,c05c63e0) at ffs_getpages+0x5f6
> vnode_pager_getpages(c0bebafc,cce33c70,1,0,cce33c20) at vnode_pager_getpages+0x73
> vm_fault(c1259900,819b000,1,0,c12534c0) at vm_fault+0x8e2
> trap_pfault(cce33d48,1,819b004,200,819b004) at trap_pfault+0x109
> trap(2f,2f,2f,82e533c,0) at trap+0x1fc
> calltrap() at calltrap+0x5

I've been getting similar reports locally from our trustedbsd_sebsd
branch.  We thought originally it was a local merge problem we introduced
due to some inconsistent merging of specfs changes, but I think we have
now eliminated that.  I suppose I'm relieved... (?)

> I first noticed this with the 20030811 JPSNAP, but have tried with the
> 9th July 2003 JPSNAP, and yesterday's snapshot, and see the same result
> on both. I see the same panic whether installing over the net or from
> CD.  With 64 MB of RAM, it panics halfway through reading the chunks
> that make up the base package; upping the RAM to 256 MB allows it to read
> all of the chunks before panicking.

Sounds identical.

> *c0529513 = "/usr/src/sys/fs/specfs/spec_vnops.c", line 0x300 is line 768:
> 
> 766     gotreqpage = 0;
> 767     VM_OBJECT_LOCK(vp->v_object);
> 768     vm_page_lock_queues();
> 769     for (i = 0, toff = 0; i < pcount; i++, toff = nextoff) {
> 
> so ap->a_vp is NULL. I'm afraid that's the limit of my ddb ability.
> 
> Any suggestions as to where I should go from here? I don't really have
> the facility at the moment to make release to test patches but will try
> to if necessary. 

Is it ap->a_vp that's NULL, or vp->v_object that's NULL?  vp is
dereferenced several times before that in the code, so if vp is really
NULL at line 767, we're probably talking about memory corruption.  But if
vp->v_object is NULL, then it could be we're not creating a VM object
along some code path.
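
To make the distinction concrete, here is a minimal userland sketch in C of
the check I have in mind.  Everything in it is a hypothetical stand-in: the
mock_* structures and classify_null() are not the real kernel definitions
and are not code from spec_vnops.c; they only model the two pointers in
question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mock_object {
	int	lock;			/* placeholder for the object mutex */
};

struct mock_vnode {
	struct mock_object *v_object;	/* VM object backing the vnode */
};

struct mock_getpages_args {
	struct mock_vnode *a_vp;	/* vnode passed to getpages */
};

/* Which pointer on the path to the lock turned out to be NULL. */
enum null_site {
	SITE_NONE,			/* both pointers are valid */
	SITE_VP,			/* ap->a_vp is NULL */
	SITE_V_OBJECT			/* vp is valid, vp->v_object is NULL */
};

/*
 * Check the pointers in the order they would be dereferenced, so the
 * first NULL encountered is the one reported.
 */
static enum null_site
classify_null(const struct mock_getpages_args *ap)
{
	assert(ap != NULL);

	if (ap->a_vp == NULL)
		return (SITE_VP);
	if (ap->a_vp->v_object == NULL)
		return (SITE_V_OBJECT);
	return (SITE_NONE);
}

/* Human-readable interpretation of each case, as discussed above. */
static const char *
site_name(enum null_site site)
{
	switch (site) {
	case SITE_VP:
		return ("a_vp is NULL: suspect memory corruption");
	case SITE_V_OBJECT:
		return ("v_object is NULL: suspect a missing VM object");
	default:
		return ("no NULL pointer found");
	}
}
```

In ddb the equivalent would be to examine the two pointers by hand; the
sketch only spells out the order in which to look at them.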

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Network Associates Laboratories