Re: panic "ffs_checkblk: bad block" on recent -head kernels

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Fri, 4 Dec 2015 03:51:02 +0100

On Thu, Dec 03, 2015 at 03:07:48PM -0800, Kirk McKusick wrote:
> > Date: Thu, 3 Dec 2015 23:47:52 +0100
> > From: Mateusz Guzik <mjguzik_at_gmail.com>
> > To: Rick Macklem <rmacklem_at_uoguelph.ca>
> > Cc: FreeBSD Current <freebsd-current_at_freebsd.org>
> > Subject: Re: panic "ffs_checkblk: bad block" on recent -head kernels
> > 
> > On Thu, Dec 03, 2015 at 05:08:27PM -0500, Rick Macklem wrote:
> >> Hi,
> >> 
> >> I get a fairly reproducible panic when doing a full kernel build
> >> on a 256Mbyte single core i386 when running recent kernels from -head.
> >> 
> >> The panic is "ffs_checkblk: bad block ..". I don't actually have the
> >> block # (although I think it's just 0xfffffffffffffff, given the backtrace),
> >> because it runs off the screen. (I looked up the message via the debugger
> >> from the first arg. to panic.)
> >> 
> >> Here's the backtrace without all the numbers:
> >> panic(c14f4b55, ffffffff, ffffffff, 0, 64,...)
> >> ffs_checkblk(ffffffff, 8000, fffffff9c, ffffffff, c4a02454,...)
> >> ffs_reallocblks
> >> VOP_REALLOCBLKS_APV
> >> cluster_write
> >> ffs_write
> >> VOP_WRITE_APV
> >> vn_write
> >> vn_io_fault_doio
> >> vn_io_fault1
> >> vn_io_fault
> >> dofilewrite
> >> kern_writev
> >> sys_write
> >> syscall
> >> 
> >> It doesn't happen on a kernel dated Sep. 30, but does happen on a Nov. 30 one.
> >> (I was away from home, so I didn't upgrade kernels for 2 months.)
> >> 
> >> I am slowly doing a binary search for the first kernel rev. where it occurs,
> >> but since each build takes hours, it's going to take a while;-).
> >> 
> >> At this point, it doesn't appear to happen on r289278 (just before jeff@'s buffer
> >> cache patch).
> >> With kernels between r289279-->r290480, I get into the "R" state that
> >> was fixed by r290481 before I get a crash.
> >> I tried reverting r289405 and r290047 from a recent kernel and the crashes still
> >> occurred, so it doesn't appear to be these commits.
> >> 
> >> I am currently testing r290481 to see if the crash occurs for this rev.
> >> 
> >> If anyone has some insight into which commit might cause this,
> >> please let me know.
> > 
> > Well, did it crash with r291460 or later?
> > 
> > If so, try the kernel just before that and if that helps, try:
> > 
> > diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
> > index ff37de8..0ad6ef7 100644
> > --- a/sys/kern/vfs_subr.c
> > +++ b/sys/kern/vfs_subr.c
> > @@ -2783,6 +2783,7 @@ _vdrop(struct vnode *vp, bool locked)
> >         vp->v_op = NULL;
> >  #endif
> >         bzero(&vp->v_un, sizeof(vp->v_un));
> > +       vp->v_lasta = vp->v_clen = vp->v_cstart = vp->v_lastw = 0;
> >         vp->v_iflag = 0;
> >         vp->v_vflag = 0;
> >         bo->bo_flag = 0;
> > 
> > -- 
> > Mateusz Guzik <mjguzik gmail.com>
> 
> I concur with trying this suggestion. Starting with r291460 these
> fields were no longer zeroed when allocating the vnode. So you may
> have some residual values in there that are causing trouble.

I reviewed the rest of the structure, and it looks like this is the rest
of the fallout.
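
For anyone following along, the failure mode can be illustrated with a
small userland sketch. This is not the kernel code: struct toy_vnode,
toy_alloc() and toy_free() are hypothetical names made up for the
example, and the field types are an assumption. It only shows why
fields that are not zeroed at allocation time have to be reset when the
object is put back for reuse, which is what the one-line change to
_vdrop() above does for v_lasta, v_clen, v_cstart and v_lastw.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for struct vnode. Only the clustering fields
 * touched by the patch are modelled; the types are assumed.
 */
struct toy_vnode {
	long v_cstart;
	long v_lasta;
	long v_lastw;
	int v_clen;
	struct toy_vnode *free_next;	/* free-list linkage, not a real vnode field */
};

#define TOY_POOL_SIZE 4

static struct toy_vnode toy_pool[TOY_POOL_SIZE];	/* static storage starts zeroed */
static size_t toy_pool_used;
static struct toy_vnode *toy_freelist;

/*
 * Hand out an object, preferring a recycled one. Nothing is zeroed
 * here, so a recycled object keeps whatever its last user left in it.
 */
struct toy_vnode *
toy_alloc(void)
{
	struct toy_vnode *vp;

	if (toy_freelist != NULL) {
		vp = toy_freelist;
		toy_freelist = vp->free_next;
		return (vp);
	}
	if (toy_pool_used == TOY_POOL_SIZE)
		return (NULL);
	return (&toy_pool[toy_pool_used++]);
}

/*
 * Return an object for reuse. With reset != 0 the clustering fields
 * are cleared, in the same way as the line added by the patch.
 */
void
toy_free(struct toy_vnode *vp, int reset)
{
	if (reset)
		vp->v_lasta = vp->v_clen = vp->v_cstart = vp->v_lastw = 0;
	vp->free_next = toy_freelist;
	toy_freelist = vp;
}
```

Calling toy_free(vp, 0) after setting vp->v_cstart and then calling
toy_alloc() again returns the same object with the old v_cstart still
in place; with toy_free(vp, 1) the next user sees zeroes. In the kernel
the stale values would presumably be picked up by the clustering code on
the next write to the recycled vnode, which fits the backtrace going
through cluster_write and ffs_reallocblks.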

-- 
Mateusz Guzik <mjguzik gmail.com>