Re: panic "ffs_checkblk: bad block" on recent -head kernels

From: Kirk McKusick <mckusick_at_mckusick.com>
Date: Thu, 03 Dec 2015 15:07:48 -0800

> Date: Thu, 3 Dec 2015 23:47:52 +0100
> From: Mateusz Guzik <mjguzik_at_gmail.com>
> To: Rick Macklem <rmacklem_at_uoguelph.ca>
> Cc: FreeBSD Current <freebsd-current_at_freebsd.org>
> Subject: Re: panic "ffs_checkblk: bad block" on recent -head kernels
> 
> On Thu, Dec 03, 2015 at 05:08:27PM -0500, Rick Macklem wrote:
>> Hi,
>> 
>> I get a fairly reproducible panic when doing a full kernel build
>> on a 256Mbyte single core i386 when running recent kernels from -head.
>> 
>> The panic is "ffs_checkblk: bad block ..". I don't actually have the
>> block # (although I think it's just 0xfffffffffffffff, given the backtrace),
>> because it runs off the screen. (I looked up the message via the debugger
>> from the first arg. to panic.)
>> 
>> Here's the backtrace without all the numbers:
>> panic(c14f4b55, ffffffff, ffffffff, 0, 64,...)
>> ffs_checkblk(ffffffff, 8000, fffffff9c, ffffffff, c4a02454,...)
>> ffs_reallocblks
>> VOP_REALLOCBLKS_APV
>> cluster_write
>> ffs_write
>> VOP_WRITE_APV
>> vn_write
>> vn_io_fault_doio
>> vn_io_fault1
>> vn_io_fault
>> dofilewrite
>> kern_writev
>> sys_write
>> syscall
>> 
>> It doesn't happen on a kernel dated Sep. 30, but does happen on a Nov. 30 one.
>> (I was away from home, so I didn't upgrade kernels for 2 months.)
>> 
>> I am slowly doing a binary search for the first kernel rev. where it occurs,
>> but since each build takes hours, it's going to take a while;-).
>> 
>> At this point, it doesn't appear to happen on r289278 (just before jeff@'s buffer
>> cache patch).
>> With kernels between r289279-->r290480, I get into the "R" state that
>> was fixed by r290481 before I get a crash.
>> I tried reverting r289405 and r290047 from a recent kernel and the crashes still
>> occurred, so it doesn't appear to be these commits.
>> 
>> I am currently testing r290481 to see if the crash occurs for this rev.
>> 
>> If anyone has some insight into which commit might cause this,
>> please let me know.
> 
> Well, did it crash with r291460 or later?
> 
> If so, try the kernel just before that and if that helps, try:
> 
> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
> index ff37de8..0ad6ef7 100644
> --- a/sys/kern/vfs_subr.c
> +++ b/sys/kern/vfs_subr.c
> @@ -2783,6 +2783,7 @@ _vdrop(struct vnode *vp, bool locked)
>         vp->v_op = NULL;
>  #endif
>         bzero(&vp->v_un, sizeof(vp->v_un));
> +       vp->v_lasta = vp->v_clen = vp->v_cstart = vp->v_lastw = 0;
>         vp->v_iflag = 0;
>         vp->v_vflag = 0;
>         bo->bo_flag = 0;
> 
> -- 
> Mateusz Guzik <mjguzik gmail.com>

I concur with trying this suggestion. Starting with r291460, these
fields were no longer zeroed when allocating the vnode, so you may
have some residual values in there that are causing trouble.
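
To illustrate the failure mode, here is a toy userland sketch. It is
not the kernel code: struct toy_vnode, toy_alloc(), toy_free() and
stale_lasta() are made-up names, the field types are simplified, and
only the four field names come from the patch above. It shows how a
recycled object keeps its previous owner's cluster hints unless either
the allocation path or the free path clears them.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the clustering hints kept in a vnode. */
struct toy_vnode {
	long	v_cstart;	/* start block of the current cluster */
	long	v_lasta;	/* last allocation */
	long	v_lastw;	/* last write */
	int	v_clen;		/* length of the current cluster */
};

/* A one-entry "cache": every allocation hands back the same object. */
static struct toy_vnode slot;

/* Allocate; zero_on_alloc mimics an allocator that clears the object. */
static struct toy_vnode *
toy_alloc(int zero_on_alloc)
{
	if (zero_on_alloc)
		memset(&slot, 0, sizeof(slot));
	return (&slot);
}

/* Free; reset_on_free mimics the line the proposed patch adds. */
static void
toy_free(struct toy_vnode *vp, int reset_on_free)
{
	assert(vp == &slot);
	if (reset_on_free)
		vp->v_lasta = vp->v_clen = vp->v_cstart = vp->v_lastw = 0;
}

/* Return the v_lasta value that the second user of the object sees. */
static long
stale_lasta(int zero_on_alloc, int reset_on_free)
{
	struct toy_vnode *vp;

	vp = toy_alloc(1);		/* first user starts from a clean object */
	vp->v_cstart = 100;
	vp->v_lasta = 1234;
	vp->v_lastw = 1233;
	vp->v_clen = 8;
	toy_free(vp, reset_on_free);

	vp = toy_alloc(zero_on_alloc);	/* second user gets the same memory */
	return (vp->v_lasta);
}
```

With neither clearing step, stale_lasta(0, 0) returns 1234, the first
user's value. stale_lasta(1, 0) and stale_lasta(0, 1) both return 0,
so clearing on either side is enough in this toy model.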

	Kirk McKusick