Re: kern/93942: panic: ufs_dirbad: bad dir

From: Matthew Dillon <dillon_at_apollo.backplane.com> Date: Thu, 4 May 2006 12:33:28 -0700 (PDT)

    I've found three additional issues which might be related to ufs_dirbad
    panics.  Again, unfortunately, no smoking gun.

    First, if B_NOCACHE gets set on a B_DIRTY buffer, the buffer can be
    lost without the data being written under certain conditions due
    to brelse() mechanics.  B_NOCACHE is typically set by
    softupdates-related code but can be set by other things as well (in
    particular, when a buffer is resized, and by certain write/read
    combinations).  One
    might think that calling bwrite() after setting B_NOCACHE would be
    safe, but that is not necessarily true.  If a buffer is redirtied
    (B_DIRTY set) during the write, something which softupdates does all
    the time, B_NOCACHE almost certainly has to be cleared.  Of the three
    issues I found, this is the most likely cause.
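
    The hazard in the first issue can be sketched with a toy model.  This
    is not kernel code: the TOY_ flag values, struct toy_buf, and the
    toy_* helpers are all hypothetical, and the redirty is modeled as
    happening right after the write rather than concurrently with it.  The
    only assumption carried over from the text is that releasing a buffer
    with the no-cache flag set discards it even if it is dirty.

```c
/* Toy model only: flags, struct and helpers are hypothetical, not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_B_DIRTY   0x1u  /* buffer holds data not yet on "disk" */
#define TOY_B_NOCACHE 0x2u  /* discard the buffer on release */

struct toy_buf {
    unsigned flags;
    char data[16];  /* in-memory contents */
    char disk[16];  /* what the backing block currently holds */
    bool valid;     /* false once the buffer has been thrown away */
};

/* Write the buffer out and mark it clean. */
static void toy_bwrite(struct toy_buf *bp)
{
    memcpy(bp->disk, bp->data, sizeof bp->disk);
    bp->flags &= ~TOY_B_DIRTY;
}

/* Redirty the buffer; with guard set, also clear the no-cache flag. */
static void toy_bdirty(struct toy_buf *bp, const char *newdata, bool guard)
{
    assert(bp->valid);
    strncpy(bp->data, newdata, sizeof bp->data - 1);
    bp->data[sizeof bp->data - 1] = '\0';
    bp->flags |= TOY_B_DIRTY;
    if (guard)
        bp->flags &= ~TOY_B_NOCACHE;
}

/* Release: a no-cache buffer is discarded whether or not it is dirty. */
static void toy_brelse(struct toy_buf *bp)
{
    if (bp->flags & TOY_B_NOCACHE) {
        bp->valid = false;
        bp->flags = 0;
    }
}

/* Returns true if the latest data was thrown away without reaching disk. */
static bool toy_scenario(bool guard)
{
    struct toy_buf b;

    memset(&b, 0, sizeof b);
    b.valid = true;
    strcpy(b.disk, "stale");
    strcpy(b.data, "dirent v1");
    b.flags = TOY_B_DIRTY | TOY_B_NOCACHE;

    toy_bwrite(&b);                      /* first version reaches disk */
    toy_bdirty(&b, "dirent v2", guard);  /* redirtied around the write */
    toy_brelse(&b);                      /* release decides the buffer's fate */

    return !b.valid && strcmp(b.disk, b.data) != 0;
}
```

    In this model toy_scenario(false) reports the second update as lost,
    while toy_scenario(true), which clears the no-cache flag on redirty,
    does not.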

    Second, vnode_pager_setsize() is being called too late in 
    ufs/ufs/ufs_lookup.c (line 733 in FreeBSD-current).  It is
    being called after the buffer has been instantiated.  This could
    create problems with the VMIO backing store for the buffer created
    by the UFS_BALLOC call.

    Third, vnode_pager_setsize() is being called too late in
    ufs/ufs/ufs_vnops.c (line 1557 in FreeBSD-current).  It is 
    being called after the buffer has been instantiated by UFS_BALLOC()
    in ufs_mkdir(), which could create problems with the buffer's VMIO
    backing store.
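
    The ordering concern in the second and third issues can be
    illustrated the same way.  The sketch below is a toy model, not the
    UFS or VM code: struct toy_vnode, toy_setsize(), toy_balloc() and
    TOY_DIRBLKSIZ are hypothetical, and the rule that a buffer is only
    fully backed if its range lies inside the pager size at the moment it
    is instantiated is an assumption made purely for illustration.

```c
/* Toy model only: every name here is hypothetical, not the UFS/VM API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DIRBLKSIZ 512  /* illustrative directory block size */

struct toy_vnode {
    size_t pager_size;  /* size the backing object believes the file has */
};

struct toy_buf {
    size_t offset;
    size_t len;
    bool fully_backed;  /* range was inside pager_size at instantiation */
};

/* Stand-in for growing the backing object. */
static void toy_setsize(struct toy_vnode *vp, size_t nsize)
{
    vp->pager_size = nsize;
}

/* Stand-in for block allocation: instantiates a buffer over [off, off+len). */
static struct toy_buf toy_balloc(const struct toy_vnode *vp, size_t off, size_t len)
{
    struct toy_buf b = { off, len, off + len <= vp->pager_size };
    return b;
}

/* Append one directory block; returns whether the new buffer was fully backed. */
static bool toy_extend_dir(struct toy_vnode *vp, size_t off, bool setsize_first)
{
    struct toy_buf b;

    assert(off % TOY_DIRBLKSIZ == 0);
    if (setsize_first)
        toy_setsize(vp, off + TOY_DIRBLKSIZ);  /* ordering used in the patch */
    b = toy_balloc(vp, off, TOY_DIRBLKSIZ);
    if (!setsize_first)
        toy_setsize(vp, off + TOY_DIRBLKSIZ);  /* original, late ordering */
    return b.fully_backed;
}
```

    Under that assumption, growing the size first yields a fully backed
    buffer, whereas growing it afterwards leaves the buffer instantiated
    over a range the backing object did not yet cover.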

    --

    The M.O. of this corruption, after examining over a dozen kernel cores,
    makes me now believe that the corruption is occurring when the kernel
    attempts to append a full block to a directory.  The bitmaps are all
    good... it is as though the directory block never got written and
    the data we are seeing is data that existed in that block before the
    directory allocated it.  But, likewise, the issue has occurred with
    different disk drivers so I think we can rule out a disk driver failure.
    The issue also seems to occur most often with large, 'busy' buffers
    (lots of directory operations going on).  Since no similar corruption
    has ever been reported for heavily used files, this supports the idea
    that it is *not* the disk driver.

    I believe that the data is getting written to the filesystem buffer
    representing the new block, but the buffer or its backing store
    is somehow getting thrown away without being written, or getting thrown
    away and then reinstantiated without being read.   The areas I 
    indicate in the above list are areas where data can potentially get
    thrown away or lost prior to a write.

					-Matt
					Matthew Dillon 
					<dillon_at_backplane.com>


(Patch against DragonFly, will not apply to FreeBSD directly, included for
reference only):

Index: kern/vfs_bio.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.53.2.1
diff -u -r1.53.2.1 vfs_bio.c
--- kern/vfs_bio.c	18 Apr 2006 17:12:25 -0000	1.53.2.1
+++ kern/vfs_bio.c	24 Apr 2006 19:22:04 -0000
@@ -972,6 +972,13 @@
 bdirty(struct buf *bp)
 {
 	KASSERT(bp->b_qindex == BQUEUE_NONE, ("bdirty: buffer %p still on queue %d", bp, bp->b_qindex));
+	if (bp->b_flags & B_NOCACHE) {
+		printf("bdirty: clearing B_NOCACHE on buf %p\n", bp);
+		bp->b_flags &= ~B_NOCACHE;
+	}
+	if (bp->b_flags & B_INVAL) {
+		printf("bdirty: warning, dirtying invalid buffer %p\n", bp);
+	}
 	bp->b_flags &= ~(B_READ|B_RELBUF);
 
 	if ((bp->b_flags & B_DELWRI) == 0) {
@@ -1096,6 +1103,11 @@
 
 	crit_enter();
 
+	if ((bp->b_flags & (B_NOCACHE|B_DIRTY)) == (B_NOCACHE|B_DIRTY)) {
+		printf("warning: buf %p marked dirty & B_NOCACHE, clearing B_NOCACHE\n", bp);
+		bp->b_flags &= ~B_NOCACHE;
+	}
+
 	if (bp->b_flags & B_LOCKED)
 		bp->b_flags &= ~B_ERROR;
 
Index: vfs/ufs/ufs_lookup.c
===================================================================
RCS file: /cvs/src/sys/vfs/ufs/ufs_lookup.c,v
retrieving revision 1.18
diff -u -r1.18 ufs_lookup.c
--- vfs/ufs/ufs_lookup.c	14 Sep 2005 01:13:48 -0000	1.18
+++ vfs/ufs/ufs_lookup.c	24 Apr 2006 19:22:23 -0000
@@ -716,6 +716,7 @@
 		 */
 		if (dp->i_offset & (DIRBLKSIZ - 1))
 			panic("ufs_direnter: newblk");
+		vnode_pager_setsize(dvp, dp->i_offset + DIRBLKSIZ);
 		flags = B_CLRBUF;
 		if (!DOINGSOFTDEP(dvp) && !DOINGASYNC(dvp))
 			flags |= B_SYNC;
@@ -727,7 +728,6 @@
 		}
 		dp->i_size = dp->i_offset + DIRBLKSIZ;
 		dp->i_flag |= IN_CHANGE | IN_UPDATE;
-		vnode_pager_setsize(dvp, (u_long)dp->i_size);
 		dirp->d_reclen = DIRBLKSIZ;
 		blkoff = dp->i_offset &
 		    (VFSTOUFS(dvp->v_mount)->um_mountp->mnt_stat.f_iosize - 1);
Index: vfs/ufs/ufs_vnops.c
===================================================================
RCS file: /cvs/src/sys/vfs/ufs/ufs_vnops.c,v
retrieving revision 1.32
diff -u -r1.32 ufs_vnops.c
--- vfs/ufs/ufs_vnops.c	17 Sep 2005 07:43:12 -0000	1.32
+++ vfs/ufs/ufs_vnops.c	24 Apr 2006 19:22:42 -0000
@@ -1420,12 +1420,12 @@
 	dirtemplate = *dtp;
 	dirtemplate.dot_ino = ip->i_number;
 	dirtemplate.dotdot_ino = dp->i_number;
+	vnode_pager_setsize(tvp, DIRBLKSIZ);
 	if ((error = VOP_BALLOC(tvp, (off_t)0, DIRBLKSIZ, cnp->cn_cred,
 	    B_CLRBUF, &bp)) != 0)
 		goto bad;
 	ip->i_size = DIRBLKSIZ;
 	ip->i_flag |= IN_CHANGE | IN_UPDATE;
-	vnode_pager_setsize(tvp, (u_long)ip->i_size);
 	bcopy((caddr_t)&dirtemplate, (caddr_t)bp->b_data, sizeof dirtemplate);
 	if (DOINGSOFTDEP(tvp)) {
 		/*