On Sun, Jan 4, 2009 at 5:08 PM, Volker <volker_at_vwsoft.com> wrote:
> Gentlemen,
>
> I've had the pleasure to inspect miwi's new tinderbox machine in a panic
> situation.
>
> The debugging info we're able to get out of the box is the following:
>
> panic: bundirty: buffer 0xffffffff9a475438 still on queue 1
>
> #9  0xffffffff8049f196 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:556
> #10 0xffffffff8050b540 in bundirty (bp=Variable "bp" is not available.
> ) at /usr/src/sys/kern/vfs_bio.c:1068
> #11 0xffffffff8050d684 in brelse (bp=0xffffffff9a475438)
>     at /usr/src/sys/kern/vfs_bio.c:1388
> #12 0xffffffff8050f07c in bufdone (bp=0xffffffff9a475438)
>     at /usr/src/sys/kern/vfs_bio.c:3157
> #13 0xffffffff8069cb6c in ffs_backgroundwritedone (bp=0xffffffff9a475438)
>     at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1679
> #14 0xffffffff8050f051 in bufdone (bp=0xffffffff9a475438)
>     at /usr/src/sys/kern/vfs_bio.c:3151
> #15 0xffffffff80452b51 in g_io_schedule_up (tp=Variable "tp" is not available.
> ) at /usr/src/sys/geom/geom_io.c:587
> #16 0xffffffff8045323f in g_up_procbody ()
>     at /usr/src/sys/geom/geom_kern.c:95
> #17 0xffffffff8048037a in fork_exit (
>     callout=0xffffffff804531d0 <g_up_procbody>, arg=0x0,
>     frame=0xffffffff80e63c80) at /usr/src/sys/kern/kern_fork.c:789
>
> In frame 11, 'p *bp' gives:
>
> $1 = {b_bufobj = 0xffffff00034fe958, b_bcount = 16384, b_caller1 = 0x0,
>   b_data = 0xffffffff9f4b7000 "", b_error = 5, b_iocmd = 2 '\002',
>   b_ioflags = 2 '\002', b_iooffset = 7127891968, b_resid = 16384,
>   b_iodone = 0, b_blkno = 13921664, b_offset = 7127891968, b_bobufs = {
>     tqe_next = 0xffffffff9a505a38, tqe_prev = 0xffffff00034fe9a8},
>   b_left = 0x0, b_right = 0xffffffff9a505a38, b_vflags = 0, b_freelist = {
>     tqe_next = 0xffffffff9a472db8, tqe_prev = 0xffffffff80b56210},
>   b_qindex = 1, b_flags = 41092, b_xflags = 33 '!', b_lock = {lock_object = {
>       lo_name = 0xffffffff8083ce18 "bufwait",
>       lo_type = 0xffffffff8083ce18 "bufwait", lo_flags = 91947008,
>       lo_witness_data = {lod_list = {stqe_next = 0xffffffff80ae6f80},
>         lod_witness = 0xffffffff80ae6f80}}, lk_lock = 18446744073709551608,
>     lk_recurse = 0, lk_timo = 0, lk_pri = 80}, b_bufsize = 16384,
>   b_runningbufspace = 0, b_kvabase = 0xffffffff9f4b7000 "",
>   b_kvasize = 16384, b_lblkno = 13921664, b_vp = 0xffffff00034fe820,
>   b_dirtyoff = 0, b_dirtyend = 0, b_rcred = 0x0, b_wcred = 0x0,
>   b_saveaddr = 0xffffffff9f4b7000, b_pager = {pg_reqpage = 0}, b_cluster = {
>     cluster_head = {tqh_first = 0xffffffff9a479c68,
>       tqh_last = 0xffffffff9a4901b0}, cluster_entry = {
>       tqe_next = 0xffffffff9a479c68, tqe_prev = 0xffffffff9a4901b0}},
>   b_pages = {0xffffff00de0eafa0, 0xffffff00de0eb010, 0xffffff00de0eb080,
>     0xffffff00de0eb0f0, 0x0 <repeats 28 times>}, b_npages = 4,
>   b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0,
>   b_fsprivate3 = 0x0, b_pin_count = 0}
>
> This panic does not occur before commit 176708.
>
> The panic is in sys/kern/vfs_bio.c, bundirty(), line 1055:
>
> KASSERT(bp->b_flags & B_REMFREE || bp->b_qindex == QUEUE_NONE, ...)
>
> bp->b_qindex = 0x01 (QUEUE_FREE)
> bp->b_flags  = 0xa084 (B_ASYNC | B_DELWRI | B_NOCACHE | B_INVAL)
>
> My assumption is:
>
> brelse() knows about QUEUE_FREE, but bundirty() only wants to operate on
> QUEUE_NONE. Either this is a race condition, brelse() should not call
> bundirty(), or bundirty() should operate on more than just QUEUE_NONE
> buffers.
>
> As there have been some locking-related commits among the changes in
> question, this might also be caused by an unlocked operation.
>
> Right before the panic, we're seeing DMA errors:
>
> ad6: setting up DMA failed
> g_vfs_done():ad6[WRITE(offset=5412487168, length=131072)]error = 5
> ad6: FAILURE - load data
> ad6: setting up DMA failed
> g_vfs_done():ad6[WRITE(offset=5044109312, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5230985216, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5231116288, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5044240384, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5412618240, length=131072)]error = 5
> g_vfs_done():ad4s1d[WRITE(offset=7127891968, length=16384)]error = 5

I am seeing this sort of output on a machine here as well, though I have
not been able to get a crash dump (it complains about not being able to
set up DMA ...). I saw a bundirty panic today, but I've also seen

panic: initiate_write_inodeblock_ufs2: already started

a number of times. I think what I'm seeing is that something (probably
the actual bug) puts the bio into a weird state (I'm not entirely sure
about this), which then cascades into GEOM_MIRROR dropping the disk from
the array. I think the initiate_write_inodeblock_ufs2 panics only happen
when I have lost both disks from the array in this fashion and/or am
trying to rebuild one of them.
In any case, I have been seeing this as a regression from 7.1-PRERELEASE
to a -CURRENT of a few days ago, and am trying to back out
possibly-relevant changes to locate the problem. (I had originally
suspected GEOM, but will switch my focus to vfs_bio having seen this.)
Such feeble notes as I have been able to make are at

http://stuff.mit.edu/afs/sipb.mit.edu/user/kaduk/freebsd/periphrasis/20090107/panic.txt

and I have also placed the output of `ident /boot/kernel.old/kernel` and
friends in that directory. (kernel.old is 7.1-PRERELEASE; kernel and
kernel.updated are two -CURRENT snapshots.)

Thanks,

Ben Kaduk

> kgdb output can be found at:
> http://www.bsdmeat.net/~lando/miwi/kgdb.txt
>
> Any suggestions as to what the correct fix for this issue is? Change
> bundirty() or brelse()? Clean up locking?
>
> CC'ing those who committed to vfs_bio.c in the relevant time frame.
>
> Thanks
>
> Volker
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"

Received on Fri Jan 16 2009 - 19:42:06 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:40 UTC