Re: panic in bundirty

From: Martin Wilke <miwi_at_FreeBSD.org>
Date: Fri, 9 Jan 2009 22:49:34 +0100
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Any news here?

On Sun, Jan 04, 2009 at 11:08:57PM +0100, Volker wrote:
> Gentlemen,
> 
> I've had the pleasure to inspect miwi's new tinderbox machine in a panic
> situation.
> 
> The debugging info we're able to get out of the box is the following:
> 
> panic: bundirty: buffer 0xffffffff9a475438 still on queue 1
> 
> #9  0xffffffff8049f196 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:556
> #10 0xffffffff8050b540 in bundirty (bp=Variable "bp" is not available.
> ) at /usr/src/sys/kern/vfs_bio.c:1068
> #11 0xffffffff8050d684 in brelse (bp=0xffffffff9a475438)
>     at /usr/src/sys/kern/vfs_bio.c:1388
> #12 0xffffffff8050f07c in bufdone (bp=0xffffffff9a475438)
>     at /usr/src/sys/kern/vfs_bio.c:3157
> #13 0xffffffff8069cb6c in ffs_backgroundwritedone (bp=0xffffffff9a475438)
> ---Type <return> to continue, or q <return> to quit---
>     at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1679
> #14 0xffffffff8050f051 in bufdone (bp=0xffffffff9a475438)
>     at /usr/src/sys/kern/vfs_bio.c:3151
> #15 0xffffffff80452b51 in g_io_schedule_up (tp=Variable "tp" is not
> available.
> )
>     at /usr/src/sys/geom/geom_io.c:587
> #16 0xffffffff8045323f in g_up_procbody () at
> /usr/src/sys/geom/geom_kern.c:95
> #17 0xffffffff8048037a in fork_exit (
>     callout=0xffffffff804531d0 <g_up_procbody>, arg=0x0,
>     frame=0xffffffff80e63c80) at /usr/src/sys/kern/kern_fork.c:789
> 
> in frame 11, 'p *bp' gives:
> $1 = {b_bufobj = 0xffffff00034fe958, b_bcount = 16384, b_caller1 = 0x0,
>   b_data = 0xffffffff9f4b7000 "", b_error = 5, b_iocmd = 2 '\002',
>   b_ioflags = 2 '\002', b_iooffset = 7127891968, b_resid = 16384,
>   b_iodone = 0, b_blkno = 13921664, b_offset = 7127891968, b_bobufs = {
>     tqe_next = 0xffffffff9a505a38, tqe_prev = 0xffffff00034fe9a8},
>   b_left = 0x0, b_right = 0xffffffff9a505a38, b_vflags = 0, b_freelist = {
>     tqe_next = 0xffffffff9a472db8, tqe_prev = 0xffffffff80b56210},
>   b_qindex = 1, b_flags = 41092, b_xflags = 33 '!', b_lock =
> {lock_object = {
>       lo_name = 0xffffffff8083ce18 "bufwait",
>       lo_type = 0xffffffff8083ce18 "bufwait", lo_flags = 91947008,
>       lo_witness_data = {lod_list = {stqe_next = 0xffffffff80ae6f80},
>         lod_witness = 0xffffffff80ae6f80}}, lk_lock = 18446744073709551608,
>     lk_recurse = 0, lk_timo = 0, lk_pri = 80}, b_bufsize = 16384,
>   b_runningbufspace = 0, b_kvabase = 0xffffffff9f4b7000 "", b_kvasize =
> 16384,
>   b_lblkno = 13921664, b_vp = 0xffffff00034fe820, b_dirtyoff = 0,
>   b_dirtyend = 0, b_rcred = 0x0, b_wcred = 0x0,
>   b_saveaddr = 0xffffffff9f4b7000, b_pager = {pg_reqpage = 0}, b_cluster = {
>     cluster_head = {tqh_first = 0xffffffff9a479c68,
>       tqh_last = 0xffffffff9a4901b0}, cluster_entry = {
>       tqe_next = 0xffffffff9a479c68, tqe_prev = 0xffffffff9a4901b0}},
>   b_pages = {0xffffff00de0eafa0, 0xffffff00de0eb010, 0xffffff00de0eb080,
>     0xffffff00de0eb0f0, 0x0 <repeats 28 times>}, b_npages = 4, b_dep = {
>     lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0,
>   b_fsprivate3 = 0x0, b_pin_count = 0}
> 
> This panic does not occur before commit 176708.
> 
> The panic is in sys/kern/vfs_bio.c bundirty 1055:
> KASSERT( bp->b_flags & B_REMFREE || bp->b_qindex == QUEUE_NONE ...
> 
> bp->b_qindex = 0x01 (QUEUE_FREE)
> bp->b_flags = 0xa084 (B_ASYNC | B_DELWRI | B_NOCACHE | B_INVAL)
> 
> My assumption is:
> 
> brele knows about the QUEUE_FREE but bundirty only wants to operate on
> QUEUE_NONE. Either this is a race condition, brele should not call
> bundirty or bundirty should operate not just on QUEUE_NONE buffers.
> 
> As there have been some locking related commits in the changes in
> question, this might also be caused by an unlocked operation.
> 
> right before the panic, we're seeing DMA errors:
> 
> ad6: setting up DMA failed
> _vfs_done():ad6[WRITE(offset=5412487168, length=131072)]error = 5
> ad6: FAILURE - load data
> ad6: setting up DMA failed
> g_vfs_done():ad6[WRITE(offset=5044109312, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5230985216, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5231116288, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5044240384, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5412618240, length=131072)]error = 5
> g_vfs_done():ad4s1d[WRITE(offset=7127891968, length=16384)]error = 5
> 
> kgdb output can be found at:
> http://www.bsdmeat.net/~lando/miwi/kgdb.txt
> 
> Suggestions what the correct fix to this issue is? Change bundirty or
> brele? Clean up locking?
> 
> CC'ing those who committed to vfs_bio.c in the relevant time frame.
> 
> Thanks
> 
> Volker
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> 

- -- 

+-----------------------+-------------------------------+
|  PGP    : 0x05682353  |  Jabber : miwi(at)BSDCrew.de  |
|  ICQ    : 169139903   |  Mail   : miwi(at)FreeBSD.org |
+-----------------------+-------------------------------+
|	Mess with the Best, Die like the Rest!		|
+-----------------------+-------------------------------+
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (FreeBSD)

iEYEARECAAYFAklnxm0ACgkQFwpycAVoI1OZ7wCdFQ5FxqVkFcROpBTe2ws79ARv
378An3ez8Zb8Z2vo7nq2WBwRaOe93zdz
=hKtf
-----END PGP SIGNATURE-----
Received on Fri Jan 09 2009 - 21:05:11 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:40 UTC