On Sun, Jan 4, 2009 at 5:08 PM, Volker <volker_at_vwsoft.com> wrote:
> Gentlemen,
>
> I've had the pleasure to inspect miwi's new tinderbox machine in a panic
> situation.
>
> The debugging info we're able to get out of the box is the following:
>
> panic: bundirty: buffer 0xffffffff9a475438 still on queue 1
>
> #9  0xffffffff8049f196 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:556
> #10 0xffffffff8050b540 in bundirty (bp=Variable "bp" is not available.
> ) at /usr/src/sys/kern/vfs_bio.c:1068
> #11 0xffffffff8050d684 in brelse (bp=0xffffffff9a475438)
>     at /usr/src/sys/kern/vfs_bio.c:1388
> #12 0xffffffff8050f07c in bufdone (bp=0xffffffff9a475438)
>     at /usr/src/sys/kern/vfs_bio.c:3157
> #13 0xffffffff8069cb6c in ffs_backgroundwritedone (bp=0xffffffff9a475438)
>     at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1679
> #14 0xffffffff8050f051 in bufdone (bp=0xffffffff9a475438)
>     at /usr/src/sys/kern/vfs_bio.c:3151
> #15 0xffffffff80452b51 in g_io_schedule_up (tp=Variable "tp" is not available.
> ) at /usr/src/sys/geom/geom_io.c:587
> #16 0xffffffff8045323f in g_up_procbody ()
>     at /usr/src/sys/geom/geom_kern.c:95
> #17 0xffffffff8048037a in fork_exit (
>     callout=0xffffffff804531d0 <g_up_procbody>, arg=0x0,
>     frame=0xffffffff80e63c80) at /usr/src/sys/kern/kern_fork.c:789
>
> In frame 11, 'p *bp' gives:
>
> $1 = {b_bufobj = 0xffffff00034fe958, b_bcount = 16384, b_caller1 = 0x0,
>   b_data = 0xffffffff9f4b7000 "", b_error = 5, b_iocmd = 2 '\002',
>   b_ioflags = 2 '\002', b_iooffset = 7127891968, b_resid = 16384,
>   b_iodone = 0, b_blkno = 13921664, b_offset = 7127891968, b_bobufs = {
>     tqe_next = 0xffffffff9a505a38, tqe_prev = 0xffffff00034fe9a8},
>   b_left = 0x0, b_right = 0xffffffff9a505a38, b_vflags = 0, b_freelist = {
>     tqe_next = 0xffffffff9a472db8, tqe_prev = 0xffffffff80b56210},
>   b_qindex = 1, b_flags = 41092, b_xflags = 33 '!', b_lock = {lock_object = {
>       lo_name = 0xffffffff8083ce18 "bufwait",
>       lo_type = 0xffffffff8083ce18 "bufwait", lo_flags = 91947008,
>       lo_witness_data = {lod_list = {stqe_next = 0xffffffff80ae6f80},
>         lod_witness = 0xffffffff80ae6f80}}, lk_lock = 18446744073709551608,
>     lk_recurse = 0, lk_timo = 0, lk_pri = 80}, b_bufsize = 16384,
>   b_runningbufspace = 0, b_kvabase = 0xffffffff9f4b7000 "",
>   b_kvasize = 16384, b_lblkno = 13921664, b_vp = 0xffffff00034fe820,
>   b_dirtyoff = 0, b_dirtyend = 0, b_rcred = 0x0, b_wcred = 0x0,
>   b_saveaddr = 0xffffffff9f4b7000, b_pager = {pg_reqpage = 0}, b_cluster = {
>     cluster_head = {tqh_first = 0xffffffff9a479c68,
>       tqh_last = 0xffffffff9a4901b0}, cluster_entry = {
>       tqe_next = 0xffffffff9a479c68, tqe_prev = 0xffffffff9a4901b0}},
>   b_pages = {0xffffff00de0eafa0, 0xffffff00de0eb010, 0xffffff00de0eb080,
>     0xffffff00de0eb0f0, 0x0 <repeats 28 times>}, b_npages = 4,
>   b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0,
>   b_fsprivate3 = 0x0, b_pin_count = 0}
>
> This panic does not occur before commit 176708.
>
> The panic is in sys/kern/vfs_bio.c, bundirty(), line 1055:
>
> KASSERT(bp->b_flags & B_REMFREE || bp->b_qindex == QUEUE_NONE, ...)
>
> bp->b_qindex = 0x01 (QUEUE_FREE)
> bp->b_flags  = 0xa084 (B_ASYNC | B_DELWRI | B_NOCACHE | B_INVAL)
>
> My assumption is:
>
> brelse() knows about QUEUE_FREE, but bundirty() only wants to operate on
> QUEUE_NONE. Either this is a race condition, brelse() should not call
> bundirty(), or bundirty() should operate on more than just QUEUE_NONE
> buffers.
>
> As there have been some locking-related commits among the changes in
> question, this might also be caused by an unlocked operation.
>
> Right before the panic, we're seeing DMA errors:
>
> ad6: setting up DMA failed
> g_vfs_done():ad6[WRITE(offset=5412487168, length=131072)]error = 5
> ad6: FAILURE - load data
> ad6: setting up DMA failed
> g_vfs_done():ad6[WRITE(offset=5044109312, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5230985216, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5231116288, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5044240384, length=131072)]error = 5
> g_vfs_done():ad6[WRITE(offset=5412618240, length=131072)]error = 5
> g_vfs_done():ad4s1d[WRITE(offset=7127891968, length=16384)]error = 5

I am seeing this sort of output on a machine here as well, though I have
not been able to get a crash dump (it complains about not being able to
set up DMA ...). I saw a bundirty panic today, but I've also seen

panic: initiate_write_inodeblock_ufs2: already started

a number of times. I think what I'm seeing is that something (probably
the actual bug) puts the bio into a weird state (I'm not entirely sure
about this), which then cascades into GEOM_MIRROR dropping the disk from
the array. I think the initiate_write_inodeblock_ufs2 panics only happen
when I have lost both disks from the array in this fashion and/or am
trying to rebuild one of them.
In any case, I have been seeing this as a regression from 7.1-PRERELEASE
to a -CURRENT of a few days ago, and am trying to back out
possibly-relevant changes to locate the problem. (I had originally
suspected GEOM, but will switch my focus to vfs_bio having seen this.)
Such feeble notes as I have been able to make are at

http://stuff.mit.edu/afs/sipb.mit.edu/user/kaduk/freebsd/periphrasis/20090107/panic.txt

and I have also placed the output of `ident /boot/kernel.old/kernel` and
friends in that directory. (kernel.old is 7.1-PRERELEASE; kernel and
kernel.updated are two -CURRENT snapshots.)

Thanks,

Ben Kaduk

> kgdb output can be found at:
> http://www.bsdmeat.net/~lando/miwi/kgdb.txt
>
> Any suggestions as to what the correct fix for this issue is? Change
> bundirty() or brelse()? Clean up locking?
>
> CC'ing those who committed to vfs_bio.c in the relevant time frame.
>
> Thanks
>
> Volker
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"

Received on Fri Jan 16 2009 - 19:42:06 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:40 UTC