On Sun, 4 Jan 2004, Stefan Ehmann wrote: > For the first time I got a backtrace that ended in the soundcard module > - So maybe this is the right direction (on the other hand this might be > some newly introduced error) > > panic: bad bufsize > #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:240 > #1 0xc04e5198 in boot (howto=256) at > /usr/src/sys/kern/kern_shutdown.c:372 > #2 0xc04e5527 in panic () at /usr/src/sys/kern/kern_shutdown.c:550 > #3 0xc07ec648 in feed_vchan_s16 () from /boot/kernel/snd_pcm.ko > #4 0xc07e2c6d in sndbuf_feed () from /boot/kernel/snd_pcm.ko > #5 0xc07e3225 in chn_wrfeed () from /boot/kernel/snd_pcm.ko > #6 0xc07e327c in chn_wrintr () from /boot/kernel/snd_pcm.ko > #7 0xc07e3990 in chn_intr () from /boot/kernel/snd_pcm.ko > #8 0xc07fca2f in csa_intr () from /boot/kernel/snd_csa.ko > #9 0xc07fb724 in csa_intr () from /boot/kernel/snd_csa.ko > #10 0xc04d1692 in ithread_loop (arg=0xc1737b00) > at /usr/src/sys/kern/kern_intr.c:544 > #11 0xc04d0684 in fork_exit (callout=0xc04d1500 <ithread_loop>, arg=0x0, > frame=0x0) at /usr/src/sys/kern/kern_fork.c:796 The sound driver is in modules. Have you ruled out having an inconsistent module. > Nearly all other panics (I put a collection of ~20 backtraces on > http://stud4.tuwien.ac.at/~e0125637/fbsd/ if it's of any interest) ended > up in file system code. The args in many of these don't make sense. E.g., in bt.13: % ... % #15 0xc06743f8 in calltrap () at {standard input}:94 % #16 0xc0502a34 in lockmgr (lkp=0xc3a91028, flags=0, interlkp=0xe4, % td=0xc06bfc68) at /usr/src/sys/kern/kern_lock.c:228 % #17 0xc0566c67 in vfs_busy (mp=0x0, flags=0, interlkp=0x0, td=0x0) % at /usr/src/sys/kern/vfs_subr.c:527 % #18 0xc056362c in lookup (ndp=0xd8cd5bdc) at /usr/src/sys/kern/vfs_lookup.c:559 % ... mp=0x0 in #17 can't happen, and is inconsistent with lkp!=0 in #16 (since lkp = &mp->mnt_lock). The panic in line 228 in #16 is from lkp being garbage, so apparently it is not &mp->mnt_lock for a correct mp either. flags=0 and interlkp=0x0 in #17 are correct. td=0x0 is inconsistent with td!=0 in #16 (I think the latter is correct). flags=0 in #16 is correct. interlkp=0xe4 in #16 is inconsistent with interlkp=0 in #17. Compiling without optimization might simplify debugging, but I thought that gcc doesn't clobber stack args even with optimization. bt.34 has an example of a garbage arg the is unlikely to be from mis-debugging it since it is in 2 stack frames: % ... % #3 0xc0560029 in getblk (vp=0xc42ec6d8, blkno=16, size=7537385, slpflag=0, ^^^^^^^^^^^^ % slptimeo=0, flags=0) at /usr/src/sys/kern/vfs_bio.c:2415 % #4 0xc05650bd in cluster_read (vp=0xc42ec6d8, filesize=3675636, lblkno=16, % size=7537385, cred=0x0, totread=32768, seqcount=0, bpp=0x0) ^^^^^^^^^^^^ % at /usr/src/sys/kern/vfs_cluster.c:127 % #5 0xc419efa7 in ext2_read (ap=0xd8d17bcc) % at _at_/gnu/ext2fs/ext2_readwrite.c:115 % ... `size' is the ext2fs block size. It should be either 1K or 4K for ext2fs. So the size, or more likely the pointer to it, is corrupt in ext2_read() here: % size = BLKSIZE(fs, ip, lbn); ^^^^^^^^^^^^^^^^^^^^^^^^^^^ % blkoffset = blkoff(fs, uio->uio_offset); % % xfersize = fs->s_frag_size - blkoffset; % if (uio->uio_resid < xfersize) % xfersize = uio->uio_resid; % if (bytesinfile < xfersize) % xfersize = bytesinfile; % % if (lblktosize(fs, nextlbn) >= ip->i_size) % error = bread(vp, lbn, size, NOCRED, &bp); % else if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) % error = cluster_read(vp, % ip->i_size, lbn, size, NOCRED, ^^^^ % uio->uio_resid, (ap->a_ioflag >> 16), &bp); BLKSIZE(fs, ip, lbn) depends mainly on ip (fs is ip->FS), so it seems that we have a corrupt ip here. This is consistent with some of the panics being nearer to inode allocation. *_vget() has delicate locking which has caused problems before. I just remembered than ext2_vget() and other *_vget() (nfs...) are missing the last round of locking changes in ffs_vget() (ffs_vfsops.c 1.181). But these changes are only claimed to be simplifications and optimizationns. I can't see any bugs in the locking in ext2_vget() except that stuff like the following makes too many assumptions: % /* % * Lock out the creation of new entries in the FFS hash table in % * case getnewvnode() or MALLOC() blocks, otherwise a duplicate % * may occur! % */ % if (ext2fs_inode_hash_lock) { % while (ext2fs_inode_hash_lock) { % ext2fs_inode_hash_lock = -1; % tsleep(&ext2fs_inode_hash_lock, PVM, "e2vget", 0); % } % goto restart; % } % ext2fs_inode_hash_lock = 1; This assumes that the kernel is non-preemptible or that something else prevents concurrent non-atomic test-and-set of ex2fs_inode_hash_lock. I think Giant locking now prevents concurrent access, and anyway, concurrent access is too unlikely to give so many panics if the bug is here. BruceReceived on Mon Jan 05 2004 - 01:37:08 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:36 UTC