Re: Softupdate/kernel panic ffs_fsync

From: Sven Willenberger <sven_at_dmv.com>
Date: Wed, 16 Jun 2004 11:52:30 -0400
On Tue, 2004-06-15 at 09:16 -0400, Sven Willenberger wrote:
> On Mon, 2004-06-14 at 13:29 -0400, Sven Willenberger wrote:
> > Once upon a time I wrote:
> > 
> > > I have seen a few (unresolved) questions similar to this searching
> > > (google|archives). On a 5.2.1-Release-P2 system (actually a couple with
> > > essentially identical configs) I get the following Stack Backtrace
> > > messages:
> > > 
> > > backtrace(c070cbf8,2,e5b3af60,0,22) at backtrace +0x17
getdirtybuf(f7f99bbc,0,1,e5b3af60,1) at getdirtybuf +0x30
> > > flush_deplist(c724e64c,1,f7f99be4,f7f99be8,0) at flush_deplist +0x43
> > > flush_inode_deps(c6c35000,5c108,f7f99c10,c0510fe3,f7f99c40) at
> > > flush_inode_deps + 0xa3
> > > softdep_sync_metadata(f7f99ca8,0,c06da90f,124,0) at
> > > softdep_sync_metadata +0x87
> > > ffs_fsync(f7f99ca8,0,c06d0c8b,beb,0) at ffs_fsync +0x3b9
> > > fsync(c7c224780,f7f99d14,c06e15c0,3ee,1) at fsync +0x151
> > > syscall(80e002f,bfbf002f,bfbf0028,0,80f57e0) at syscall +0x2a0
> > > Xint0x80_syscall() at Xint0x80_syscall() +0x1d
> > > --- syscall (95), eip=0x282a89af, esp=0xbfbfa10c, ebp=0xbfbfba68 ---
> > > 
> > > 
> > > The systems in question are mail servers that act as gateways (no local
> > > delivery) running mimedefang (2.39 - 2.42) with spamassassin. The work
> > > directory is not swap/memory mounted but rather on
> > > /var/spool/MIMEDefang. The frequency of these messages increases when
> > > bayes filtering is added (as the bayes tokens db file also resides on
> > > the same filesystem/directory).
> > > 
> > > I have read that getdirtybuf() may have been passed a corrupt buffer
> > > header; has anything further come of this, and if not, where/how do I
> > > start helping to find a solution?
> > 
> > I have yet to see a resolution to this issue. I am now running all the
> > boxen using 5.2.1-Release-P8 with perl 5.8.4 and all ports upgraded.
> > 
> > I have created 256MB RAM disks on each machine that MIMEDefang now uses
> > for its temp files and Bayesian database but, if anything, the frequency
> > of backtraces has increased rather than decreased.
> > 
> > What do I need to do to narrow this issue down further? For me this is a
> > showstopper, as it will occasionally cause a panic/reboot. I have these
> > machines clustered so as not to interrupt services but it is slowly
> > becoming frustrating as the machines are bailing under heavy traffic.
> > Is there any output I can provide or diagnostics I can run to help find
> > a solution?
> > 
> > Sven
> > 
> 
> Would this have anything to do with background fscking? Or is the bgfsck
> only run once at bootup[+delay], if the system determines it is needed? I
> am trying to find some common factor here, and the only thing I can find
> is that during heavy incoming mail load (when many perl processes,
> courtesy of MIMEDefang, are running) the kernel creates the backtrace.
> This is still odd because all the temp files are on a RAMdisk
> (malloc-based) - is it possible that softupdates is trying to fsync
> either swap and/or other memory devices? The following is a typical
> layout of the boxes in question:
> 
> /dev/da0s1a on / (ufs, local)
> devfs on /dev (devfs, local)
> /dev/da0s1e on /tmp (ufs, local, soft-updates)
> /dev/da0s1f on /usr (ufs, local, soft-updates)
> /dev/da0s1d on /var (ufs, local, soft-updates)
> /dev/md10 on /var/spool/MIMEDefang (ufs, local)
> 
> where the ramdisk is configured with mdconfig -a -t malloc -s 256m -u 10


Doing more research on this, I see that there were in fact issues with
ffs_softdep.c which were fixed by forcing a flush rather than panicking
the system when an assertion (?) or call to getdirtybuf() failed. Is it
possible that a case was missed? The error refers to:

at getdirtybuf +0x30

How do I go about determining specifically what part of the code that
offset refers to? I am trying to debug this problem but need some help
here in terms of exactly *how* to do this. Anyone? ... Bueller?

Again I suspect this has something to do with memory devices, .snap
directories, and/or swap-based filesystems.
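
If it does come to a full panic, a crash dump would make the backtrace
far more useful than a console transcript. A sketch of the relevant
/etc/rc.conf settings, assuming da0s1b is the swap partition (adjust to
the real device):

```
# /etc/rc.conf -- sketch only; da0s1b is an assumed swap device
dumpdev="/dev/da0s1b"   # kernel writes the dump here at panic time
dumpdir="/var/crash"    # savecore(8) moves it here on the next boot
```

After the next panic, savecore should leave a vmcore in /var/crash that
can be opened with gdb -k against kernel.debug for a full post-mortem.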

Sven
Received on Wed Jun 16 2004 - 13:54:19 UTC
