Stack backtrace: how can I help?

From: Sven Willenberger <sven_at_dmv.com>
Date: Wed, 30 Jun 2004 15:34:05 -0400
My ability to dig into kernel routines, etc. is very limited, so I am
asking how I can help those who may be able to fix this recurring
problem.

This has been posted by me and by others, with no response from anyone
other than a single reply saying "it must be a bug".

Under heavy loads, on 5.2.1-P8 systems, I get a Stack backtrace relating
to flushing dirty buffers (ffs_fsync).

The relevant code from ffs_softdep.c (src/sys/ufs/ffs/ffs_softdep.c,v
1.149 2003/10/23 21:14:08 jhb):

getdirtybuf(bpp, mtx, waitfor)
        struct buf **bpp;
        struct mtx *mtx;
        int waitfor;
{
        struct buf *bp;
        int error;

        /*
         * XXX This code and the code that calls it need to be reviewed to
         * verify its use of the vnode interlock.
         */

        for (;;) {
                if ((bp = *bpp) == NULL)
                        return (0);
                if (bp->b_vp == NULL)
                        backtrace();
.....
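
For what it's worth, the kind of extra diagnostic I imagine might help
whoever looks at this (purely a sketch on my part, not anything from the
source tree) would be to print some context about the offending buffer
just before backtrace() fires, along these lines:

        /*
         * Sketch only: log which buffer is hitting the NULL-b_vp case
         * so the backtrace can be tied to a specific buffer.  Field
         * names are from struct buf as I understand them.
         */
        if (bp->b_vp == NULL) {
                printf("getdirtybuf: bp %p has NULL b_vp, flags 0x%lx, "
                    "lblkno %lld\n", bp, (unsigned long)bp->b_flags,
                    (long long)bp->b_lblkno);
                backtrace();
        }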

The problem does seem related to the load created by perl (these machines
run spamassassin through either mimedefang or milter-spamc) and they are
now running perl 5.8.4; the upgrade to perl made no difference ... still
getting these backtraces. Each machine handles (filters) roughly 120K
email messages per day.

a) What additional information would be of help here?
b) What can I do to help troubleshoot this? For the most part the
machines recover after the backtrace (of course they are inoperable
while the trace is being generated, creating further backlog/work for
the other machines in the cluster), although occasionally it will cause
a panic and either reboot or hang at sync.
c) Is it possible to cvsup the latest ffs files and make install those
without killing the machine?