On 6 May, I wrote:

> The vnode in question this time around is 0xc9ba27fc, which corresponds
> to /tmp/obj/usr/src/gnu/usr.bin/cc/cc1/cc1. The stack for the bufdaemon
> thread holding the lock is:
>
> Proc 0xc6153000 [SLPQ wdrain c05ce978][SLP] bufdaemon
> mi_switch(c60d8260,44,c050c1cf,ca,1) at mi_switch+0x210
> msleep(c05ce978,c05ce980,44,c0511486,0) at msleep+0x432
> bwrite(d28ce2e0,0,c0511368,697,137e400) at bwrite+0x442
> vfs_bio_awrite(d28ce2e0,0,c0511368,87b,0) at vfs_bio_awrite+0x221
> flushbufqueues(0,0,c0511368,10e,64) at flushbufqueues+0x17d
> buf_daemon(0,e0a92d48,c0509769,310,0) at buf_daemon+0xdc
> fork_exit(c0363240,0,e0a92d48) at fork_exit+0xc0
> fork_trampoline() at fork_trampoline+0x1a
>
> What is puzzling is why this process is sleeping here. It appears that
> maybe a wakeup didn't happen. This machine has 1 GB of RAM, so I don't
> think memory pressure should be a cause. Here's the source at bwrite+0x442

Sigh ... it looks like the problem is that enough work gets queued up on
the NFS client side that it prevents the server side from draining the
total amount to below lorunningspace. This deadlocks the NFS server side,
which in turn prevents the NFS client side from draining:

static __inline void
runningbufwakeup(struct buf *bp)
{

	if (bp->b_runningbufspace) {
		atomic_subtract_int(&runningbufspace, bp->b_runningbufspace);
		bp->b_runningbufspace = 0;
		mtx_lock(&rbreqlock);
		if (runningbufreq && runningbufspace <= lorunningspace) {
			runningbufreq = 0;
			wakeup(&runningbufreq);
		}
		mtx_unlock(&rbreqlock);
	}
}

Probably the best cure would be to always allow at least some minimum
amount per device or mount point.

Received on Tue May 06 2003 - 20:27:47 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:06 UTC