fragmented buffer cache hang

From: John-Mark Gurney <gurney_j_at_resnet.uoregon.edu> Date: Thu, 9 Aug 2007 11:24:38 -0700

At work, we've been having a few hangs that appear to be caused by a
fragmented buffer cache.  We are running some UFS2 file systems with
64KB/64KB and 64KB/32KB block/fragment sizes, which I believe is a
contributing factor to the fragmentation.  Luckily, only I/O to the
large-block file systems is hung, and I've been able to run kgdb on
/dev/mem, which has helped tremendously.  I am still running kgdb on
the box, so I can get any additional information requested.

Disk I/O on the devices that house the file systems is fully
functional, as I can run ffsinfo and dd against the disks.

Most of the processes are stuck in nbufkv (from getnewbuf) w/
needsbuffer set to VFS_BIO_NEED_BUFSPACE.  This can only get set if
getnewbuf needs to defrag the buffer cache because a call to
vm_map_findspace() on buffer_map fails.  The bufdaemon is stuck in
qsleep.  The syncer is also stuck in nbufkv.

So, BKVASIZE, which is the minimum allocation size of space in the
buffer_map, was increased to 16KB to be 2x the size (at the time) of
the UFS block size of standard file systems (8KB).  We have since
increased the standard block size to 16KB, but have not made a
respective increase to BKVASIZE.  I see raising it as a possible
workaround, but not one that is guaranteed to make 64KB-block file
systems work.
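
To illustrate why a small allocation unit can strand a large request,
here is a toy model in C.  It is only a sketch: the slot array, the
toy_* names and the "every fourth slot in use" pattern are assumptions
made up for this example, not anything taken from the kernel or from
the hung box.  It treats the map as an array of 16KB slots and shows
that plenty of space can be free in total while no single gap reaches
64KB.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for BKVASIZE (16KB in the discussion above). */
#define TOY_SLOT_SIZE ((size_t)16 * 1024)

/* Mark every fourth slot as holding one 16KB buffer (hypothetical pattern). */
void toy_fragment(int *used, size_t nslots)
{
    assert(used != NULL);
    for (size_t i = 0; i < nslots; i++)
        used[i] = (i % 4 == 0);
}

/* Total free bytes in the toy map; used[i] != 0 means the slot is allocated. */
size_t toy_total_free(const int *used, size_t nslots)
{
    size_t free_slots = 0;

    assert(used != NULL);
    for (size_t i = 0; i < nslots; i++)
        if (!used[i])
            free_slots++;
    return free_slots * TOY_SLOT_SIZE;
}

/* Largest contiguous run of free slots, in bytes. */
size_t toy_largest_gap(const int *used, size_t nslots)
{
    size_t best = 0, run = 0;

    assert(used != NULL);
    for (size_t i = 0; i < nslots; i++) {
        if (used[i])
            run = 0;            /* an allocated slot ends the current gap */
        else if (++run > best)
            best = run;
    }
    return best * TOY_SLOT_SIZE;
}
```

With 16 slots fragmented this way, 192KB is free in total but the
largest gap is 48KB, so a 64KB request cannot be placed without first
moving or freeing buffers.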

I have walked part of the buffer_map, but have not seen any adj_free or
max_free >= 64KB; they are usually either 32KB or 48KB.
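
The kind of walk described above can be sketched in user space as
follows.  This is a simplified illustration, not the kernel's code:
struct toy_entry, toy_update_adj_free() and toy_findspace() are
hypothetical names, and the list is a plain sorted singly linked list.
Only the adj_free idea (free space between one entry's end and the
next entry's start) is borrowed from the fields named above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified map entry; NOT the kernel's struct vm_map_entry. */
struct toy_entry {
    size_t start;               /* first byte of the entry */
    size_t end;                 /* one past the last byte */
    size_t adj_free;            /* free bytes between end and the next entry */
    struct toy_entry *next;     /* next entry in address order, or NULL */
};

/* Recompute adj_free for a sorted list; map_end bounds the last gap. */
void toy_update_adj_free(struct toy_entry *head, size_t map_end)
{
    for (struct toy_entry *e = head; e != NULL; e = e->next) {
        size_t next_start = (e->next != NULL) ? e->next->start : map_end;

        assert(next_start >= e->end);
        e->adj_free = next_start - e->end;
    }
}

/*
 * First-fit search over the gaps that follow each entry.  Returns 0 and
 * stores the gap's start in *addr on success, or -1 if no gap is large
 * enough.  Assumes the first entry starts at the beginning of the map,
 * so there is no gap in front of it to consider.
 */
int toy_findspace(const struct toy_entry *head, size_t len, size_t *addr)
{
    assert(addr != NULL);
    for (const struct toy_entry *e = head; e != NULL; e = e->next) {
        if (e->adj_free >= len) {
            *addr = e->end;
            return 0;
        }
    }
    return -1;
}
```

If every entry in such a list reports an adj_free of 32KB or 48KB, a
64KB search returns -1 no matter how many entries there are.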

Some information about the box:
6.2-RELEASE using an SMP kernel w/ debugging enabled, nothing else special.

I have attached sysctl -a, vmstat -m, vmstat -z and dmesg.

I believe this hang is possible w/ -current also, as the buffer cache
has not changed significantly.

Comments? Help?

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."