Re: ZFS and deadlock with {nullfs,NFS}

From: Kris Kennaway <kris_at_obsecurity.org>
Date: Wed, 20 Jun 2007 12:28:54 -0400

On Wed, Jun 20, 2007 at 12:17:54PM -0400, Joe Marcus Clarke wrote:
> On Wed, 2007-06-20 at 12:03 -0400, Kris Kennaway wrote:
> > On Wed, Jun 20, 2007 at 11:53:43AM -0400, Joe Marcus Clarke wrote:
> > > I've resurrected my amd64 Tinderbox with a ZFS base, and I've been
> > > seeing a 100% reproducible deadlock when I use it with either localhost
> > > NFS or nullfs.  When this occurs, the CPU is 100% idle, but I can no
> > > longer connect via SSH, and the box will only reboot from the debugger.
> > > I know there are some tuning bits I can tweak, but all I've run across
> > > is for memory consumption.  Any pointers would be helpful.  I'm also at
> > > the debugger, so if there is anything I can do to help troubleshoot why
> > > this is happening, please let me know.  
> > > 
> > > This box is -CURRENT as of June 19, 2007.  It has a GENERIC kernel minus
> > > devices I do not have (i.e. SMP kernel).  I am currently using nullfs
> > > for the Tinderbox.  The process that most regularly locks up is mtree.
> > > Here is the trace:
> > 
> > > A full process list from the debugger can be found at
> > > http://www.marcuscom.com/downloads/cobbler_proc.txt .
> > 
> > 404 at the moment, but look for processes involving zil* in the
> > backtrace.  I had to disable zil (vfs.zfs.zil_disable=1 tunable) to
> > prevent low-memory deadlocks on my machines.  Since then it's been
> > fine.
> 
> Fixed, sorry.
> 
> > 
> > You may also wish to use my patches (see the archives) to improve
> > performance and low-memory behaviour.
> 
> Thanks for the advice.  I'll check.  I didn't suspect low memory, since
> it didn't look like I was using much.  Even now with the box locked, I
> have 1035 MB free with no swap in use (this box has 2 GB total).

By default the kmem_map is only 320 MB, and all of ZFS (including its
buffer cache and I/O buffers) has to cram itself into it, so that is
where the low-memory condition may be happening.  Enlarging it is one
of the tunings needed to get non-terrible performance, since it
actually allows some caching to occur.

Kris
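
The tuning discussed above could be sketched as a /boot/loader.conf
fragment.  The vfs.zfs.zil_disable tunable is the one named in this
thread; vm.kmem_size and vm.kmem_size_max are the FreeBSD loader
tunables for the kernel memory map.  The 512M values are illustrative
assumptions, not figures from this thread, and would need adjusting for
the machine and kernel in use.

```
# /boot/loader.conf (sketch; the sizes are assumed example values)

# Enlarge the kernel memory map that ZFS allocates from.
vm.kmem_size="512M"
vm.kmem_size_max="512M"

# Workaround mentioned in this thread: disable the ZFS intent log.
# Trade-off: synchronous writes lose their on-disk guarantee after a crash.
vfs.zfs.zil_disable="1"
```

Loader tunables only take effect at boot.  After a reboot,
"sysctl vm.kmem_size" should report the configured size in bytes.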
Received on Wed Jun 20 2007 - 14:28:54 UTC
