Re: ZFS Hangs

From: Adam McDougall <mcdouga9_at_egr.msu.edu>
Date: Tue, 6 Nov 2007 09:34:13 -0500
On Mon, Nov 05, 2007 at 12:05:08PM -0500, Adam McDougall wrote:

  On Mon, Nov 05, 2007 at 10:24:14AM +0100, Kris Kennaway wrote:
  
    Thomas Sparrevohn wrote:
  >> On Sunday 04 November 2007 15:00:50 Kris Kennaway wrote:
  >>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
  >>> 
  >> Oh my god - overlooked that ;-) - funny that.  It's a bit tricky, as it is
  >> not possible to dump a kernel when the swap is on ZFS.  I did a test with
  >> all debugging enabled and the problem did not show up, which makes it
  >> somewhat nasty.  I will check if I can reproduce it with only DDB enabled.
    
    You can still hook up a serial console, or at the very least take 
    photographs of the screen with the relevant DDB information.  Or add 
    another disk and dump on that.
    
    Kris
    
  I have some screenshots of ps in ddb from one of several zfs hangs I've had
  on one amd64 system:
  
  http://www.egr.msu.edu/~mcdouga9/pics/zfs/
  
  I didn't post every single screenful since I don't have a microsd reader handy,
  and emailing the pictures off my phone is painful.  If I missed a screenshot of
  one or more particular processes that might have a telling state, let me know.
  
  I also have a gzipped kernel + dump from a forced panic taken while it was in
  this state.  If a developer is interested in it, please let me know so I can
  post it somewhere private, since the system is in NIS and likely has tables
  cached in memory.
  
  It is running a kernel from Oct 17.  I tried a kernel with WITNESS, INVARIANTS,
  etc., but it hung the same way without any panic.  I completed a zpool scrub
  this morning with no errors.  Lately zfs seems to wedge up every single night
  when the rsyncs from remote servers run.  This is the only amd64 system I have
  zfs on; the other two are i386, and the problems on those systems have only
  been kmem panics, which so far have been avoidable.
  
  I can help by checking somewhat specific things and running prescribed tests,
  but right now I don't have time to tackle this problem on this system and learn
  how to debug it entirely on my own starting with nothing more than a DDB guide
  from the handbook.  It's not that I refuse to; I recognize it's difficult to
  join remote skill with local hands for something this technical.

Sorry if I seemed negative or unhelpful; I will try on my own if I have time, but
I'm pretty busy lately.  On a hunch from other past emails, I tried turning off ZIL,
and so far it has survived the night; rsync is still running.  The only other changes
I made were running the zpool scrub yesterday (no fixes were needed) and applying
the patch to make more of the zfs process states visible in top.  I've rebooted
several times (each time after zfs hung), so uptime isn't an issue, but for every
day rsync doesn't finish, the next day's rsync may have more updates because it
missed a day.
  
  Friday I replaced the motherboard/cpu just as a shot in the dark (since the
  system had some strange instability in the past), but this didn't help zfs
  (not surprised).  When zfs was hung Saturday morning, I tried to reboot it,
  but reboot would not even get far enough to stop new ssh connections.
  _______________________________________________
  freebsd-current_at_freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-current
  To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
  
Received on Tue Nov 06 2007 - 13:34:25 UTC