Re: When will ZFS become stable?

From: Henri Hennebert <hlh_at_restart.be>
Date: Sun, 06 Jan 2008 17:47:29 +0100

Kris Kennaway wrote:
> Henri Hennebert wrote:
>> Kris Kennaway wrote:
>>> Ivan Voras wrote:
>>>> On 06/01/2008, Peter Schuller <peter.schuller_at_infidyne.com> wrote:
>>>>>> This number is not so large. It seems to be easily crashed by rsync,
>>>>>> for example (speaking from my own experience, and also some of my
>>>>>> colleagues).
>>>>> I can definitely say this is not *generally* true, as I do a lot of
>>>>> rsyncing/rdiff-backup:ing and similar stuff (with many files /
>>>>> large files) on ZFS without any stability issues. Problems for me
>>>>> have been limited to 32bit and the memory exhaustion issue rather
>>>>> than "hard" issues.
>>>>
>>>> It's not generally true since kmem problems with rsync are often hard
>>>> to repeat - I have them on one machine, but not on another, similar
>>>> machine. This nonrepeatability is also a part of the problem.
>>>>
>>>>> But perhaps that's all you are referring to.
>>>>
>>>> Mostly. I did have a ZFS crash with rsync that wasn't kmem related,
>>>> but only once.
>>>
>>> kmem problems are just tuning.  They are not indicative of stability 
>>> problems in ZFS.  Please report any further non-kmem panics you 
>>> experience.
>>
>> I have twice hit a deadlock during high I/O activity (the last one
>> during rsync + rm -r on a 5GB hierarchy, openoffice-2/work).
>>
>> I was running with this patch:
>> http://people.freebsd.org/~pjd/patches/zgd_done.patch
>>
>> db> show allpcpu
>> Current CPU: 1
>>
>> cpuid        = 0
>> curthread    = 0xa5ebe440: pid 3422 "txg_thread_enter"
>> curpcb       = 0xeb175d90
>> fpcurthread  = none
>> idlethread   = 0xa5529aa0: pid 12 "idle: cpu0"
>> APIC ID      = 0
>> currentldt   = 0x50
>>
>> cpuid        = 1
>> curthread    = 0xa56ab220: pid 47 "arc_reclaim_thread"
>> curpcb       = 0xe6837d90
>> fpcurthread  = none
>> idlethread   = 0xa5529880: pid 11 "idle: cpu1"
>> APIC ID      = 1
>> currentldt   = 0x50
>>
>> Both times, arc_reclaim_thread was `running`.
> 
> Backtraces of the affected processes (or just alltrace) are usually 

Noted for next time.

> required to proceed with debugging, and lock status is also often vital 
> (show alllocks, requires witness).

I'll add it to my kernel config.
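
For reference, a minimal sketch of the kernel config lines this implies. The exact set of options is an assumption for a FreeBSD 7 era kernel, not something spelled out in this thread; WITNESS_SKIPSPIN in particular is an optional extra that only reduces overhead.

```text
# Kernel debugger, needed to reach the db> prompt
options 	KDB
options 	DDB

# Lock order verification; "show alllocks" depends on it
options 	WITNESS
# Optional (assumption): skip spin mutexes to reduce WITNESS overhead
options 	WITNESS_SKIPSPIN
```

With such a kernel, the next hang could be captured from the db> prompt with alltrace and show alllocks in addition to show allpcpu.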

> Also, in the case when threads are
> actually running (not deadlocked), then it is often useful to repeatedly 
> break/continue and sample many backtraces to try and determine where the 
> threads are looping.

I did this after the second deadlock: arc_reclaim_thread was always 
there and the second CPU was idle.
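
The break/continue sampling can be tallied mechanically once the backtraces are saved (for example from a serial console log). Below is a rough sketch in Python, assuming each frame line looks like "symbol(args) at symbol+0xoffset"; that format and the helper names frames, hot_frames and innermost are assumptions made for illustration, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Assumed frame format: "symbol(args) at symbol+0xoffset"
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-fA-F]+")


def frames(trace):
    """Return the function names found in one backtrace, in printed order."""
    return FRAME_RE.findall(trace)


def hot_frames(samples):
    """Count how many samples each function appears in (once per sample)."""
    counts = Counter()
    for trace in samples:
        counts.update(set(frames(trace)))
    return counts.most_common()


def innermost(samples):
    """Count the first (innermost) frame of each non-empty sample."""
    firsts = (frames(trace) for trace in samples)
    return Counter(f[0] for f in firsts if f).most_common()
```

A function that shows up in nearly every sample but is rarely the innermost frame is a likely candidate for the loop the thread is stuck in.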

Henri
> 
> Kris
Received on Sun Jan 06 2008 - 15:47:33 UTC
