Re: When will ZFS become stable?

From: Kris Kennaway <kris_at_FreeBSD.org>
Date: Sun, 06 Jan 2008 18:13:29 +0100
Henri Hennebert wrote:
> Kris Kennaway wrote:
>> Henri Hennebert wrote:
>>> Kris Kennaway wrote:
>>>> Ivan Voras wrote:
>>>>> On 06/01/2008, Peter Schuller <peter.schuller_at_infidyne.com> wrote:
>>>>>>> This number is not so large. It seems to be easily crashed by rsync,
>>>>>>> for example (speaking from my own experience, and also some of my
>>>>>>> colleagues).
>>>>>> I can definitely say this is not *generally* true, as I do a lot of
>>>>>> rsyncing/rdiff-backup:ing and similar stuff (with many files / large
>>>>>> files) on ZFS without any stability issues. Problems for me have been
>>>>>> limited to 32bit and the memory exhaustion issue rather than "hard"
>>>>>> issues.
>>>>>
>>>>> It's not generally true since kmem problems with rsync are often hard
>>>>> to repeat - I have them on one machine, but not on another, similar
>>>>> machine. This nonrepeatability is also a part of the problem.
>>>>>
>>>>>> But perhaps that's all you are referring to.
>>>>>
>>>>> Mostly. I did have a ZFS crash with rsync that wasn't kmem related,
>>>>> but only once.
>>>>
>>>> kmem problems are just tuning.  They are not indicative of stability 
>>>> problems in ZFS.  Please report any further non-kmem panics you 
>>>> experience.
>>>
>>> I have twice encountered a deadlock during high I/O activity (the last
>>> one during rsync + rm -r on a 5GB hierarchy, openoffice-2/work).
>>>
>>> I was running with this patch:
>>> http://people.freebsd.org/~pjd/patches/zgd_done.patch
>>> db> show allpcpu
>>> Current CPU: 1
>>>
>>> cpuid        = 0
>>> curthread    = 0xa5ebe440: pid 3422 "txg_thread_enter"
>>> curpcb       = 0xeb175d90
>>> fpcurthread  = none
>>> idlethread   = 0xa5529aa0: pid 12 "idle: cpu0"
>>> APIC ID      = 0
>>> currentldt   = 0x50
>>>
>>> cpuid        = 1
>>> curthread    = 0xa56ab220: pid 47 "arc_reclaim_thread"
>>> curpcb       = 0xe6837d90
>>> fpcurthread  = none
>>> idlethread   = 0xa5529880: pid 11 "idle: cpu1"
>>> APIC ID      = 1
>>> currentldt   = 0x50
>>>
>>> Both times, arc_reclaim_thread was `running`.
>>
>> Backtraces of the affected processes (or just alltrace) are usually 
> 
> noted for next time
> 
>> required to proceed with debugging, and lock status is also often 
>> vital (show alllocks, requires witness).
> 
> I add it to my kernel config
> 
>> Also, in the case when threads are
>> actually running (not deadlocked), then it is often useful to 
>> repeatedly break/continue and sample many backtraces to try and 
>> determine where the threads are looping.
> 
> I did this after the second deadlock: arc_reclaim_thread was always
> there, and the second CPU was idle.

To repeat, it is important not just to note which thread is running, but 
*what the thread is doing*.  This means repeatedly comparing the 
backtraces, which will allow you to build up a picture of which part of 
the code it is looping in.
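
As a rough illustration of what "comparing the backtraces" can look like once
several samples have been captured, here is a minimal Python sketch. It assumes
each sample is saved as text with lines of the form `func(args) at
func+0xoffset`; the function names and the sample data (`loop_body`,
`helper_a`, and so on) are invented placeholders, not output from this
deadlock. Frames that appear in every sample bracket the loop, and the
innermost of them is the best candidate for where the thread is spinning.

```python
import re
from collections import Counter

# Matches the "at symbol+0xoffset" part of a backtrace line.
# The line format is an assumption about how the samples were saved.
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-fA-F]+")


def parse_frames(trace_text):
    """Return the function names in one backtrace, innermost frame first."""
    return [m.group(1) for m in FRAME_RE.finditer(trace_text)]


def frame_frequency(samples):
    """Count in how many samples each function appears (once per sample)."""
    counts = Counter()
    for text in samples:
        counts.update(set(parse_frames(text)))
    return counts


def always_present(samples):
    """Functions found in every sample, in the order of the first sample."""
    counts = frame_frequency(samples)
    first = dict.fromkeys(parse_frames(samples[0]))  # de-duplicate, keep order
    return [f for f in first if counts[f] == len(samples)]


# Invented sample data: three backtraces of the same (hypothetical) thread.
SAMPLES = [
    "helper_a(1,2) at helper_a+0x10\n"
    "loop_body(3) at loop_body+0x44\n"
    "worker_main(0) at worker_main+0x9c\n",
    "helper_b(1) at helper_b+0x08\n"
    "loop_body(3) at loop_body+0x5c\n"
    "worker_main(0) at worker_main+0x9c\n",
    "loop_body(3) at loop_body+0x20\n"
    "worker_main(0) at worker_main+0x9c\n",
]

if __name__ == "__main__":
    # The first name printed is the innermost frame common to all samples.
    print(always_present(SAMPLES))
```

With the placeholder data, `helper_a` and `helper_b` come and go between
samples while `loop_body` is always on the stack, which is the kind of pattern
that points at the looping code.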

Kris
Received on Sun Jan 06 2008 - 16:13:38 UTC
