Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4

From: Jakob Alvermark <jakob_at_alvermark.net>
Date: Sat, 8 Sep 2018 13:56:06 +0200

On 9/7/18 6:06 PM, Mark Johnston wrote:
> On Fri, Sep 07, 2018 at 03:40:52PM +0200, Jakob Alvermark wrote:
>> On 9/6/18 2:28 AM, Mark Johnston wrote:
>>> On Wed, Sep 05, 2018 at 11:15:03PM +0300, Subbsd wrote:
>>>> On Wed, Sep 5, 2018 at 5:58 PM Allan Jude <allanjude_at_freebsd.org> wrote:
>>>>> On 2018-09-05 10:04, Subbsd wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm seeing a huge loss in ZFS performance after upgrading FreeBSD 12
>>>>>> to the latest revision (r338466 at the moment), related to the ARC.
>>>>>>
>>>>>> I cannot say which revision I was on before, except that newvers.sh
>>>>>> pointed to ALPHA3.
>>>>>>
>>>>>> Problems are observed if you try to limit ARC. In my case:
>>>>>>
>>>>>> vfs.zfs.arc_max="128M"
>>>>>>
>>>>>> I know that this is very small. However, for two years with this there
>>>>>> were no problems.
>>>>>>
>>>>>> When I send SIGINFO to a process that is currently working with ZFS, I
>>>>>> see "arc_reclaim_waiters_cv":
>>>>>>
>>>>>> e.g. when I type:
>>>>>>
>>>>>> /bin/csh
>>>>>>
>>>>>> I have time (~5 seconds) to press 'ctrl+t' several times before csh is executed:
>>>>>>
>>>>>> load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.41r 0.00u 0.00s 0% 3512k
>>>>>> load: 0.70  cmd: csh 5935 [zio->io_cv] 1.69r 0.00u 0.00s 0% 3512k
>>>>>> load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.98r 0.00u 0.01s 0% 3512k
>>>>>> load: 0.73  cmd: csh 5935 [arc_reclaim_waiters_cv] 2.19r 0.00u 0.01s 0% 4156k
>>>>>>
>>>>>> Same story with find or any other command:
>>>>>>
>>>>>> load: 0.34  cmd: find 5993 [zio->io_cv] 0.99r 0.00u 0.00s 0% 2676k
>>>>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.13r 0.00u 0.00s 0% 2676k
>>>>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.25r 0.00u 0.00s 0% 2680k
>>>>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.38r 0.00u 0.00s 0% 2684k
>>>>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.51r 0.00u 0.00s 0% 2704k
>>>>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.64r 0.00u 0.00s 0% 2716k
>>>>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.78r 0.00u 0.00s 0% 2760k
>>>>>>
>>>>>> This problem goes away after increasing vfs.zfs.arc_max.
>>>>>>
>>>>> Previously, ZFS was not actually able to evict enough dnodes to keep
>>>>> the ARC under your 128MB arc_max; it would have been much higher based
>>>>> on the number of open files you had. A recent improvement from upstream
>>>>> ZFS (r337653 and r337660) was pulled in that fixed this, so setting an
>>>>> arc_max of 128MB is much more effective now, and that is causing the
>>>>> side effect of "actually doing what you asked it to do". In this case,
>>>>> what you are asking is a bit silly. If you have a working set that is
>>>>> greater than 128MB, and you ask ZFS to use less than that, it'll have to
>>>>> constantly try to reclaim memory to keep under that very low bar.
>>>>>
>>>> Thanks for the comments. Mark was right when he pointed to r338416 (
>>>> https://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?r1=338416&r2=338415&pathrev=338416
>>>> ). Commenting out aggsum_value restores normal speed, regardless of the
>>>> rest of the new code from upstream.
>>>> I would like to repeat that the speed with these two lines is not just
>>>> slow, but _INCREDIBLY_ slow! Probably, this should be written in the
>>>> relevant documentation for FreeBSD 12+
>> Hi,
>>
>> I am experiencing the same slowness when there is a bit of load on the
>> system (buildworld for example) which I haven't seen before.
> Is it a regression following a recent kernel update?


Yes.


>
>> I have vfs.zfs.arc_max=2G.
>>
>> Top is reporting
>>
>> ARC: 607M Total, 140M MFU, 245M MRU, 1060K Anon, 4592K Header, 217M Other
>>        105M Compressed, 281M Uncompressed, 2.67:1 Ratio
>>
>> Should I test the patch?
> I would be interested in the results, assuming it is indeed a
> regression.


This gets more interesting.

Kernel + world was at r338465

I was going to test the patch, but since I had updated the src tree to 
r338499 I built it first without your patch.

Now, at r338499, without the patch, it doesn't seem to hit the 
performance problem.

vfs.zfs.arc_max is still set to 2G

The ARC display in top is around 1000M total; I haven't seen it go above
about 1200M, even when I stress it.
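
For anyone comparing numbers across revisions, here is a minimal sketch of how the "ARC:" line from top could be turned into a fill ratio against vfs.zfs.arc_max. The function names (parse_size, parse_arc_line, arc_fill_ratio) are made up for illustration, and it assumes top's K/M/G suffixes are powers of 1024. It only parses text pasted from top; it does not query the kernel.

```python
import re

# Assumption: top(1) abbreviates sizes with binary (1024-based) suffixes.
_UNITS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Convert a string such as '607M', '1060K' or '2G' to bytes."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


def parse_arc_line(line):
    """Parse 'ARC: 607M Total, 140M MFU, ...' into {'Total': bytes, ...}."""
    prefix, _, rest = line.partition(":")
    if prefix.strip() != "ARC":
        raise ValueError("not an ARC summary line")
    fields = {}
    for item in rest.split(","):
        size, name = item.split()
        fields[name] = parse_size(size)
    return fields


def arc_fill_ratio(line, arc_max):
    """Fraction of arc_max occupied by the reported ARC total."""
    return parse_arc_line(line)["Total"] / parse_size(arc_max)


if __name__ == "__main__":
    # Sample line copied from the earlier message in this thread.
    sample = ("ARC: 607M Total, 140M MFU, 245M MRU, 1060K Anon, "
              "4592K Header, 217M Other")
    print(f"{arc_fill_ratio(sample, '2G'):.0%} of arc_max in use")
```

A ratio that stays well below 1.0 under load, as in the figures above, suggests the ARC is not pressed against its limit, which would fit with not seeing the reclaim stalls.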
Received on Sat Sep 08 2018 - 09:56:10 UTC
