Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4

From: Subbsd <subbsd@gmail.com>
Date: Fri, 7 Sep 2018 16:34:23 +0300
On Thu, Sep 6, 2018 at 3:28 AM Mark Johnston <markj@freebsd.org> wrote:
>
> On Wed, Sep 05, 2018 at 11:15:03PM +0300, Subbsd wrote:
> > > On Wed, Sep 5, 2018 at 5:58 PM Allan Jude <allanjude@freebsd.org> wrote:
> > >
> > > On 2018-09-05 10:04, Subbsd wrote:
> > > > Hi,
> > > >
> > > > I'm seeing a huge loss in ZFS performance after upgrading FreeBSD 12
> > > > to the latest revision (r338466 at the moment), related to the ARC.
> > > >
> > > > I can not say which revision was installed before, except that newvers.sh
> > > > pointed to ALPHA3.
> > > >
> > > > Problems are observed when you try to limit the ARC. In my case:
> > > >
> > > > vfs.zfs.arc_max="128M"
> > > >
> > > > I know that this is very small. However, this caused no problems for
> > > > two years.
> > > >
> > > > When I send SIGINFO to a process that is currently working with ZFS, I
> > > > see "arc_reclaim_waiters_cv":
> > > >
> > > > e.g. when I type:
> > > >
> > > > /bin/csh
> > > >
> > > > I have time (~5 seconds) to press 'ctrl+t' several times before csh is executed:
> > > >
> > > > load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.41r 0.00u 0.00s 0% 3512k
> > > > load: 0.70  cmd: csh 5935 [zio->io_cv] 1.69r 0.00u 0.00s 0% 3512k
> > > > load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.98r 0.00u 0.01s 0% 3512k
> > > > load: 0.73  cmd: csh 5935 [arc_reclaim_waiters_cv] 2.19r 0.00u 0.01s 0% 4156k
> > > >
> > > > Same story with find or any other command:
> > > >
> > > > load: 0.34  cmd: find 5993 [zio->io_cv] 0.99r 0.00u 0.00s 0% 2676k
> > > > load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.13r 0.00u 0.00s 0% 2676k
> > > > load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.25r 0.00u 0.00s 0% 2680k
> > > > load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.38r 0.00u 0.00s 0% 2684k
> > > > load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.51r 0.00u 0.00s 0% 2704k
> > > > load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.64r 0.00u 0.00s 0% 2716k
> > > > load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.78r 0.00u 0.00s 0% 2760k
> > > >
> > > > This problem goes away after increasing vfs.zfs.arc_max.
> > >
> > > Previously, ZFS was not actually able to evict enough dnodes to keep
> > > your arc_max under 128MB; it would have been much higher based on the
> > > number of open files you had. A recent improvement from upstream ZFS
> > > (r337653 and r337660) was pulled in that fixed this, so setting an
> > > arc_max of 128MB is much more effective now. The side effect is that it
> > > now "actually does what you asked it to do", and in this case what you
> > > are asking is a bit silly: if you have a working set that is greater
> > > than 128MB and you ask ZFS to use less than that, it will have to
> > > constantly try to reclaim memory to stay under that very low bar.
> > >
> >
> > Thanks for the comments. Mark was right when he pointed to r338416 (
> > https://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?r1=338416&r2=338415&pathrev=338416
> > ). Commenting out the aggsum_value calls restores normal speed regardless
> > of the rest of the new code from upstream.
> > I would like to repeat that with these two lines present the system is not
> > just slow, but _INCREDIBLY_ slow! This should probably be noted in the
> > relevant documentation for FreeBSD 12+.
>
> Could you please retest with the patch below applied, instead of
> reverting r338416?
>
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> index 2bc065e12509..9b039b7d4a96 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> @@ -538,7 +538,7 @@ typedef struct arc_state {
>   */
>  int zfs_arc_meta_prune = 10000;
>  unsigned long zfs_arc_dnode_limit_percent = 10;
> -int zfs_arc_meta_strategy = ARC_STRATEGY_META_BALANCED;
> +int zfs_arc_meta_strategy = ARC_STRATEGY_META_ONLY;
>  int zfs_arc_meta_adjust_restarts = 4096;
>
>  /* The 6 states: */
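
For anyone who wants to repeat this, applying the patch and rebuilding the
kernel is the standard drill -- just a rough sketch, the patch file name
below is only a placeholder:

% cd /usr/src
% patch -p1 < ~/arc-meta-strategy.diff
% make -j8 buildkernel
# make installkernel && shutdown -r now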

Unfortunately, I cannot conduct sufficiently serious regression tests right
now (e.g. through benchmark tools, with statistics via pmc(3), etc.).
However, I can do a very superficial comparison, for example with find.

Tested revision: r338520
loader.conf setting: vfs.zfs.arc_max="128M"
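
As a sanity check that the tunable actually took effect after boot
(128M is 134217728 bytes):

% sysctl vfs.zfs.arc_max
vfs.zfs.arc_max: 134217728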

Meta strategy ARC_STRATEGY_META_BALANCED result:

% time find /usr/ports -type f > /dev/null
0.495u 6.912s 7:20.88 1.6%      34+172k 289594+0io 0pf+0w

Meta strategy ARC_STRATEGY_META_ONLY result:

% time find /usr/ports -type f > /dev/null
0.721u 7.184s 4:57.18 2.6%      34+171k 300910+0io 0pf+0w

So the difference is about 144 seconds (440.9 s vs. 297.2 s).
Both tests were started after a full OS boot cycle.

Interestingly, despite the limit, the ARC size reported by top(1) kept
increasing, and at the end of the test it looked like this:

ARC: 1161M Total, 392M MFU, 118M MRU, 32K Anon, 7959K Header, 643M Other
     101M Compressed, 411M Uncompressed, 4.07:1 Ratio
Swap: 16G Total, 16G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 8413 root          1  20    0    11M  3024K arc_re   4   0:07   1.49% find
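
If it is useful, the same numbers can also be pulled from the arcstats kstat
and compared against the configured limit (c_max) -- a rough sketch, assuming
the usual arcstats sysctl names:

% sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
% sysctl kstat.zfs.misc.arcstats.arc_meta_used kstat.zfs.misc.arcstats.other_size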