Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4

From: Matthew Macy <mat.macy_at_gmail.com>
Date: Sat, 8 Sep 2018 12:49:31 -0700
On Sat, Sep 8, 2018 at 11:03 Cy Schubert <Cy.Schubert_at_cschubert.com> wrote:

> In message <e5abddc5-17f0-bf5f-753b-1edbc9356385_at_alvermark.net>, Jakob
> Alvermark writes:
> >
> >                         Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
> >       ZFS ARC            667M    186M    168M     13M   3825K      0K    295M
> >
> >                                  rate    hits  misses   total hits total misses
> >       arcstats                  : 99%   65636     605    167338494      9317074
> >       arcstats.demand_data      : 57%     431     321     13414675      2117714
> >       arcstats.demand_metadata  : 99%   65175     193    152969480      5344919
> >       arcstats.prefetch_data    :  0%       0      30         3292       401344
> >       arcstats.prefetch_metadata: 32%      30      61       951047      1453097
> >       zfetchstats               :  9%     119    1077       612582     55041789
> >       arcstats.l2               :  0%       0       0            0            0
> >       vdev_cache_stats          :  0%       0       0            0            0
> >
> >
> >
> >
> > This is while a 'make -j8 buildworld' (it has 8 cores) is going.
>
> Overall you have about a 94.7% hit ratio.
>
> slippy$ bc
> scale=4
> 167338494/(167338494+9317074)
> .9472
> slippy$
>
>
> It could be better.
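The same ratio can be read straight from the ARC's kstat counters; a minimal
sh sketch, assuming the stock kstat.zfs.misc.arcstats sysctls are present:

    hits=$(sysctl -n kstat.zfs.misc.arcstats.hits)       # cumulative ARC hits
    misses=$(sysctl -n kstat.zfs.misc.arcstats.misses)   # cumulative ARC misses
    echo "scale=4; $hits / ($hits + $misses)" | bc       # overall hit ratio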
>
> Why is your ZFS ARC so small? Before I answer that, I'll describe my own
> experience first.
>
> My machines are seeing something similar to this:
>
>                       Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
>      ZFS ARC           4274M   2329M   1394M     17M     82M      0K    445M
>
>                                 rate    hits  misses   total hits total misses
>      arcstats                  : 97%     614      13    866509066     51853442
>      arcstats.demand_data      :100%      96       0    107658733      3101522
>      arcstats.demand_metadata  : 97%     516      13    755890353     48080146
>      arcstats.prefetch_data    :  0%       0       0       327613       225688
>      arcstats.prefetch_metadata:100%       2       0      2632367       446086
>      zfetchstats               :  6%       6      80      2362709    294731645
>      arcstats.l2               :  0%       0       0            0            0
>      vdev_cache_stats          :  0%       0       0            0            0
>
> This is what you should see. This is with -CURRENT built two days ago.
>
> cwsys$ uname -a
> FreeBSD cwsys 12.0-ALPHA5 FreeBSD 12.0-ALPHA5 #51 r338520M: Thu Sep  6 17:44:35 PDT 2018
>     root_at_cwsys:/export/obj/opt/src/svn-current/amd64.amd64/sys/BREAK  amd64
> cwsys$
>
> Top reports:
>
> CPU:  0.3% user, 89.9% nice,  9.5% system,  0.3% interrupt,  0.0% idle
> Mem: 678M Active, 344M Inact, 175M Laundry, 6136M Wired, 168M Buf, 598M
> Free
> ARC: 4247M Total, 2309M MFU, 1386M MRU, 21M Anon, 86M Header, 446M Other
>      3079M Compressed, 5123M Uncompressed, 1.66:1 Ratio
> Swap: 20G Total, 11M Used, 20G Free
>
> This is healthy. It's running a poudriere build.
>
> My laptop:
>
>                        Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
>      ZFS ARC           3175M   1791M    872M     69M    165M      0K    277M
>
>                                 rate    hits  misses   total hits total misses
>      arcstats                  : 99%    3851      26     89082984      5101207
>      arcstats.demand_data      : 99%     345       2      6197930       340186
>      arcstats.demand_metadata  : 99%    3506      24     81391265      4367755
>      arcstats.prefetch_data    :  0%       0       0        11507        30945
>      arcstats.prefetch_metadata:  0%       0       0      1482282       362321
>      zfetchstats               :  2%      12     576       113185     38564546
>      arcstats.l2               :  0%       0       0            0            0
>      vdev_cache_stats          :  0%       0       0            0            0
>
> These are similar results after working on a bunch of ports in four VMs
> last night, testing various combinations of options while Heimdal in base
> is private; hence the large ARC remaining this morning.
>
> Currently on the laptop top reports:
>
> CPU:  0.2% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.8% idle
> Mem: 376M Active, 1214M Inact, 5907M Wired, 464M Buf, 259M Free
> ARC: 3175M Total, 1863M MFU, 803M MRU, 69M Anon, 160M Header, 280M Other
>      2330M Compressed, 7881M Uncompressed, 3.38:1 Ratio
> Swap: 22G Total, 22G Free
>
> This is also healthy.
>
> Now for questions:
>
> Do you have any UFS filesystems? Top will report buf. What is that at?
>
> Some background: my /, /usr, and /var are UFS. These are old installations;
> when I install a new machine I dump | rsh new-machine restore, change a
> couple of entries in rc.conf and fstab, rsync ports (/usr/local,
> /var/db...), and boot (I'm terribly impatient). Hence the legacy.
>
> I have noticed that when writing a lot to UFS, which grows the UFS buffer
> cache, my ARC will shrink to 1 GB or even less. But that is during a -j8
> installworld to /, a test partition, an i386 partition, and a number of VMs
> on UFS on a zpool, with other VMs using ZFS on the same zpool. My ARC drops
> rapidly when the UFS filesystems are actively being written to. UFS and ZFS
> on the same server will hurt performance unless one or the other is
> sparsely used.
>
> To repeat: do you have any UFS on the system? Do you write to UFS? Is it
> actively being written to at the time? How many MB are used by UFS
> buffers?
>
> How much RAM is installed on this machine?
>
> What is the scan rate?
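A rough sketch of how those questions could be answered with stock tools; the
sysctl names are the usual ones on 12.x, but verify locally:

    mount -t ufs                     # list mounted UFS filesystems, if any
    sysctl vfs.bufspace hw.physmem   # UFS buffer cache in use and installed RAM
    vmstat -w 1 -c 5                 # page scan rate is the "sr" column under "page"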
>
> >
> > SSH'ing to the machine while the buildworld is going, it takes 40-60
> > seconds to get to a shell!
>
> Then your iostat or systat -v should show that you're hammering your
> disks. Or, you may be using a lot of swap.
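For example, a sketch of what to watch while the buildworld is running (gstat
would do as well for the disk view):

    iostat -x -w 1    # per-device latency and queue length; %b near 100 means saturated disks
    systat -vmstat 1  # disk busy, paging, and swap activity on one screen
    swapinfo          # how much swap is actually in use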
>
> >
> > Hitting ^T while waiting: load: 1.06  cmd: zsh 45334
> > [arc_reclaim_waiters_cv] 56.11r 0.00u 0.10s 0% 5232k
>
> Load might be low because processes are waiting for disk I/O. Processes
> waiting on I/O are not in the run queue and therefore don't affect the load
> average. Disk I/O will kill performance worse than CPU load will. Back when
> I was an MVS systems programmer (IBM mainframe, today's z/OS), I did a fair
> bit of MVS tuning. The rule of thumb then was that machine instructions took
> nanoseconds whereas disk I/O took milliseconds, and the scheduler could
> interrupt a process to take back the CPU in nanoseconds because that's all
> an instruction took. You cannot interrupt I/O: you have to wait for the
> current operation to complete before inserting a new I/O into the queue,
> and with tagged queuing you have to wait for the disk to finish its work
> before scheduling new work. Now you're waiting multiples of milliseconds
> instead of a few nanoseconds. I/O kills performance.
>
> Look at iostat or systat -v. I think your answer lies there, and since
> your ARC is small we need to find out why.
>
> >
> > I will test the patch below and report back.
>
> Agreed, though IMO your workload and your environment need to be
> understood first. What concerns me about the patch is the impact it will
> have on other workloads. Evicting only metadata rather than data could
> hurt buildworld -DNO_CLEAN, for example. I do -DNO_CLEAN buildworlds,
> sometimes -DWORLDFAST. Adjusting vfs.zfs.arc_meta_limit to the same value
> as vfs.zfs.arc_max improved my buildworld/installworld performance. In
> addition, disabling atime for the ZFS dataset containing /usr/obj also
> improved buildworld/installworld performance by reducing unnecessary (IMO)
> metadata writes. I think evicting only metadata might cause a new set of
> problems for different workloads. (Maybe this should be a sysctl?)
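For reference, the tuning described above comes down to two knobs; this is
only a sketch, and the dataset name (zroot/usr/obj) is an example, not a
given:

    # Let metadata use up to the whole ARC; arc_meta_limit is a loader tunable
    # and, on most builds, writable at run time as well.
    sysctl vfs.zfs.arc_meta_limit=$(sysctl -n vfs.zfs.arc_max)

    # Stop atime updates on the dataset holding /usr/obj.
    zfs set atime=off zroot/usr/obj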
>

Mark's suggested change would just restore the default behavior from before
balanced pruning was imported. Balanced pruning depends on code that didn't
exist in FreeBSD; unfortunately, when I did the import I did not pull in all
the dependencies. Mid-freeze, the least disruptive path is to disable the new
behavior. Post-branch we can restore it as the default, possibly contingent
on the amount of memory in the system. There is precedent for this with
prefetch.
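The prefetch precedent is presumably vfs.zfs.prefetch_disable, which the
kernel already sets on its own for small-memory systems; a sketch of checking
and pinning it:

    sysctl vfs.zfs.prefetch_disable   # 1 if prefetch was disabled (e.g. low RAM)
    # To pin it explicitly, in /boot/loader.conf:
    #   vfs.zfs.prefetch_disable="1"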

-M


>
> --
> Cheers,
> Cy Schubert <Cy.Schubert_at_cschubert.com>
> FreeBSD UNIX:  <cy_at_FreeBSD.org>   Web:  http://www.FreeBSD.org
>
>         The need of the many outweighs the greed of the few.
>
>
> >
> >
> > Jakob
> >
> > On 9/7/18 7:27 PM, Cy Schubert wrote:
> > > I'd be interested in seeing systat -z output.
> > >
> > > ---
> > > Sent using a tiny phone keyboard.
> > > Apologies for any typos and autocorrect.
> > > Also, this old phone only supports top post. Apologies.
> > >
> > > Cy Schubert
> > > <Cy.Schubert_at_cschubert.com> or <cy_at_freebsd.org>
> > > The need of the many outweighs the greed of the few.
> > > ---
> > >
> ------------------------------------------------------------------------
> > > From: Mark Johnston
> > > Sent: 07/09/2018 09:09
> > > To: Jakob Alvermark
> > > Cc: Subbsd; allanjude_at_freebsd.org; freebsd-current Current
> > > Subject: Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4
> > >
> > > On Fri, Sep 07, 2018 at 03:40:52PM +0200, Jakob Alvermark wrote:
> > > > On 9/6/18 2:28 AM, Mark Johnston wrote:
> > > > > On Wed, Sep 05, 2018 at 11:15:03PM +0300, Subbsd wrote:
> > > > >> On Wed, Sep 5, 2018 at 5:58 PM Allan Jude <allanjude_at_freebsd.org> wrote:
> > > > >>> On 2018-09-05 10:04, Subbsd wrote:
> > > > >>>> Hi,
> > > > >>>>
> > > > >>>> I'm seeing a huge loss in ZFS performance, related to the ARC,
> > > > >>>> after upgrading FreeBSD 12 to the latest revision (r338466 at the
> > > > >>>> moment).
> > > > >>>>
> > > > >>>> I cannot say which revision it was before, except that newvers.sh
> > > > >>>> pointed to ALPHA3.
> > > > >>>>
> > > > >>>> Problems are observed if you try to limit ARC. In my case:
> > > > >>>>
> > > > >>>> vfs.zfs.arc_max="128M"
> > > > >>>>
> > > > >>>> I know that this is very small. However, for two years there were
> > > > >>>> no problems with this setting.
> > > > >>>>
> > > > >>>> When I send SIGINFO to a process that is currently working with
> > > > >>>> ZFS, I see "arc_reclaim_waiters_cv":
> > > > >>>>
> > > > >>>> E.g. when I type:
> > > > >>>>
> > > > >>>> /bin/csh
> > > > >>>>
> > > > >>>> I have time (~5 seconds) to press 'ctrl+t' several times before
> > > > >>>> csh is executed:
> > > > >>>>
> > > > >>>> load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.41r 0.00u 0.00s 0% 3512k
> > > > >>>> load: 0.70  cmd: csh 5935 [zio->io_cv] 1.69r 0.00u 0.00s 0% 3512k
> > > > >>>> load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.98r 0.00u 0.01s 0% 3512k
> > > > >>>> load: 0.73  cmd: csh 5935 [arc_reclaim_waiters_cv] 2.19r 0.00u 0.01s 0% 4156k
> > > > >>>>
> > > > >>>> Same story with find or any other command:
> > > > >>>>
> > > > >>>> load: 0.34  cmd: find 5993 [zio->io_cv] 0.99r 0.00u 0.00s 0% 2676k
> > > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.13r 0.00u 0.00s 0% 2676k
> > > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.25r 0.00u 0.00s 0% 2680k
> > > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.38r 0.00u 0.00s 0% 2684k
> > > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.51r 0.00u 0.00s 0% 2704k
> > > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.64r 0.00u 0.00s 0% 2716k
> > > > >>>> load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.78r 0.00u 0.00s 0% 2760k
> > > > >>>>
> > > > >>>> This problem goes away after increasing vfs.zfs.arc_max.
> > > > >>>>
> > > > >>> Previously, ZFS was not actually able to evict enough dnodes to
> > > > >>> keep your arc_max under 128MB; it would have been much higher based
> > > > >>> on the number of open files you had. A recent improvement from
> > > > >>> upstream ZFS (r337653 and r337660) was pulled in that fixed this,
> > > > >>> so setting an arc_max of 128MB is much more effective now, and that
> > > > >>> is causing the side effect of "actually doing what you asked it to
> > > > >>> do". In this case, what you are asking for is a bit silly: if you
> > > > >>> have a working set that is greater than 128MB and you ask ZFS to
> > > > >>> use less than that, it'll have to constantly try to reclaim memory
> > > > >>> to stay under that very low bar.
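A rough way to watch that pressure in numbers, assuming the stock arcstats
sysctls, is to compare the ARC's current size and target against the
configured cap:

    sysctl vfs.zfs.arc_max                                 # the configured ceiling
    sysctl kstat.zfs.misc.arcstats.size                    # current ARC size in bytes
    sysctl kstat.zfs.misc.arcstats.c                       # current ARC target size
    sysctl kstat.zfs.misc.arcstats.memory_throttle_count   # times writes were throttled for memory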
> > > > >>>
> > > > >> Thanks for the comments. Mark was right when he pointed to r338416 (
> > > > >> https://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?r1=338416&r2=338415&pathrev=338416
> > > > >> ). Commenting out aggsum_value restores normal speed regardless of
> > > > >> the rest of the new code from upstream.
> > > > >> I would like to repeat that the speed with these two lines is not
> > > > >> just slow, but _INCREDIBLY_ slow! Probably this should be noted in
> > > > >> the relevant documentation for FreeBSD 12+.
> > > >
> > > > Hi,
> > > >
> > > > I am experiencing the same slowness, which I haven't seen before, when
> > > > there is a bit of load on the system (buildworld for example).
> > >
> > > Is it a regression following a recent kernel update?
> > >
> > > > I have vfs.zfs.arc_max=2G.
> > > >
> > > > Top is reporting
> > > >
> > > > ARC: 607M Total, 140M MFU, 245M MRU, 1060K Anon, 4592K Header, 217M Other
> > > >       105M Compressed, 281M Uncompressed, 2.67:1 Ratio
> > > >
> > > > Should I test the patch?
> > >
> > > I would be interested in the results, assuming it is indeed a
> > > regression.
> > >
>
>
Received on Sat Sep 08 2018 - 17:49:43 UTC
