On Sat, Sep 8, 2018 at 11:03 Cy Schubert <Cy.Schubert_at_cschubert.com> wrote:

> In message <e5abddc5-17f0-bf5f-753b-1edbc9356385_at_alvermark.net>, Jakob
> Alvermark writes:
> >
> >                             Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
> > ZFS ARC                      667M    186M    168M     13M   3825K      0K    295M
> >
> >                             rate    hits  misses   total hits  total misses
> > arcstats                  :  99%   65636     605     167338494       9317074
> > arcstats.demand_data      :  57%     431     321      13414675       2117714
> > arcstats.demand_metadata  :  99%   65175     193     152969480       5344919
> > arcstats.prefetch_data    :   0%       0      30          3292        401344
> > arcstats.prefetch_metadata:  32%      30      61        951047       1453097
> > zfetchstats               :   9%     119    1077        612582      55041789
> > arcstats.l2               :   0%       0       0             0             0
> > vdev_cache_stats          :   0%       0       0             0             0
> >
> > This is while a 'make -j8 buildworld' (it has 8 cores) is going.
>
> Overall you have a 94% hit ratio.
>
> slippy$ bc
> scale=4
> 167338494/(167338494+9317074)
> .9472
> slippy$
>
> It could be better.
>
> Why is your ZFS ARC so small? Before I answer this I will discuss my
> experience first.
>
> My machines are seeing something similar to this:
>
>                             Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
> ZFS ARC                     4274M   2329M   1394M     17M     82M      0K    445M
>
>                             rate    hits  misses   total hits  total misses
> arcstats                  :  97%     614      13     866509066      51853442
> arcstats.demand_data      : 100%      96       0     107658733       3101522
> arcstats.demand_metadata  :  97%     516      13     755890353      48080146
> arcstats.prefetch_data    :   0%       0       0        327613        225688
> arcstats.prefetch_metadata: 100%       2       0       2632367        446086
> zfetchstats               :   6%       6      80       2362709     294731645
> arcstats.l2               :   0%       0       0             0             0
> vdev_cache_stats          :   0%       0       0             0             0
>
> This is what you should see. This is with -CURRENT built two days ago.
>
> cwsys$ uname -a
> FreeBSD cwsys 12.0-ALPHA5 FreeBSD 12.0-ALPHA5 #51 r338520M: Thu Sep 6
> 17:44:35 PDT 2018
> root_at_cwsys:/export/obj/opt/src/svn-current/amd64.amd64/sys/BREAK amd64
> cwsys$
>
> Top reports:
>
> CPU:  0.3% user, 89.9% nice,  9.5% system,  0.3% interrupt,  0.0% idle
> Mem: 678M Active, 344M Inact, 175M Laundry, 6136M Wired, 168M Buf, 598M Free
> ARC: 4247M Total, 2309M MFU, 1386M MRU, 21M Anon, 86M Header, 446M Other
>      3079M Compressed, 5123M Uncompressed, 1.66:1 Ratio
> Swap: 20G Total, 11M Used, 20G Free
>
> This is healthy. It's running a poudriere build.
>
> My laptop:
>
>                             Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
> ZFS ARC                     3175M   1791M    872M     69M    165M      0K    277M
>
>                             rate    hits  misses   total hits  total misses
> arcstats                  :  99%    3851      26      89082984       5101207
> arcstats.demand_data      :  99%     345       2       6197930        340186
> arcstats.demand_metadata  :  99%    3506      24      81391265       4367755
> arcstats.prefetch_data    :   0%       0       0         11507         30945
> arcstats.prefetch_metadata:   0%       0       0       1482282        362321
> zfetchstats               :   2%      12     576        113185      38564546
> arcstats.l2               :   0%       0       0             0             0
> vdev_cache_stats          :   0%       0       0             0             0
>
> Similar results after working on a bunch of ports in four VMs last
> night, testing various combinations of options while Heimdal in base is
> private, hence the large ARC remaining this morning.
>
> Currently on the laptop top reports:
>
> CPU:  0.2% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.8% idle
> Mem: 376M Active, 1214M Inact, 5907M Wired, 464M Buf, 259M Free
> ARC: 3175M Total, 1863M MFU, 803M MRU, 69M Anon, 160M Header, 280M Other
>      2330M Compressed, 7881M Uncompressed, 3.38:1 Ratio
> Swap: 22G Total, 22G Free
>
> This is also healthy.
>
> Now for questions:
>
> Do you have any UFS filesystems? Top will report buf. What is that at?
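
(Side note: the hit-ratio arithmetic in the bc session above can be taken
straight from the ARC kstat counters instead of copying the numbers by hand.
A minimal sketch, assuming only the stock kstat.zfs.misc.arcstats sysctls
that FreeBSD's ZFS exposes:

    #!/bin/sh
    # Overall ARC hit ratio: hits / (hits + misses), same formula as the
    # bc session above.
    hits=$(sysctl -n kstat.zfs.misc.arcstats.hits)
    misses=$(sysctl -n kstat.zfs.misc.arcstats.misses)
    echo "scale=4; $hits / ($hits + $misses)" | bc

Run while the buildworld is going, a markedly lower number would point the
same way as Cy's questions above about the small ARC.)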
>
> Some background: my /, /usr, and /var are UFS. These are old
> installations: when I install a new machine I dump | rsh new-machine
> restore, change a couple of entries in rc.conf and fstab, rsync ports
> (/usr/local, /var/db...) and boot (I'm terribly impatient). Hence the
> legacy.
>
> I have noticed that when writing a lot to UFS, which increases the size
> of the UFS buffer cache, my ARC will shrink to 1 GB or even less. But
> this is during a -j8 installworld to /, a test partition, an i386
> partition, and a number of VMs on UFS on a zpool, plus other VMs using
> ZFS on the same zpool. My ARC drops rapidly when the UFS filesystems are
> actively being written to. UFS and ZFS on the same server will impact
> performance unless one or the other is sparsely used.
>
> To repeat: do you have any UFS on the system? Do you write to UFS? Is
> it actively being written to at the time? How many MB are used by UFS
> buffers?
>
> How much RAM is installed on this machine?
>
> What is the scan rate?
>
> >
> > SSH'ing to the machine while the buildworld is going, it takes 40-60
> > seconds to get to the shell!
>
> Then your iostat or systat -v should show that you're hammering your
> disks. Or you may be using a lot of swap.
>
> >
> > Hitting ^T while waiting: load: 1.06 cmd: zsh 45334
> > [arc_reclaim_waiters_cv] 56.11r 0.00u 0.10s 0% 5232k
>
> Load might be low because processes are waiting for disk I/O. Processes
> waiting on I/O are not in the run queue and therefore don't affect the
> load average. Disk I/O will kill performance worse than CPU load. Back
> in the days when I was an MVS systems programmer (IBM mainframe) I did
> a fair bit of MVS tuning (z/OS today). The rule of thumb then was that
> machine instructions took nanoseconds whereas disk I/O took
> milliseconds, and interrupting a process so the scheduler could regain
> control of the CPU also took nanoseconds, because that's how long
> instructions took. You cannot interrupt I/O: you have to wait for the
> current I/O operation to complete before inserting a new I/O into the
> queue, and with tagged queuing you have to wait for the disk to
> complete its work before scheduling new work. Now you're waiting
> multiples of milliseconds instead of a few nanoseconds. I/O kills
> performance.
>
> Look at iostat or systat -v. I think your answer lies there, and since
> your ARC is small we need to find out why.
>
> >
> > I will test the patch below and report back.
>
> Agreed, though IMO your workload and your environment need to be
> understood first. What concerns me about the patch is what impact it
> will have on other workloads. Evicting only metadata and not data could
> impact buildworld -DNO_CLEAN, for example. I do -DNO_CLEAN buildworlds,
> sometimes -DWORLDFAST. Adjusting vfs.zfs.arc_meta_limit to the same
> value as vfs.zfs.arc_max improved my buildworld/installworld
> performance. In addition, disabling atime for the ZFS dataset
> containing /usr/obj also improved buildworld/installworld performance
> by reducing unnecessary (IMO) metadata writes. I think evicting
> metadata only might cause a new set of problems for different
> workloads. (Maybe this should be a sysctl?)

Mark's suggested change would just restore the default behavior from
before balanced pruning was imported. Balanced pruning depends on code
that didn't exist in FreeBSD; unfortunately, when I did the import I did
not pull in all the dependencies. Mid freeze, the least disruptive path
is to disable the new behavior.
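
(Side note on the vfs.zfs.arc_meta_limit and atime suggestions above: a
rough sketch of how they would typically be applied, with the cap values and
the dataset name as placeholders rather than recommendations:

    # /boot/loader.conf -- give ARC metadata the same ceiling as the ARC itself
    vfs.zfs.arc_max="4G"
    vfs.zfs.arc_meta_limit="4G"

    # on the running system, stop atime updates on whatever dataset holds
    # /usr/obj (the dataset name here is only an example)
    zfs set atime=off zroot/usr/obj

The two tunables go in loader.conf; the zfs set takes effect immediately.)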
Post-branch we can restore it as the default, possibly contingent on the
amount of memory in the system. There is precedent for this with prefetch.

-M

>
> --
> Cheers,
> Cy Schubert <Cy.Schubert_at_cschubert.com>
> FreeBSD UNIX:  <cy_at_FreeBSD.org>   Web:  http://www.FreeBSD.org
>
>         The need of the many outweighs the greed of the few.
>
>
> >
> > Jakob
> >
> > On 9/7/18 7:27 PM, Cy Schubert wrote:
> > > I'd be interested in seeing systat -z output.
> > >
> > > ---
> > > Sent using a tiny phone keyboard.
> > > Apologies for any typos and autocorrect.
> > > Also, this old phone only supports top post. Apologies.
> > >
> > > Cy Schubert
> > > <Cy.Schubert_at_cschubert.com> or <cy_at_freebsd.org>
> > > The need of the many outweighs the greed of the few.
> > > ---
> > > ------------------------------------------------------------------------
> > > From: Mark Johnston
> > > Sent: 07/09/2018 09:09
> > > To: Jakob Alvermark
> > > Cc: Subbsd; allanjude_at_freebsd.org; freebsd-current Current
> > > Subject: Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4
> > >
> > > On Fri, Sep 07, 2018 at 03:40:52PM +0200, Jakob Alvermark wrote:
> > > > On 9/6/18 2:28 AM, Mark Johnston wrote:
> > > > > On Wed, Sep 05, 2018 at 11:15:03PM +0300, Subbsd wrote:
> > > > >> On Wed, Sep 5, 2018 at 5:58 PM Allan Jude <allanjude_at_freebsd.org> wrote:
> > > > >>> On 2018-09-05 10:04, Subbsd wrote:
> > > > >>>> Hi,
> > > > >>>>
> > > > >>>> I'm seeing a huge loss in ZFS performance after upgrading FreeBSD 12
> > > > >>>> to the latest revision (r338466 at the moment), related to the ARC.
> > > > >>>>
> > > > >>>> I cannot say which revision it was before, except that newvers.sh
> > > > >>>> pointed to ALPHA3.
> > > > >>>>
> > > > >>>> Problems are observed if you try to limit the ARC. In my case:
> > > > >>>>
> > > > >>>> vfs.zfs.arc_max="128M"
> > > > >>>>
> > > > >>>> I know that this is very small. However, for two years with this
> > > > >>>> there were no problems.
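
(Side note: the configured ceiling and the ARC's actual footprint can be
compared at run time with two sysctls that should be present on any FreeBSD
system with ZFS loaded; a small sketch:

    # configured cap vs. current ARC size, both in bytes
    sysctl -n vfs.zfs.arc_max
    sysctl -n kstat.zfs.misc.arcstats.size

If the second value keeps pressing against the first while processes sit in
arc_reclaim_waiters_cv, the ARC is spending its time reclaiming rather than
caching, which is what the reports below describe.)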
> > > > >>>>
> > > > >>>> When I send SIGINFO to a process which is currently working with ZFS, I
> > > > >>>> see "arc_reclaim_waiters_cv":
> > > > >>>>
> > > > >>>> e.g. when I type:
> > > > >>>>
> > > > >>>> /bin/csh
> > > > >>>>
> > > > >>>> I have time (~5 seconds) to press 'ctrl+t' several times before csh is executed:
> > > > >>>>
> > > > >>>> load: 0.70 cmd: csh 5935 [arc_reclaim_waiters_cv] 1.41r 0.00u 0.00s 0% 3512k
> > > > >>>> load: 0.70 cmd: csh 5935 [zio->io_cv] 1.69r 0.00u 0.00s 0% 3512k
> > > > >>>> load: 0.70 cmd: csh 5935 [arc_reclaim_waiters_cv] 1.98r 0.00u 0.01s 0% 3512k
> > > > >>>> load: 0.73 cmd: csh 5935 [arc_reclaim_waiters_cv] 2.19r 0.00u 0.01s 0% 4156k
> > > > >>>>
> > > > >>>> Same story with find or any other commands:
> > > > >>>>
> > > > >>>> load: 0.34 cmd: find 5993 [zio->io_cv] 0.99r 0.00u 0.00s 0% 2676k
> > > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.13r 0.00u 0.00s 0% 2676k
> > > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.25r 0.00u 0.00s 0% 2680k
> > > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.38r 0.00u 0.00s 0% 2684k
> > > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.51r 0.00u 0.00s 0% 2704k
> > > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.64r 0.00u 0.00s 0% 2716k
> > > > >>>> load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.78r 0.00u 0.00s 0% 2760k
> > > > >>>>
> > > > >>>> This problem goes away after increasing vfs.zfs.arc_max.
> > > > >>>> _______________________________________________
> > > > >>>> freebsd-current_at_freebsd.org mailing list
> > > > >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > > > >>>> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> > > > >>>>
> > > > >>> Previously, ZFS was not actually able to evict enough dnodes to keep
> > > > >>> your arc_max under 128MB; it would have been much higher based on the
> > > > >>> number of open files you had. A recent improvement from upstream ZFS
> > > > >>> (r337653 and r337660) was pulled in that fixed this, so setting an
> > > > >>> arc_max of 128MB is much more effective now, and that is causing the
> > > > >>> side effect of "actually doing what you asked it to do"; in this case,
> > > > >>> what you are asking is a bit silly. If you have a working set that is
> > > > >>> greater than 128MB, and you ask ZFS to use less than that, it'll have to
> > > > >>> constantly try to reclaim memory to keep under that very low bar.
> > > > >>>
> > > > >> Thanks for the comments. Mark was right when he pointed to r338416 (
> > > > >> https://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?r1=338416&r2=338415&pathrev=338416
> > > > >> ). Commenting out aggsum_value returns normal speed regardless of the
> > > > >> rest of the new code from upstream.
> > > > >> I would like to repeat that the speed with these two lines is not just
> > > > >> slow, but _INCREDIBLY_ slow! Probably this should be written in the
> > > > >> relevant documentation for FreeBSD 12+.
> > > >
> > > > Hi,
> > > >
> > > > I am experiencing the same slowness when there is a bit of load on the
> > > > system (buildworld for example), which I haven't seen before.
> > >
> > > Is it a regression following a recent kernel update?
> > >
> > > > I have vfs.zfs.arc_max=2G.
> > > >
> > > > Top is reporting
> > > >
> > > > ARC: 607M Total, 140M MFU, 245M MRU, 1060K Anon, 4592K Header, 217M Other
> > > >      105M Compressed, 281M Uncompressed, 2.67:1 Ratio
> > > >
> > > > Should I test the patch?
> > >
> > > I would be interested in the results, assuming it is indeed a
> > > regression.
> > > _______________________________________________
> > > freebsd-current_at_freebsd.org mailing list
> > > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > > To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> > >
>
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>

Received on Sat Sep 08 2018 - 17:49:43 UTC