Re: Strange ARC/Swap/CPU on yesterday's -CURRENT

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Fri, 6 Apr 2018 10:33:19 -0700 (PDT)
On  4 Apr, Don Lewis wrote:
> On  4 Apr, Mark Johnston wrote:
>> On Tue, Apr 03, 2018 at 09:42:48PM -0700, Don Lewis wrote:
>>> On  3 Apr, Don Lewis wrote:
>>> > I reconfigured my Ryzen box to be more similar to my default package
>>> > builder by disabling SMT and half of the RAM, to limit it to 8 cores
>>> > and 32 GB, and then started bisecting to track down the problem.
>>> > For each test, I first filled ARC by tarring /usr/ports/distfiles to
>>> > /dev/null.  The commit range that I was searching was r329844 to
>>> > r331716.  I narrowed the range to r329844 to r329904.  With r329904
>>> > and newer, ARC is totally unresponsive to memory pressure and the
>>> > machine pages heavily.  I see ARC sizes of 28-29GB and 30GB of wired
>>> > RAM, so there is not much left over for getting useful work done.
>>> > Active and free memory each hover under 1GB.  Looking at the
>>> > commit logs over this range, the most likely culprit is:
>>> > 
>>> > r329882 | jeff | 2018-02-23 14:51:51 -0800 (Fri, 23 Feb 2018) | 13 lines
>>> > 
>>> > Add a generic Proportional Integral Derivative (PID) controller algorithm and
>>> > use it to regulate page daemon output.
>>> > 
>>> > This provides much smoother and more responsive page daemon output, anticipating
>>> > demand and avoiding pageout stalls by increasing the number of pages to match
>>> > the workload.  This is a reimplementation of work done by myself and mlaier at
>>> > Isilon.
>>> > 
>>> > 
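For anyone unfamiliar with the approach, here is a minimal illustration
of that kind of controller, based only on the commit description above;
struct pid, pid_output(), and the divisor fields are my own made-up
names, not the code from r329882:

    /*
     * Illustrative PID controller, not the kernel's implementation.
     * Gains are expressed as integer divisors (larger = weaker term)
     * and must be nonzero.
     */
    struct pid {
        int setpoint;       /* target free page count */
        int integral;       /* accumulated error */
        int prev;           /* error from the previous wakeup */
        int kpd, kid, kdd;  /* P, I, and D divisors */
    };

    /* Called at each page daemon wakeup; returns pages to free. */
    static int
    pid_output(struct pid *p, int freecount)
    {
        int error, deriv, out;

        error = p->setpoint - freecount;
        p->integral += error;
        deriv = error - p->prev;
        p->prev = error;
        out = error / p->kpd + p->integral / p->kid +
            deriv / p->kdd;
        return (out > 0 ? out : 0);
    }

The output scales with how far free memory is below the setpoint and
how fast it is falling, instead of being a fixed scan size.
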
>>> > It is quite possible that the recent fixes to the PID controller will
>>> > fix the problem.  Not that r329844 was trouble-free ... I left tar
>>> > running over lunchtime to fill ARC and the OOM killer nuked top, tar,
>>> > ntpd, both of my ssh sessions into the machine, and multiple instances
>>> > of getty while I was away.  I was able to log in again and successfully
>>> > run poudriere, and ARC did respond to the memory pressure and cranked
>>> > itself down to about 5 GB by the end of the run.  I did not see the same
>>> > problem with tar when I did the same with r329904.
>>> 
>>> I just tried r331966 and see no improvement.  No OOM process kills
>>> during the tar run to fill ARC, but with ARC filled, the machine is
>>> thrashing itself at the start of the poudriere run while trying to build
>>> ports-mgmt/pkg (39 minutes so far).  ARC appears to be unresponsive to
>>> memory demand.  I've seen no decrease in ARC size or wired memory since
>>> starting poudriere.
>> 
>> Re-reading the ARC reclaim code, I see a couple of issues which might be
>> at the root of the behaviour you're seeing.
>> 
>> 1. zfs_arc_free_target is too low now. It is initialized to the page
>>    daemon wakeup threshold, which is slightly above v_free_min. With the
>>    PID controller, the page daemon uses a setpoint of v_free_target.
>>    Moreover, it now wakes up regularly rather than having wakeups be
>>    synchronized by a mutex, so it will respond quickly if the free page
>>    count dips below v_free_target. The free page count will dip below
>>    zfs_arc_free_target only in the face of sudden and extreme memory
>>    pressure now, so the FMR_LOTSFREE case probably isn't getting
>>    exercised. Try initializing zfs_arc_free_target to v_free_target.
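For reference, the check in question looks roughly like the following.
This is a sketch from memory, not a verbatim quote of arc.c, and n,
freemem, lowest, and r stand in for locals in arc_available_memory():

    /*
     * Sketch, not verbatim arc.c: n goes negative, and FMR_LOTSFREE
     * is reported as the reclaim reason, only once the free page
     * count drops below zfs_arc_free_target.
     */
    n = PAGESIZE * ((int64_t)freemem - zfs_arc_free_target);
    if (n < lowest) {
        lowest = n;
        r = FMR_LOTSFREE;
    }

With the tunable initialized near v_free_min, the PID-regulated page
daemon holds the free count well above it, so n stays positive and the
ARC never sees the shortage.
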
> 
> Changing zfs_arc_free_target definitely helps.  My previous poudriere
> run failed when poudriere timed out the ports-mgmt/pkg build after two
> hours.  After changing this setting, poudriere seems to be running
> properly: ARC has dropped from 29GB to 26GB ten minutes into the run,
> and I'm not seeing processes in the swread state.
> 
>> 2. In the inactive queue scan, we used to compute the shortage after
>>    running uma_reclaim() and the lowmem handlers (which includes a
>>    synchronous call to arc_lowmem()). Now it's computed before, so we're
>>    not taking into account the pages that get freed by the ARC and UMA.
>>    The following rather hacky patch may help. I note that the lowmem
>>    logic is now somewhat broken when multiple NUMA domains are
>>    configured, however, since it fires only when domain 0 has a free
>>    page shortage.
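My reading of that ordering change, paraphrased as pseudocode; the
helpers lowmem_handlers() and scan_inactive() and the free_count
variable are placeholders, not the actual vm_pageout.c names:

    /* Before, paraphrased: the shortage sees what reclaim freed. */
    lowmem_handlers();      /* arc_lowmem(), uma_reclaim() */
    shortage = v_free_target - free_count;
    scan_inactive(shortage);

    /* Now, paraphrased: the shortage is computed too early. */
    shortage = v_free_target - free_count;
    lowmem_handlers();
    scan_inactive(shortage);    /* ignores pages ARC/UMA just freed */

So the scan target overshoots by however many pages the lowmem
handlers released.
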
> 
> I will try this next.

The patch by itself is not sufficient to fix the problem for me.  I
didn't have any problems when running with both the patch and the
zfs_arc_free_target change.  As a matter of fact, that was the only
poudriere run where I didn't have a guile-related build failure.  Those
tend to be fairly random, so it could just be luck.

Performance-wise, r331966 + the zfs_arc_free_target change completes the
poudriere run about 2.6% faster than r329844, but I don't know whether
this is due to the PID controller or to something else that changed in
base over that interval.