Re: Strange ARC/Swap/CPU on yesterday's -CURRENT

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Tue, 3 Apr 2018 21:42:48 -0700 (PDT)
On  3 Apr, Don Lewis wrote:
> On  1 Apr, Don Lewis wrote:
>> On 27 Mar, Andriy Gapon wrote:
>>> On 24/03/2018 01:21, Bryan Drewery wrote:
>>>> On 3/20/2018 12:07 AM, Peter Jeremy wrote:
>>>>>
>>>>> On 2018-Mar-11 10:43:58 -1000, Jeff Roberson <jroberson_at_jroberson.net> wrote:
>>>>>> Also, if you could try going back to r328953 or r326346 and let me know if 
>>>>>> the problem exists in either.  That would be very helpful.  If anyone is 
>>>>>> willing to debug this with me contact me directly and I will send some 
>>>>>> test patches or debugging info after you have done the above steps.
>>>>>
>>>>> I ran into this on 11-stable and tracked it to r326619 (MFC of r325851).
>>>>> I initially got around the problem by reverting that commit but either
>>>>> it or something very similar is still present in 11-stable r331053.
>>>>>
>>>>> I've seen it on my main server (32GB RAM) but haven't managed to reproduce
>>>>> it in smaller VBox guests - one difficulty I faced was artificially filling
>>>>> ARC.
>>> 
>>> First, it looks like maybe several different issues are being discussed and
>>> possibly conflated in this thread.  I see reports related to ZFS and reports
>>> where ZFS is not used at all.  Some people report problems that appeared very
>>> recently while others chime in with "yes, yes, I've always had this problem".
>>> This does not help to differentiate between problems and to analyze them.
>>> 
>>>> Looking at the ARC change you referred to from r325851
>>>> https://reviews.freebsd.org/D12163, I am convinced that ARC backpressure
>>>> is completely broken.
>>> 
>>> Does your conviction come from the code review or from experiments?
>>> If the former, could you please share your analysis?
>>> 
>>>> On my 78GB RAM system with ARC limited to 40GB and
>>>> doing a poudriere build of all LLVM and GCC packages at once in tmpfs, I
>>>> can get swap up near 50GB, and yet the ARC remains at 40GB through it
>>>> all.  It's always been slow to give up memory for package builds, but it
>>>> really seems broken right now.
>>> 
>>> Well, there are multiple angles.  Maybe it's ARC that does not react properly,
>>> or maybe it's not being asked properly.
>>> 
>>> It would be useful to monitor the system during its transition to the state that
>>> you reported.  There are some interesting DTrace probes in ARC, specifically
>>> arc-available_memory and arc-needfree are the first that come to mind.  Their
>>> parameters and how frequently they are called are of interest.  VM parameters
>>> could be of interest as well.
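
For anyone who wants to watch those probes, a one-liner along these lines
should show how often each one fires (untested as written; it only matches
on the probe names, so the sdt provider/module fields may need adjusting):

  dtrace -n 'sdt:::arc-available_memory,
             sdt:::arc-needfree
             { @[probename] = count(); }' \
         -n 'tick-10s { printa(@); clear(@); }'

Printing arg0/arg1 in the first clause should also show the values the
probes report, though I haven't checked which argument is which.
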
>>> 
>>> A rant.
>>> 
>>> Basically, posting some numbers and jumping to conclusions does not help at all.
>>> Monitoring, graphing, etc. does help.
>>> ARC is a complex dynamic system.
>>> VM (pagedaemon, UMA caches) is a complex dynamic system.
>>> They interact in complex, dynamic ways.
>>> Sometimes a change in ARC is incorrect and requires a fix.
>>> Sometimes a change in VM is incorrect and requires a fix.
>>> Sometimes a change in VM requires a change in ARC.
>>> These three kinds of problems can all appear as a "problem with ARC".
>>> 
>>> For instance, when vm.lowmem_period was introduced you wouldn't find any mention
>>> of ZFS/ARC.  But it does affect the period between arc_lowmem() calls.
>>> 
>>> Also, pin-pointing a specific commit requires proper bisecting and proper
>>> testing to correctly attribute systemic behavior changes to code changes.
>> 
>> I just upgraded my main package build box (12.0-CURRENT, 8 cores, 32 GB
>> RAM) from r327616 to r331716.  I was seeing higher swap usage and larger
>> ARC sizes before the upgrade than I remember from the distant past, but
>> ARC was still at least somewhat responsive to memory pressure and I
>> didn't notice any performance issues.
>> 
>> After the upgrade, ARC size seems to be pretty unresponsive to memory
>> demand.  Currently the machine is near the end of a poudriere run to
>> build my usual set of ~1800 ports.  The only currently running build is
>> chromium and the machine is paging heavily.  Settings of interest are:
>>   USE_TMPFS="wrkdir data localbase"
>>   ALLOW_MAKE_JOBS=yes
>> 
>> last pid: 96239;  load averages:  1.86,  1.76,  1.83    up 3+14:47:00  12:38:11
>> 108 processes: 3 running, 105 sleeping
>> CPU: 18.6% user,  0.0% nice,  2.4% system,  0.0% interrupt, 79.0% idle
>> Mem: 129M Active, 865M Inact, 61M Laundry, 29G Wired, 1553K Buf, 888M Free
>> ARC: 23G Total, 8466M MFU, 10G MRU, 5728K Anon, 611M Header, 3886M Other
>>      17G Compressed, 32G Uncompressed, 1.88:1 Ratio
>> Swap: 40G Total, 17G Used, 23G Free, 42% Inuse, 4756K In
>> 
>>   PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
>> 96239 nobody           1  76    0   140M 93636K CPU5    5   0:01  82.90% clang-
>> 96238 nobody           1  75    0   140M 92608K CPU7    7   0:01  80.81% clang-
>>  5148 nobody           1  20    0   590M   113M swread  0   0:31   0.29% clang-
>> 57290 root             1  20    0 12128K  2608K zio->i  7   8:11   0.28% find
>> 78958 nobody           1  20    0   838M   299M swread  0   0:23   0.19% clang-
>> 97840 nobody           1  20    0   698M   140M swread  4   0:27   0.13% clang-
>> 96066 nobody           1  20    0   463M   104M swread  1   0:32   0.12% clang-
>> 11050 nobody           1  20    0   892M   154M swread  2   0:39   0.12% clang-
>> 
>> Pre-upgrade I was running r327616, which is newer than either of the
>> commits that Jeff mentioned above.  It seems like there has been a
>> regression since then.
>> 
>> I also don't recall seeing this problem on my Ryzen box, though it has
>> 2x the core count and 2x the RAM.  The last testing that I did on it was
>> with r329844.
> 
> I reconfigured my Ryzen box to be more similar to my default package
> builder by disabling SMT and halving the RAM, to limit it to 8 cores
> and 32 GB, and then started bisecting to try to track down the problem.
> For each test, I first filled ARC by tarring /usr/ports/distfiles to
> /dev/null.  The commit range that I was searching was r329844 to
> r331716, and I narrowed it to r329844 to r329904.  With r329904
> and newer, ARC is totally unresponsive to memory pressure and the
> machine pages heavily.  I see ARC sizes of 28-29GB and 30GB of wired
> RAM, so there is not much left over for getting useful work done.
> Active memory and free memory each hover under 1GB.  Looking at the
> commit logs over this range, the most likely culprit is:
> 
> r329882 | jeff | 2018-02-23 14:51:51 -0800 (Fri, 23 Feb 2018) | 13 lines
> 
> Add a generic Proportional Integral Derivative (PID) controller algorithm and
> use it to regulate page daemon output.
> 
> This provides much smoother and more responsive page daemon output, anticipating
> demand and avoiding pageout stalls by increasing the number of pages to match
> the workload.  This is a reimplementation of work done by myself and mlaier at
> Isilon.
> 
> 
> It is quite possible that the recent fixes to the PID controller will
> fix the problem.  Not that r329844 was trouble-free ... I left tar
> running over lunchtime to fill ARC, and the OOM killer nuked top, tar,
> ntpd, both of my ssh sessions into the machine, and multiple instances
> of getty while I was away.  I was able to log in again and successfully
> run poudriere, and ARC did respond to the memory pressure and cranked
> itself down to about 5 GB by the end of the run.  I did not see the same
> OOM problem during the tar run with r329904.
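
(For anyone trying to reproduce this, the ARC-fill step is nothing fancier
than something like

  tar -cf /dev/null /usr/ports/distfiles

run until ARC stops growing.)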

I just tried r331966 and see no improvement.  There were no OOM process
kills during the tar run to fill ARC, but with ARC filled, the machine has
been thrashing at the start of the poudriere run while trying to build
ports-mgmt/pkg (39 minutes so far).  ARC appears to be unresponsive to
memory demand.  I've seen no decrease in ARC size or wired memory since
starting poudriere.

last pid: 75652;  load averages:  0.46,  0.27,  0.25    up 0+02:03:48  21:33:00
48 processes:  1 running, 47 sleeping
CPU:  0.7% user,  0.0% nice,  0.8% system,  0.1% interrupt, 98.4% idle
Mem: 4196K Active, 328K Inact, 148K Laundry, 30G Wired, 506M Free
ARC: 29G Total, 309M MFU, 28G MRU, 3767K Anon, 103M Header, 79M Other
     27G Compressed, 29G Uncompressed, 1.05:1 Ratio
Swap: 80G Total, 234M Used, 80G Free, 2348K In, 420K Out

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 1340 root          1  52    0 13048K  1024K nanslp  6   0:10   0.72% sh
41806 nobody        1  20    0   133M  1440K swread  6   0:17   0.31% cc
42554 nobody        1  20    0   129M  1048K swread  6   0:12   0.22% cc
  934 dl            1  20    0 19124K   592K select  6   0:03   0.14% sshd
  938 dl            1  20    0 13596K   740K CPU7    7   0:07   0.12% top
40784 nobody        1  20    0  5024K   104K select  5   0:00   0.03% make
41142 nobody        1  20    0  5024K   124K select  6   0:00   0.01% make
  638 root          1  20    0 11344K   252K swread  7   0:00   0.01% syslogd
41428 nobody        1  20    0  5024K   124K select  4   0:00   0.01% make
40782 nobody        1  20    0  5024K   124K select  4   0:00   0.01% make
41164 nobody        1  20    0  5024K   124K select  7   0:00   0.01% make
 1050 root          1  23    0 13032K   468K select  0   0:06   0.00% sh
  848 root          1  20    0 11400K   492K nanslp  5   0:00   0.00% cron
36198 root          1  20    0 13048K   396K select  7   0:00   0.00% sh
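
A crude way to track ARC size against wired and free memory over time, in
case anyone wants to compare numbers (the vm.stats counters are in pages):

  while :; do
      date
      sysctl kstat.zfs.misc.arcstats.size \
          vm.stats.vm.v_wire_count vm.stats.vm.v_free_count
      sleep 60
  done
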
Received on Wed Apr 04 2018 - 02:43:05 UTC
