Re: Strange ARC/Swap/CPU on yesterday's -CURRENT

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Tue, 27 Mar 2018 17:00:09 +0300

On 24/03/2018 01:21, Bryan Drewery wrote:
> On 3/20/2018 12:07 AM, Peter Jeremy wrote:
>>
>> On 2018-Mar-11 10:43:58 -1000, Jeff Roberson <jroberson_at_jroberson.net> wrote:
>>> Also, if you could try going back to r328953 or r326346 and let me know if 
>>> the problem exists in either.  That would be very helpful.  If anyone is 
>>> willing to debug this with me contact me directly and I will send some 
>>> test patches or debugging info after you have done the above steps.
>>
>> I ran into this on 11-stable and tracked it to r326619 (MFC of r325851).
>> I initially got around the problem by reverting that commit but either
>> it or something very similar is still present in 11-stable r331053.
>>
>> I've seen it in my main server (32GB RAM) but haven't managed to reproduce
>> it in smaller VBox guests - one difficulty I faced was artificially filling
>> ARC.

First, it looks like several different issues are being discussed and possibly
conflated in this thread.  I see reports related to ZFS and reports where ZFS
is not used at all.  Some people report problems that appeared very recently
while others chime in with "yes, yes, I've always had this problem".
This does not help to differentiate between the problems or to analyze them.

> Looking at the ARC change you referred to from r325851
> https://reviews.freebsd.org/D12163, I am convinced that ARC backpressure
> is completely broken.

Does your being convinced come from the code review or from experiments?
If the former, could you please share your analysis?

> On my 78GB RAM system with ARC limited to 40GB and
> doing a poudriere build of all LLVM and GCC packages at once in tmpfs I
> can get swap up near 50GB and yet the ARC remains at 40GB through it
> all.  It's always been slow to give up memory for package builds but it
> really seems broken right now.

Well, there are multiple angles.  Maybe it's ARC that does not react properly,
or maybe it's not being asked properly.

It would be useful to monitor the system during its transition to the state that
you reported.  There are some interesting DTrace probes in ARC; arc-available_memory
and arc-needfree are the first that come to mind.  Their parameters and how
frequently they fire are of interest.  VM parameters could be of interest as
well.
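
As one possible way to collect such data over time, here is a minimal Python
sketch that samples a few sysctl values at a fixed interval and writes them as
CSV for later graphing.  It is only an illustration, not something from this
thread: the sysctl names in SYSCTLS are assumptions about a FreeBSD system with
ZFS loaded and may need adjusting, and the helper names (parse_sysctl,
read_sysctls, sample) are hypothetical.  It does not touch the DTrace probes;
those would have to be watched separately.

```python
import csv
import subprocess
import sys
import time

# Assumed sysctl names (FreeBSD with ZFS loaded); adjust for your system.
SYSCTLS = [
    "kstat.zfs.misc.arcstats.size",   # current ARC size, bytes
    "vfs.zfs.arc_max",                # configured ARC limit, bytes
    "vm.stats.vm.v_free_count",       # free pages
    "vm.stats.vm.v_inactive_count",   # inactive pages
]


def parse_sysctl(text):
    """Parse 'name: value' lines as printed by sysctl(8) into a dict of ints.

    Lines without a ': ' separator or with a non-integer value are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue
        try:
            values[name.strip()] = int(value.strip())
        except ValueError:
            pass
    return values


def read_sysctls(names):
    """Run sysctl(8) once for all names and return the parsed values."""
    result = subprocess.run(
        ["sysctl"] + list(names),
        capture_output=True, text=True, check=True,
    )
    return parse_sysctl(result.stdout)


def sample(names, interval, count, reader=read_sysctls, out=sys.stdout,
           sleep=time.sleep, clock=time.time):
    """Write `count` CSV rows of sysctl values, one every `interval` seconds."""
    writer = csv.writer(out)
    writer.writerow(["time"] + list(names))
    for i in range(count):
        values = reader(names)
        # Missing values are written as empty cells rather than guessed.
        writer.writerow([int(clock())] + [values.get(n, "") for n in names])
        out.flush()
        if i + 1 < count:
            sleep(interval)
```

Calling sample(SYSCTLS, interval=5, count=720) would record roughly an hour of
data at 5-second resolution.  Starting it before the build begins captures the
transition into the bad state rather than only the end result.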

A rant.

Basically, posting some numbers and jumping to conclusions does not help at all.
Monitoring, graphing, etc. does help.
ARC is a complex dynamic system.
VM (pagedaemon, UMA caches) is a complex dynamic system.
They interact in complex dynamic ways.
Sometimes a change in ARC is incorrect and requires a fix.
Sometimes a change in VM is incorrect and requires a fix.
Sometimes a change in VM requires a change in ARC.
These three kinds of problems can all appear as a "problem with ARC".

For instance, when vm.lowmem_period was introduced you wouldn't find any mention
of ZFS/ARC.  But it does affect the period between arc_lowmem() calls.

Also, pinpointing a specific commit requires proper bisecting and proper
testing to correctly attribute systemic behavior changes to code changes.


-- 
Andriy Gapon
Received on Tue Mar 27 2018 - 12:00:20 UTC
