On 27 Mar, Andriy Gapon wrote:
> On 24/03/2018 01:21, Bryan Drewery wrote:
>> On 3/20/2018 12:07 AM, Peter Jeremy wrote:
>>>
>>> On 2018-Mar-11 10:43:58 -1000, Jeff Roberson <jroberson_at_jroberson.net> wrote:
>>>> Also, if you could try going back to r328953 or r326346 and let me know if
>>>> the problem exists in either.  That would be very helpful.  If anyone is
>>>> willing to debug this with me contact me directly and I will send some
>>>> test patches or debugging info after you have done the above steps.
>>>
>>> I ran into this on 11-stable and tracked it to r326619 (MFC of r325851).
>>> I initially got around the problem by reverting that commit, but either
>>> it or something very similar is still present in 11-stable r331053.
>>>
>>> I've seen it on my main server (32GB RAM) but haven't managed to reproduce
>>> it in smaller VBox guests - one difficulty I faced was artificially
>>> filling the ARC.
>
> First, it looks like several different issues are being discussed and
> possibly conflated in this thread.  I see reports related to ZFS, and I see
> reports where ZFS is not used at all.  Some people report problems that
> appeared very recently, while others chime in with "yes, yes, I've always
> had this problem".  This does not help to differentiate between the
> problems or to analyze them.
>
>> Looking at the ARC change you referred to from r325851
>> https://reviews.freebsd.org/D12163, I am convinced that ARC backpressure
>> is completely broken.
>
> Does your being convinced come from code review or from experiments?
> If the former, could you please share your analysis?
>
>> On my 78GB RAM system with ARC limited to 40GB and
>> doing a poudriere build of all LLVM and GCC packages at once in tmpfs, I
>> can get swap up near 50GB and yet the ARC remains at 40GB through it
>> all.  It's always been slow to give up memory for package builds, but it
>> really seems broken right now.
>
> Well, there are multiple angles.  Maybe it's the ARC that does not react
> properly, or maybe it's not being asked properly.
>
> It would be useful to monitor the system during its transition to the state
> that you reported.  There are some interesting DTrace probes in the ARC;
> arc-available_memory and arc-needfree are the first that come to mind.
> Their parameters and how frequently they are called are of interest.  VM
> parameters could be of interest as well.
>
> A rant.
>
> Basically, posting some numbers and jumping to conclusions does not help at
> all.  Monitoring, graphing, etc. does help.
> The ARC is a complex dynamic system.
> The VM (pagedaemon, UMA caches) is a complex dynamic system.
> They interact in complex, dynamic ways.
> Sometimes a change in the ARC is incorrect and requires a fix.
> Sometimes a change in the VM is incorrect and requires a fix.
> Sometimes a change in the VM requires a change in the ARC.
> These three kinds of problems can all appear as a "problem with the ARC".
>
> For instance, when vm.lowmem_period was introduced you wouldn't find any
> mention of ZFS/ARC, but it does affect the period between arc_lowmem()
> calls.
>
> Also, pinpointing a specific commit requires proper bisecting and proper
> testing to correctly attribute systemic behavior changes to code changes.

I just upgraded my main package build box (12.0-CURRENT, 8 cores, 32 GB RAM)
from r327616 to r331716.  Before the upgrade I was seeing higher swap usage
and larger ARC sizes than I remember from the distant past, but the ARC was
still at least somewhat responsive to memory pressure and I didn't notice
any performance issues.
After the upgrade, the ARC size seems to be pretty unresponsive to memory
demand.  Currently the machine is near the end of a poudriere run to build
my usual set of ~1800 ports.  The only build currently running is chromium,
and the machine is paging heavily.  Settings of interest are:

    USE_TMPFS="wrkdir data localbase"
    ALLOW_MAKE_JOBS=yes

last pid: 96239;  load averages:  1.86,  1.76,  1.83   up 3+14:47:00  12:38:11
108 processes: 3 running, 105 sleeping
CPU: 18.6% user,  0.0% nice,  2.4% system,  0.0% interrupt, 79.0% idle
Mem: 129M Active, 865M Inact, 61M Laundry, 29G Wired, 1553K Buf, 888M Free
ARC: 23G Total, 8466M MFU, 10G MRU, 5728K Anon, 611M Header, 3886M Other
     17G Compressed, 32G Uncompressed, 1.88:1 Ratio
Swap: 40G Total, 17G Used, 23G Free, 42% Inuse, 4756K In

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAN
96239 nobody      1  76    0   140M 93636K CPU5    5   0:01  82.90% clang-
96238 nobody      1  75    0   140M 92608K CPU7    7   0:01  80.81% clang-
 5148 nobody      1  20    0   590M   113M swread  0   0:31   0.29% clang-
57290 root        1  20    0 12128K  2608K zio->i  7   8:11   0.28% find
78958 nobody      1  20    0   838M   299M swread  0   0:23   0.19% clang-
97840 nobody      1  20    0   698M   140M swread  4   0:27   0.13% clang-
96066 nobody      1  20    0   463M   104M swread  1   0:32   0.12% clang-
11050 nobody      1  20    0   892M   154M swread  2   0:39   0.12% clang-

Pre-upgrade I was running r327616, which is newer than either of the commits
that Jeff mentioned above, so it looks like there has been a regression
since then.  I also don't recall seeing this problem on my Ryzen box, though
it has 2x the core count and 2x the RAM.  The last testing I did on it was
with r329844.
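While the machine is in this state I plan to watch the two ARC DTrace probes
that Andriy mentioned.  A minimal sketch of what I intend to run is below;
it only counts how often each probe fires per interval, since I haven't yet
checked arc.c to confirm what the probe arguments mean:

    # Count firings of the ARC SDT probes, reported every 10 seconds.
    dtrace -n '
        sdt:::arc-available_memory { @fires["available_memory"] = count(); }
        sdt:::arc-needfree         { @fires["needfree"]         = count(); }
        tick-10s { printa(@fires); trunc(@fires); }'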
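So that there is something to graph rather than a single point-in-time
snapshot, I'm also going to log the ARC and VM counters for the whole
poudriere run with a loop along these lines (the sysctl names are the ones
present on my 12.0-CURRENT box; double-check them on yours):

    #!/bin/sh
    # Append a timestamped sample of ARC size/target and page-queue
    # counters every 10 seconds for later graphing.
    while :; do
        date +%s
        sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c \
               vfs.zfs.arc_max vm.stats.vm.v_free_count \
               vm.stats.vm.v_inactive_count vm.stats.vm.v_laundry_count
        sleep 10
    done >> /var/tmp/arc-vm.log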