Re: Strange ARC/Swap/CPU on yesterday's -CURRENT

From: O. Hartmann <ohartmann_at_walstatt.org>
Date: Sat, 17 Mar 2018 10:38:48 +0100
On Sun, 11 Mar 2018 18:00:35 -1000 (HST)
Jeff Roberson <jroberson_at_jroberson.net> wrote:

> On Sun, 11 Mar 2018, Mark Millard wrote:
> 
> > As I understand, O. Hartmann's report ( ohartmann at walstatt.org ) in:
> >
> > https://lists.freebsd.org/pipermail/freebsd-current/2018-March/068806.html
> >
> > includes a system with a completely non-ZFS context: UFS only. Quoting that part:
> >  
> >> This is from an APU (no ZFS, UFS on a small mSATA device); the APU (PC Engines) works
> >> as a firewall, router and PBX:
> >>
> >> last pid:  9665;  load averages:  0.13,  0.13,  0.11    up 3+06:53:55  00:26:26
> >> 19 processes:  1 running, 18 sleeping
> >> CPU:  0.3% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.5% idle
> >> Mem: 27M Active, 6200K Inact, 83M Laundry, 185M Wired, 128K Buf, 675M Free
> >> Swap: 7808M Total, 2856K Used, 7805M Free
> >> [...]
> >>
> >> The APU is running CURRENT (FreeBSD 12.0-CURRENT #42 r330608: Wed Mar  7 16:55:59
> >> CET 2018 amd64). Usually the APU never(!) uses swap; for a couple of days now it has
> >> been swapping like hell, and I have to reboot it fairly often.


Yes, that is correct.

The system in question (the PC Engines APU4C, artificially limited to 1 GB RAM via a boot
loader option) runs an Asterisk PBX and is our firewall/router appliance. The
kernel/world is highly customized and "reduced" (via NanoBSD WITHOUT_ build options).

The box has run CURRENT ever since it was set up in the middle of last year. The only
problem I noticed was that Asterisk might have a memory leak, although, as some past
commits suggest, it could also be triggered by a bug in syslog. That's why only 1 GB
out of 4 GB is configured.

For a couple of weeks now, this APU has been swapping and keeping ~4 MB (4204K) of swap
allocated. It is currently running FreeBSD 12.0-CURRENT #50 r330750: Sun Mar 11 01:14:34
CET 2018 amd64:

last pid: 16958;  load averages:  0.10,  0.21,  0.16    up 6+08:57:07  10:34:01
19 processes:  1 running, 18 sleeping
CPU:  0.3% user,  0.0% nice,  0.5% system,  0.0% interrupt, 99.2% idle
Mem: 27M Active, 1504K Inact, 96M Laundry, 188M Wired, 1900K Buf, 664M Free
Swap: 7808M Total, 4204K Used, 7804M Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  997 asterisk        59  52    0   133M 61220K select  0 278:59   2.69% asterisk
16958 root             1  20    0 13308K  3448K CPU2    2   0:00   0.14% top
  579 root             1  20    0 15252K  3116K select  1  35:01   0.07% ppp
  933 root             1  20    0 10892K  1688K select  0   1:38   0.02% powerd
 1038 root             1  20    0 11400K   764K nanslp  1   0:03   0.02% cron
  930 root             1  20    0 18200K 18280K select  0   0:47   0.01% ntpd
 1005 root             1  20    0 14772K  4516K bpf     3   0:13   0.00% arpwatch
  834 root             1  20    0 11364K  1992K select  1   4:46   0.00% syslogd
  847 bind             7  52    0 59528K 30600K sigwai  2   1:30   0.00% named
  989 root             1  20    0 32264K  1716K nanslp  2   0:20   0.00% perl
  863 daemon           1  20    0 11388K  1944K select  1   0:01   0.00% rpcbind
  872 root             1  20    0 11096K  1840K autofs  2   0:01   0.00% automountd
  975 root             1  20    0 14548K     0K nanslp  3   0:00   0.00% <smartd>
  968 dhcpd            1  20    0 22972K  7648K select  0   0:00   0.00% dhcpd
  878 root             1  20    0 10988K     0K kqread  1   0:00   0.00% <autounmountd>
13271 root             1  20    0 12104K  3128K wait    1   0:00   0.00% login
16955 root             1  26    0 13204K  4080K pause   3   0:00   0.00% csh
 1034 root             1  20    0 18340K  3976K select  0   0:00   0.00% sshd
 1084 root             1  52    0 10928K  1724K ttyin   2   0:00   0.00% getty

This box doesn't have ZFS! There is only a small UFS2-formatted mSATA device used for
logging and autofs.

I know this "repeated" report may be annoying, but maybe it provides some more
information or confirmation, so that there is a time series of incidents.

> >
> > Unless this is unrelated, it would suggest that ZFS and its ARC need not
> > be involved.
> >
> > Would what you are investigating relative to your "NUMA and concurrency
> > related work" fit with such a non-ZFS (no-ARC) context?  
> 
> I think there are probably two different bugs.  I believe the PID
> controller has caused the laundry thread to become more aggressive,
> causing more pageouts, which would increase swap consumption.
> 
> The back-pressure mechanisms in ARC should've resolved the other reports.
> It's possible that I broke those, although if the reports from 11.x are
> to be believed, I don't know that it was me.  It is possible they have been
> broken at different times for different reasons.  So I will continue to
> look.
> 
> Thanks,
> Jeff
> 
> >
> > ===
> > Mark Millard
> > marklmi at yahoo.com
> > ( dsl-only.net went
> > away in early 2018-Mar)
> >  



-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes
or for market or opinion research (§ 28 Abs. 4 BDSG).

Received on Sat Mar 17 2018 - 08:39:29 UTC
