Re: CURRENT slow and shaky network stability

From: Cy Schubert <Cy.Schubert_at_komquats.com>
Date: Sat, 02 Apr 2016 15:57:52 -0700

In message <CAN6yY1tioVfhnEu0eEyE4wROJ5bJ3an2rOYM95cj1d44CTiJew_at_mail.gmail.com>, Kevin Oberman writes:
> 
> On Sat, Apr 2, 2016 at 2:19 PM, O. Hartmann <ohartman_at_zedat.fu-berlin.de> wrote:
> 
> > On Sat, 2 Apr 2016 11:39:10 +0200
> > "O. Hartmann" <ohartman_at_zedat.fu-berlin.de> wrote:
> >
> > > On Sat, 2 Apr 2016 10:55:03 +0200
> > > "O. Hartmann" <ohartman_at_zedat.fu-berlin.de> wrote:
> > >
> > > > On Sat, 02 Apr 2016 01:07:55 -0700
> > > > Cy Schubert <Cy.Schubert_at_komquats.com> wrote:
> > > >
> > > > > In message <56F6C6B0.6010103_at_protected-networks.net>, Michael Butler writes:
> > > > > > -current is not great for interactive use at all. The strategy of
> > > > > > pre-emptively dropping idle processes to swap is hurting .. big time.
> > > > >
> > > > > FreeBSD doesn't "preemptively" or arbitrarily push pages out to disk.
> > > > > LRU doesn't do this.
> > > > >
> > > > > >
> > > > > > Compare inactive memory to swap in this example ..
> > > > > >
> > > > > > 110 processes: 1 running, 108 sleeping, 1 zombie
> > > > > > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > > > > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > > > > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> > > > >
> > > > > To analyze this, you need to capture vmstat output. You'll see the free
> > > > > pool dip below a threshold and pages go out to disk in response. If you
> > > > > have daemons with small working sets, pages that are not part of the
> > > > > working sets for daemons or applications will eventually be paged out.
> > > > > This is not a bad thing. In your example above, the 281 MB of UFS
> > > > > buffers are more active than the 917 MB paged out. If it's paged out and
> > > > > never used again, then it doesn't hurt. However, the 281 MB of buffers
> > > > > saves you I/O. The inactive pages are part of your free pool that were
> > > > > active at one time but now are not. They may be reclaimed, and if they
> > > > > are, you've just saved more I/O.
> > > > >
> > > > > Top is a poor tool to analyze memory use. Vmstat is the better tool to
> > > > > help understand memory use. Inactive memory isn't a bad thing per se.
> > > > > Monitor page outs, scan rate and page reclaims.
> > > > >
> > > > >
> > > >
> > > > I give up! I tried to check what is going on via ssh/vmstat. These are
> > > > the last lines before the pipe broke:
> > > >
> > > > [...]
> > > > procs   memory    page                         disks   faults            cpu
> > > >  r b  w  avm  fre   flt re  pi  po     fr   sr ad0 ad1  in     sy     cs us sy id
> > > > 22 0 22 5.8G 1.0G 46319  0   0   0  55721 1297   0   4 219  23907   5400 95  5  0
> > > > 22 0 22 5.4G 1.3G 51733  0   0   0  72436 1162   0   0 108  40869   3459 93  7  0
> > > > 15 0 22  12G 1.2G 54400  0  27   0  52188 1160   0  42 148  52192   4366 91  9  0
> > > > 14 0 22  12G 1.0G 44954  0  37   0  37550 1179   0  39 141  86209   4368 88 12  0
> > > > 26 0 22  12G 1.1G 60258  0  81   0  69459 1119   0  27 123 779569 704359 87 13  0
> > > > 29 3 22  13G 774M 50576  0  68   0  32204 1304   0   2 102 507337 484861 93  7  0
> > > > 27 0 22  13G 937M 47477  0  48   0  59458 1264   3   2 112  68131  44407 95  5  0
> > > > 36 0 22  13G 829M 83164  0   2   0  82575 1225   1   0 126  99366  38060 89 11  0
> > > > 35 0 22 6.2G 1.1G 98803  0  13   0 121375 1217   2   8 112  99371   4999 85 15  0
> > > > 34 0 22  13G 723M 54436  0  20   0  36952 1276   0  17 153  29142   4431 95  5  0
> > > > Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe
> > > >
> > > >
> > > > This makes this crap system completely unusable. The server in question
> > > > (FreeBSD 11.0-CURRENT #20 r297503: Sat Apr  2 09:02:41 CEST 2016 amd64)
> > > > was running a poudriere bulk job. I cannot even determine which terminal
> > > > goes down first; another one, idle for much longer than the one showing
> > > > the "vmstat 5" output, is still alive!
> > > >
> > > > I consider this a serious bug, and what has happened since this "fancy"
> > > > update is no improvement. :-(
> > >
> > > By the way, this might be of interest and could be a hint.
> > >
> > > One of my boxes acts as a server and gateway. It uses NAT and IPFW. When
> > > it is under high load, as it was today, passing network traffic from the
> > > ISP to the clients on the internal network is sometimes extremely slow. I
> > > do not consider this the reason for the collapsing ssh sessions, since
> > > they also collapse under no load, but looking at the problem as a whole,
> > > this could be a hint, I hope.
> >
> > I just checked one box whose ssh pipe broke very quickly after I started
> > poudriere, although a couple of hours earlier it had run fine for a long
> > time before the pipe broke. It seems to be load-dependent when the ssh
> > session gets wrecked. More importantly, after the long-haul poudriere run
> > I rebooted the box and tried again, and the pipe broke as described a
> > couple of minutes after poudriere started. Then I left the box alone for
> > several hours, logged in again and checked the swap. Although there had
> > been no load or other pressure for hours, 31% of swap was still in use
> > (the box has 16 GB of RAM and is powered by a Xeon E3-1245 V2).
> >
> 
> Unless something has changed, just as things are not preemptively swapped
> out, they are also not preemptively swapped back in. AFAIK, once a process
> is swapped out, it will remain swapped out until/unless it becomes active.
> At that time it is swapped in and this can entail a significant delay. If
> my laptop is locked and something (usually Chromium) starts eating all of
> the memory and processes start swapping out, it can take >5 seconds to get
> the unlock window to display.

Yes!
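Cy's advice quoted above (capture vmstat output and watch page outs, scan rate and the free pool rather than top's Inact figure) can be applied mechanically to samples like the ones posted in this thread. The following is a minimal Python sketch, not a FreeBSD tool: it assumes the 19-column layout of the quoted `vmstat 5` output (two disk columns, ad0 and ad1) and K/M/G/T suffixes on the memory columns; `COLUMNS`, `to_mib`, `PagingSummary` and `summarize` are hypothetical names.

```python
"""Summarize paging activity in captured `vmstat 5` samples (sketch)."""

from dataclasses import dataclass

# Hypothetical column layout matching the 19 columns of the quoted vmstat
# output (two disk columns). Machines with a different number of disks
# print a different number of columns, so adjust this list to match.
COLUMNS = [
    "r", "b", "w",                        # procs
    "avm", "fre",                         # memory
    "flt", "re", "pi", "po", "fr", "sr",  # page
    "ad0", "ad1",                         # disks
    "in", "sy", "cs",                     # faults
    "us", "sy_cpu", "id",                 # cpu ("sy" renamed to stay unique)
]

# Assumed multipliers for the size suffixes in the memory columns.
_SUFFIX_MIB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def to_mib(value: str) -> float:
    """Convert a size such as '774M' or '1.2G' to MiB (suffix required)."""
    return float(value[:-1]) * _SUFFIX_MIB[value[-1].upper()]


@dataclass
class PagingSummary:
    samples: int
    mean_pi: float       # average of the "pi" (page-in) column
    max_pi: int
    max_po: int          # any nonzero value means pages went out to swap
    max_sr: int          # peak page-daemon scan rate
    min_free_mib: float  # smallest free pool seen


def parse_row(line: str) -> dict:
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def summarize(lines) -> PagingSummary:
    """Skip blank and header lines, then summarize the numeric sample rows."""
    rows = [parse_row(line) for line in lines
            if line.strip() and line.split()[0].isdigit()]
    if not rows:
        raise ValueError("no vmstat sample rows found")
    pi = [int(r["pi"]) for r in rows]
    return PagingSummary(
        samples=len(rows),
        mean_pi=sum(pi) / len(pi),
        max_pi=max(pi),
        max_po=max(int(r["po"]) for r in rows),
        max_sr=max(int(r["sr"]) for r in rows),
        min_free_mib=min(to_mib(r["fre"]) for r in rows),
    )
```

For example, calling `summarize(open("samples.txt"))` on a file holding the ten quoted samples (the file name is hypothetical) gives max_po == 0, max_pi == 81, max_sr == 1304 and min_free_mib == 723.0: no page outs in any interval, while page ins and a steady scan rate are present throughout most of the run.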

-- 
Cheers,
Cy Schubert <Cy.Schubert_at_komquats.com> or <Cy.Schubert_at_cschubert.com>
FreeBSD UNIX:  <cy_at_FreeBSD.org>   Web:  http://www.FreeBSD.org

	The need of the many outweighs the greed of the few.