Re: CURRENT slow and shaky network stability

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Sat, 2 Apr 2016 11:39:10 +0200
On Sat, 2 Apr 2016 10:55:03 +0200,
"O. Hartmann" <ohartman_at_zedat.fu-berlin.de> wrote:

> On Sat, 02 Apr 2016 01:07:55 -0700,
> Cy Schubert <Cy.Schubert_at_komquats.com> wrote:
> 
> > In message <56F6C6B0.6010103_at_protected-networks.net>, Michael Butler writes:  
> > > -current is not great for interactive use at all. The strategy of
> > > pre-emptively dropping idle processes to swap is hurting .. big time.    
> > 
> > FreeBSD doesn't "preemptively" or arbitrarily push pages out to disk. LRU 
> > doesn't do this.
> >   
> > > 
> > > Compare inactive memory to swap in this example ..
> > > 
> > > 110 processes: 1 running, 108 sleeping, 1 zombie
> > > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse    
> > 
> > To analyze this you need to capture vmstat output. You'll see the free pool 
> > dip below a threshold and pages go out to disk in response. If you have 
> > daemons with small working sets, pages that are not part of the working 
> > sets for daemons or applications will eventually be paged out. This is not 
> > a bad thing. In your example above, the 281 MB of UFS buffers are more 
> > active than the 917 MB paged out. If it's paged out and never used again, 
> > then it doesn't hurt. However the 281 MB of buffers saves you I/O. The 
> > inactive pages are part of your free pool that were active at one time but 
> > now are not. They may be reclaimed and if they are, you've just saved more 
> > I/O.
> > 
> > Top is a poor tool to analyze memory use. Vmstat is the better tool to help 
> > understand memory use. Inactive memory isn't a bad thing per se. Monitor 
> > page outs, scan rate and page reclaims.
> > 
> >   
> 
> I give up! I tried to check via ssh/vmstat what is going on. These are the last
> lines before the broken pipe:
> 
> [...]
> procs  memory       page                    disks     faults         cpu
> r b w  avm   fre   flt  re  pi  po    fr   sr ad0 ad1   in    sy    cs us sy id
> 22 0 22 5.8G  1.0G 46319   0   0   0 55721 1297   0   4  219 23907  5400 95  5  0
> 22 0 22 5.4G  1.3G 51733   0   0   0 72436 1162   0   0  108 40869  3459 93  7  0
> 15 0 22  12G  1.2G 54400   0  27   0 52188 1160   0  42  148 52192  4366 91  9  0
> 14 0 22  12G  1.0G 44954   0  37   0 37550 1179   0  39  141 86209  4368 88 12  0
> 26 0 22  12G  1.1G 60258   0  81   0 69459 1119   0  27  123 779569 704359 87 13  0
> 29 3 22  13G  774M 50576   0  68   0 32204 1304   0   2  102 507337 484861 93  7  0
> 27 0 22  13G  937M 47477   0  48   0 59458 1264   3   2  112 68131 44407 95  5  0
> 36 0 22  13G  829M 83164   0   2   0 82575 1225   1   0  126 99366 38060 89 11  0
> 35 0 22 6.2G  1.1G 98803   0  13   0 121375 1217   2   8  112 99371  4999 85 15  0
> 34 0 22  13G  723M 54436   0  20   0 36952 1276   0  17  153 29142  4431 95  5  0
> Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe
> 
> 
> This makes this crap system completely unusable. The server in question (FreeBSD
> 11.0-CURRENT #20 r297503: Sat Apr  2 09:02:41 CEST 2016 amd64) was running a
> poudriere bulk job. I cannot even determine which terminal goes down first - another
> one, idle for much longer than the one showing the "vmstat 5" output, is still alive!
> 
> I consider this a serious bug, and whatever happened since this "fancy" update is
> no improvement. :-(
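As an aside, the scan rate Cy suggests monitoring can be checked offline against a captured "vmstat 5" transcript like the one above. A minimal sketch (my own, not from the thread), assuming the FreeBSD vmstat column layout shown, where sr is the eleventh field:

```shell
# Compute the average page-scan rate (sr) from captured "vmstat 5" output.
# NR > 2 skips the two header lines; $11 is the sr column in the layout above.
avg=$(awk 'NR > 2 { sum += $11; n++ } END { printf "%.1f", sum / n }' <<'EOF'
procs  memory       page                    disks     faults         cpu
r b w  avm   fre   flt  re  pi  po    fr   sr ad0 ad1   in    sy    cs us sy id
22 0 22 5.8G  1.0G 46319   0   0   0 55721 1297   0   4  219 23907  5400 95  5  0
22 0 22 5.4G  1.3G 51733   0   0   0 72436 1162   0   0  108 40869  3459 93  7  0
15 0 22  12G  1.2G 54400   0  27   0 52188 1160   0  42  148 52192  4366 91  9  0
EOF
)
echo "average scan rate: ${avg} pages/s"
```

A sustained high sr together with nonzero po would point at real memory pressure; here po stays at 0, so the pager is scanning but not pushing pages out.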

By the way - this might be of interest as a hint.

One of my boxes acts as server and gateway. It uses NAT and IPFW. When it is under
high load, as it was today, forwarding traffic from the ISP to the clients on the
internal network is sometimes extremely slow. I do not consider this the cause of the
collapsing ssh sessions, since that also happens under no load, but viewed against
the overall problem it could be a hint - I hope.
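For what it is worth, sessions that die with Fssh_packet_write_wait often survive longer with client-side keepalives; this works around the symptom only, not whatever stalls the box. A sketch of a client config fragment (assumption: stock OpenSSH client, values illustrative):

```
# ~/.ssh/config - send an application-level keepalive every 15 s and
# give up only after 4 missed replies (~60 s of silence).
Host 192.168.0.1
    ServerAliveInterval 15
    ServerAliveCountMax 4
```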

Received on Sat Apr 02 2016 - 07:38:47 UTC
