On Mon, 28 Mar 2016 14:52:09 -0700 (PDT)
Don Lewis <truckman_at_FreeBSD.org> wrote:

> On 28 Mar, O. Hartmann wrote:
> > On Sat, 26 Mar 2016 14:26:45 -0700 (PDT)
> > Don Lewis <truckman_at_FreeBSD.org> wrote:
> >
> >> On 26 Mar, Michael Butler wrote:
> >> > -current is not great for interactive use at all. The strategy of
> >> > pre-emptively dropping idle processes to swap is hurting .. big time.
> >> >
> >> > Compare inactive memory to swap in this example ..
> >> >
> >> > 110 processes: 1 running, 108 sleeping, 1 zombie
> >> > CPU: 1.2% user, 0.0% nice, 4.3% system, 0.0% interrupt, 94.5% idle
> >> > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> >> > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> >> >
> >> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> >> >  1819 imb         1  28    0   213M 11284K select  1 147:44   5.97% gkrellm
> >> > 59238 imb        43  20    0   980M   424M select  0  10:07   1.92% firefox
> >> >
> >> > .. it shouldn't start randomly swapping out processes because they're
> >> > used infrequently when there's more than enough RAM to spare ..
> >>
> >> I don't know what changed, and probably something can use some tweaking,
> >> but paging out idle processes isn't always the wrong thing to do. For
> >> instance if I'm using poudriere to build a bunch of packages and its
> >> heavy use of tmpfs is pushing the machine into many GB of swap usage, I
> >> don't want interactive use like:
> >>     vi foo.c
> >>     cc foo.c
> >>     vi foo.c
> >> to suffer because vi and cc have to be read in from a busy hard drive
> >> each time while unused console getty and idle sshd processes in a bunch
> >> of jails are still hanging on to memory even though they haven't
> >> executed any instructions since shortly after the machine was booted
> >> weeks ago.
> >>
> >> > It also shows up when trying to reboot ..
> >> > on all of my gear, 90 seconds
> >> > of "fail-safe" time-out is no longer enough when a good proportion of
> >> > daemons have been dropped onto swap and must be brought back in to flush
> >> > their data segments :-(
> >>
> >> That's a different and known problem. See:
> >> <https://svnweb.freebsd.org/base/releng/10.3/bin/csh/config_p.h?revision=297204&view=markup>
> >
> > CURRENT has rendered unusable and faulty. Updating ports for poudriere ends
> > up in this error/broken pipe from remote console:
> >
> > [~] poudriere ports -u -p head
> > [00:00:00] ====>> Updating portstree "head"
> > [00:00:00] ====>> Updating the ports tree... done
> > root_at_gate [~] Fssh_packet_write_wait: Connection to 192.168.250.111 port
> > 22: Broken pipe
> >
> >
> > Although not under load, several processes over time gets idled/paged out -
> > and they never recover, the connection is then sabott, the whole thing
> > unusable :-(
>
> I'm definitely not seeing that here. This is getting close to the end
> of a big poudriere run:
>
> last pid: 82549;  load averages: 20.05, 20.72, 23.51  up 5+12:34:14  12:51:55
> 144 processes: 20 running, 109 sleeping, 15 stopped
> CPU: 85.3% user, 0.0% nice, 14.7% system, 0.0% interrupt, 0.0% idle
> Mem: 1082M Active, 19G Inact, 9718M Wired, 249M Buf, 1095M Free
> ARC: 3841M Total, 2039M MFU, 642M MRU, 3395K Anon, 111M Header, 1044M Other
> Swap: 40G Total, 9691M Used, 31G Free, 23% Inuse, 196K In
>
> At the moment, openoffice-4, openoffice-devel, libreoffice, and chromium
> are all being built and are using tmpfs for "wrkdir data localbase", so
> there are many GB of data in tmpfs, which is the reason for the high
> inact and swap usage. I just hit the return key in an idle (for a
> couple of hours) terminal window containing an ssh login session to the
> same machine. I got a fresh command prompt essentially instantaneously.
> It couldn't have taken more than a couple hundred milliseconds to wake
> up and page in the idle sshd and shell processes on the build server.
>
> [a couple hours later, after poudriere is done and all tmpfs is gone]
>
> last pid: 66089;  load averages: 0.13, 1.59, 4.61  up 5+14:14:33  14:32:14
> 71 processes: 1 running, 55 sleeping, 15 stopped
> CPU: 3.1% user, 0.0% nice, 0.0% system, 0.0% interrupt, 96.9% idle
> Mem: 58M Active, 85M Inact, 12G Wired, 249M Buf, 19G Free
> ARC: 6249M Total, 2792M MFU, 2246M MRU, 16K Anon, 133M Header, 1078M Other
> Swap: 40G Total, 81M Used, 40G Free
>
> [after tracking down and exiting all of those stopped processes]
>
> last pid: 66103;  load averages: 0.20, 0.99, 3.80  up 5+14:17:18  14:34:59
> 56 processes: 1 running, 55 sleeping
> CPU: 0.0% user, 0.0% nice, 0.1% system, 0.1% interrupt, 99.9% idle
> Mem: 57M Active, 88M Inact, 12G Wired, 249M Buf, 19G Free
> ARC: 6251M Total, 2793M MFU, 2247M MRU, 16K Anon, 133M Header, 1078M Other
> Swap: 40G Total, 63M Used, 40G Free
>
> The biggest chunk of the 63 MB of swap appears to be nginx. Its
> process size is 29 MB, but it has zero resident. It hasn't executed any
> code since it was first started when I booted the system several days
> ago. Other consumers appear to be getty and sshd and syslogd in various
> untouched jails.
>
>
> I've seen reports that r296137 and r297267 show the ssh problem, but
> this machine is in the middle with r297204 and I don't see it.
>
> As mentioned previously, I'm not running Xorg and a bunch of bloated
> X11 clients on this machine. Those make fat targets for having RAM
> taken from them, which would probably make my interactive experience
> less pleasant, but that should still not affect ssh.
>
> On my FreeBSD 10 machine, which has only 8 GB of RAM, my experience is
> that firefox gets pretty bloated after a while.
> It's currently at 2.6
> GB (with 2.8 GB of swap currently in use - I've got some other RAM hogs
> running as well) and I'm not seeing any problems, but when it gets up in
> the 4-5 GB range, things can start to get pretty laggy, but I don't see
> problems with ssh. The biggest problem with firefox seems to be
> javascript, which seems to leak memory like a sieve. Making heavy use
> of the noscript plugin is the only way to keep Firefox usable.
>
> The only thing I can think of is that this is triggered by something in
> the machine configuration or the specific hardware. I'm running a
> GENERIC kernel and the only non-standard modification to /usr/src is the
> dummynet AQM patchset. The latter should have no effect since I'm not
> using ipfw on this machine.
>
> If I get a chance, I'll try booting my FreeBSD 11 machine with less RAM to
> see if that is a trigger.

Several of my boxes do not run X11 or "... a bunch of bloated X11 clients",
and they run with 8 GB, 16 GB or 32 GB of RAM (the latter one does have X11).
On all remote systems with the most recent CURRENT (we are talking about
r297237 - 297369 right now) I definitely do not get a fresh prompt
"immediately". It takes up to 60 seconds (and more) to recover, even if the
box is idle. In a seriously rising number of cases I now get broken pipes.
This also happens with sessions when performing "poudriere options" on
larger installations, and this is completely unacceptable.

Received on Tue Mar 29 2016 - 04:08:49 UTC