Re: FreeBSD 11.x grinds to a halt after about 48h of uptime

From: Kevin Oberman <rkoberman_at_gmail.com>
Date: Sat, 15 Oct 2016 09:36:27 -0700
On Sat, Oct 15, 2016 at 9:26 AM, Hans Petter Selasky <hps_at_selasky.org>
wrote:

> On 10/15/16 18:18, Ulrich Spörlein wrote:
>
>> Hey all, while 11.x is -STABLE now, this happens to my machine ever
>> since I upgraded it to 11-CURRENT years ago. I have no idea when this
>> started, actually, but what always happens is this:
>>
>> - System and X11 are up and running; I keep them running overnight as I'm
>> too lazy to reboot and restart everything.
>> - There's a bunch of xterms, Chrome, Clementine-Player and some other
>> programs running
>> - Coming back to the machine the next day (or the day after) it will
>> exit the screensaver just fine and then either I can use it for a couple
>> of seconds before it freezes, or it's pretty much dead already. The
>> mouse cursor still moves for a bit, but then also freezes (so is this a
>> GPU problem??)
>>
>> Now what I currently see on the screen is a clock widget stuck at 18:04
>> but conky itself has last updated at 18:00:18 ...
>>
>> This time I had some SSH sessions from another machine to see some more
>> useful things. There was nothing in various logs under /var/log (I also
>> can't run dmesg anymore ...)
>> I had top(1) running in a loop, this is the last output:
>>
>> last pid: 25633;  load averages:  0.27,  0.39,  0.36  up 1+23:03:28  18:00:12
>> 202 processes: 2 running, 188 sleeping, 11 zombie, 1 waiting
>>
>> Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free
>> ARC: 1844M Total, 469M MFU, 268M MRU, 16K Anon, 96M Header, 1012M Other
>> Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse
>>
>>
>>   PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>>    11 root            8 155 ki31     0K   128K CPU0    0 364.6H 772.95% idle
>>  3122 uqs            15  28    0  7113M  5861M uwait   0  94:44  13.96% chrome
>>  2887 uqs            28  22    0  1394M   237M select  2 172:53   6.98% chrome
>>  2890 uqs            11  21    0  1034M   178M select  5 231:21   1.95% chrome
>>  1062 root            9  21    0   440M 47220K select  0  67:09   0.98% Xorg
>>  3002 uqs            15  25    5  1159M   172M uwait   2  19:09   0.00% chrome
>>  3139 uqs            17  25    5  1163M   156M uwait   2  16:15   0.00% chrome
>>  3001 uqs            18  25    5  1639M   575M uwait   0  16:05   0.00% chrome
>>    12 root           24 -64    -     0K   384K WAIT   -1  10:53   0.00% intr
>>  3129 uqs            12  20    0  2820M  1746M uwait   6   8:36   0.00% chrome
>>  2822 uqs             9  20    0   217M 81300K select  0   5:10   0.00% conky
>>  3174 root            1  20    0 21532K  3188K select  0   4:20   0.00% systat
>>  3130 uqs            16  20    0  1058M   131M uwait   4   3:03   0.00% chrome
>>  2998 uqs            16  20    0  1110M   123M uwait   2   2:53   0.00% chrome
>>  3165 uqs            10  20    0  1209M   215M uwait   6   2:52   0.00% chrome
>>  3142 uqs            11  25    5  1344M   195M uwait   2   2:46   0.00% chrome
>>  2876 uqs            19  20    0   580M 37164K select  3   2:42   0.00% clementine-player
>>    20 root            2 -16    -     0K    32K psleep  6   2:25   0.00% pagedaemon
>>
>> I also had systat -vm running and it continued to update its screen ...
>> for a short while, this is the last update before SSH died:
>>
>>
>>    Mem usage:  0k%Phy  5%Kmem
>> Mem: KB    REAL            VIRTUAL                      VN PAGER   SWAP
>> PAGER
>>         Tot   Share      Tot    Share    Free           in   out     in
>>  out
>> Act  11051k   67868 71051992   255448   61840  count
>> All  11051k   67924 71058776   262100          pages
>> Proc:
>> Interrupts
>>   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt   224
>> total
>>      25     730  11   724  109  404  101   13             cow       2
>> ehci0 16
>>                                                           zfod      3
>> ehci1 23
>>  0.0%Sys   0.1%Intr  0.0%User  0.0%Nice 99.9%Idle         ozfod    16
>> cpu0:timer
>> |    |    |    |    |    |    |    |    |    |           %ozfod
>>  xhci0 264
>>                                                           daefr     3 em0
>> 265
>>                                         50 dtbuf          prcfr    94
>> hdac1 266
>> Namei     Name-cache   Dir-cache    349167 desvn          totfr
>>  ahci0 270
>>    Calls    hits   %    hits   %    349155 numvn          react     5
>> cpu1:timer
>>      121     121 100                253501 frevn          pdwak     1
>> cpu2:timer
>>                                                           pdpgs    29
>> cpu7:timer
>> Disks   md0  ada0  ada1 pass0 pass1 pass2                 intrn    12
>> cpu3:timer
>> KB/t   0.00  0.00  0.00  0.00  0.00  0.00         5318892 wire     41
>> cpu6:timer
>> tps       0     0     0     0     0     0         9261404 act      12
>> cpu5:timer
>> MB/s   0.00  0.00  0.00  0.00  0.00  0.00         1598184 inact     6
>> cpu4:timer
>> %busy     0     0     0     0     0     0                 cache
>>  vgapci0
>>                                                     61840 free
>>                                                    712304 buf
>>
>>
>> Why do I have a Chrome tab using about 6G? What other sort of debugging
>> output can be helpful to get to the bottom of this? The machine still
>> responds to pings just fine, TCP connections get set up but the SSH
>> handshake never completes.
>>
>> This always happens after 30-50h of uptime, is super annoying, and has
>> been going on for >1 year. Help?
>>
>> Note, I cut the power to the monitor overnight to save electricity, can
>> this mess up something in the Radeon card or X server? What combinations
>> would be most useful to try next?
>>
>>
> Hi,
>
> Sounds like a memory leak. Can you track the memory use over time?
>
> Did you look at the output from:
>
> vmstat -m ?
>
> --HPS


I have noted significant memory leakage in chromium for some time. If I
leave it running overnight, my system is essentially frozen. If I terminate
the chromium process, it slowly comes back to life. I always keep a gkrellm
session on-screen where the memory and swap utilization is continuously
displayed and that clearly shows resources declining.
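
Without a graphical monitor, the same decline can be logged from the
summary lines of top(1). A minimal Python sketch, assuming the "Mem:" and
"Swap:" lines look like the ones quoted above; parse_size and parse_summary
are hypothetical helper names, not part of any FreeBSD tool:

```python
import re

# Multipliers for the size suffixes seen in the quoted top(1) output.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_size(text):
    """Convert a top(1) size such as '8873M' or '16K' to bytes."""
    return int(text[:-1]) * UNITS[text[-1]]

def parse_summary(line):
    """Parse a 'Mem:' or 'Swap:' summary line into {label: bytes}."""
    _, _, body = line.partition(":")
    fields = {}
    # Percentage fields such as '58% Inuse' do not match and are skipped.
    for size, label in re.findall(r"(\d+[KMG]) (\w+)", body):
        fields[label] = parse_size(size)
    return fields

mem = parse_summary(
    "Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free")
swap = parse_summary("Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse")
print(mem["Free"] // 1024**2, "MB free,", swap["Used"] // 1024**2, "MB swap used")
```

Appending one such record per minute to a file on another machine should
show whether free memory and swap shrink steadily before the freeze.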

Try closing your chromium at night and see if that fixes the problem.
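
To confirm which program holds the memory before closing anything, the
per-process lines of top(1) can be summed by command name. A rough Python
sketch, assuming the 12-column layout shown in the quoted output;
resident_by_command is a hypothetical name used only for illustration:

```python
from collections import defaultdict

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def resident_by_command(lines):
    """Sum the RES column (in bytes) per COMMAND from top(1) process lines."""
    totals = defaultdict(int)
    for line in lines:
        cols = line.split()
        # Expected columns:
        # PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
        if len(cols) != 12 or not cols[0].isdigit():
            continue  # header or otherwise unexpected line
        res = cols[6]
        if res[-1] not in UNITS or not res[:-1].isdigit():
            continue  # size format this sketch does not handle
        totals[cols[11]] += int(res[:-1]) * UNITS[res[-1]]
    return dict(totals)

sample = [
    "  PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND",
    " 3122 uqs            15  28    0  7113M  5861M uwait   0  94:44  13.96% chrome",
    " 2887 uqs            28  22    0  1394M   237M select  2 172:53   6.98% chrome",
    " 1062 root            9  21    0   440M 47220K select  0  67:09   0.98% Xorg",
]
totals = resident_by_command(sample)
for command, size in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(command, size // 1024**2, "MB resident")
```

Shared pages are counted once per process, so the totals overstate real
usage, but a single command growing from run to run still stands out.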

If you have never tried gkrellm (sysutils/gkrellm2), it is the best
system monitor I have found, though it pulls in a lot of dependencies. It
can also run as a server, with remote systems displaying the data, which is
handy for monitoring servers.
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkoberman_at_gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
Received on Sat Oct 15 2016 - 14:36:29 UTC
