Re: FreeBSD 11.x grinds to a halt after about 48h of uptime

From: Hans Petter Selasky <hps_at_selasky.org>
Date: Sat, 15 Oct 2016 18:26:16 +0200
On 10/15/16 18:18, Ulrich Spörlein wrote:
> Hey all, while 11.x is -STABLE now, this happens to my machine ever
> since I upgraded it to 11-CURRENT years ago. I have no idea when this
> started, actually, but what always happens is this:
>
> - System and X11 are up and running; I keep it running overnight as I'm
> too lazy to reboot and restart everything.
> - There's a bunch of xterms, Chrome, Clementine-Player and some other
> programs running
> - Coming back to the machine the next day (or the day after) it will
> exit the screensaver just fine and then either I can use it for a couple
> of seconds before it freezes, or it's pretty much dead already. The
> mouse cursor still moves for a bit, but then also freezes (so is this a
> GPU problem??)
>
> Now what I currently see on the screen is a clock widget stuck at 18:04
> but conky itself has last updated at 18:00:18 ...
>
> This time I had some SSH sessions from another machine to see some more
> useful things. There was nothing in various logs under /var/log (I also
> can't run dmesg anymore ...)
> I had top(1) running in a loop, this is the last output:
>
> last pid: 25633;  load averages:  0.27,  0.39,  0.36  up 1+23:03:28    18:00:12
> 202 processes: 2 running, 188 sleeping, 11 zombie, 1 waiting
>
> Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free
> ARC: 1844M Total, 469M MFU, 268M MRU, 16K Anon, 96M Header, 1012M Other
> Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse
>
>
>   PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>    11 root            8 155 ki31     0K   128K CPU0    0 364.6H 772.95% idle
>  3122 uqs            15  28    0  7113M  5861M uwait   0  94:44  13.96% chrome
>  2887 uqs            28  22    0  1394M   237M select  2 172:53   6.98% chrome
>  2890 uqs            11  21    0  1034M   178M select  5 231:21   1.95% chrome
>  1062 root            9  21    0   440M 47220K select  0  67:09   0.98% Xorg
>  3002 uqs            15  25    5  1159M   172M uwait   2  19:09   0.00% chrome
>  3139 uqs            17  25    5  1163M   156M uwait   2  16:15   0.00% chrome
>  3001 uqs            18  25    5  1639M   575M uwait   0  16:05   0.00% chrome
>    12 root           24 -64    -     0K   384K WAIT   -1  10:53   0.00% intr
>  3129 uqs            12  20    0  2820M  1746M uwait   6   8:36   0.00% chrome
>  2822 uqs             9  20    0   217M 81300K select  0   5:10   0.00% conky
>  3174 root            1  20    0 21532K  3188K select  0   4:20   0.00% systat
>  3130 uqs            16  20    0  1058M   131M uwait   4   3:03   0.00% chrome
>  2998 uqs            16  20    0  1110M   123M uwait   2   2:53   0.00% chrome
>  3165 uqs            10  20    0  1209M   215M uwait   6   2:52   0.00% chrome
>  3142 uqs            11  25    5  1344M   195M uwait   2   2:46   0.00% chrome
>  2876 uqs            19  20    0   580M 37164K select  3   2:42   0.00% clementine-player
>    20 root            2 -16    -     0K    32K psleep  6   2:25   0.00% pagedaemon
>
> I also had systat -vm running and it continued to update its screen ...
> for a short while, this is the last update before SSH died:
>
>
>    Mem usage:  0k%Phy  5%Kmem
> Mem: KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
>         Tot   Share      Tot    Share    Free           in   out     in   out
> Act  11051k   67868 71051992   255448   61840  count
> All  11051k   67924 71058776   262100          pages
> Proc:                                                            Interrupts
>   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt   224 total
>      25     730  11   724  109  404  101   13             cow       2 ehci0 16
>                                                           zfod      3 ehci1 23
>  0.0%Sys   0.1%Intr  0.0%User  0.0%Nice 99.9%Idle         ozfod    16 cpu0:timer
> |    |    |    |    |    |    |    |    |    |           %ozfod       xhci0 264
>                                                           daefr     3 em0 265
>                                         50 dtbuf          prcfr    94 hdac1 266
> Namei     Name-cache   Dir-cache    349167 desvn          totfr       ahci0 270
>    Calls    hits   %    hits   %    349155 numvn          react     5 cpu1:timer
>      121     121 100                253501 frevn          pdwak     1 cpu2:timer
>                                                           pdpgs    29 cpu7:timer
> Disks   md0  ada0  ada1 pass0 pass1 pass2                 intrn    12 cpu3:timer
> KB/t   0.00  0.00  0.00  0.00  0.00  0.00         5318892 wire     41 cpu6:timer
> tps       0     0     0     0     0     0         9261404 act      12 cpu5:timer
> MB/s   0.00  0.00  0.00  0.00  0.00  0.00         1598184 inact     6 cpu4:timer
> %busy     0     0     0     0     0     0                 cache       vgapci0
>                                                     61840 free
>                                                    712304 buf
>
>
> Why do I have a Chrome tab using about 6G? What other sort of debugging
> output can be helpful to get to the bottom of this? The machine still
> responds to pings just fine, TCP connections get set up but the SSH
> handshake never completes.
>
> This always happens after 30-50h of uptime, is super annoying, and has been
> going on for more than a year. Help?
>
> Note, I cut the power to the monitor overnight to save electricity, can
> this mess up something in the Radeon card or X server? What combinations
> would be most useful to try next?
>

Hi,

Sounds like a memory leak. Can you track the memory use over time?

Did you look at the output from:

vmstat -m ?
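
One way to track this over time is to append timestamped snapshots to a file, so the last samples before the freeze are still readable after a reboot. The following is only a rough sketch: the helper name log_snapshot, the log path and the 5-minute interval are arbitrary choices for illustration, not anything prescribed by vmstat.

```shell
#!/bin/sh
# log_snapshot LOGFILE COMMAND [ARGS...]
# (hypothetical helper name, chosen for this example)
# Appends a timestamp header and the command's output to LOGFILE,
# then calls sync so the data is more likely to reach the disk
# before the machine hangs.
log_snapshot() {
    logfile=$1
    shift
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        "$@"
    } >> "$logfile" 2>&1
    sync
}

# Example loop (path and interval are assumptions, adjust as needed):
#   while :; do log_snapshot /var/tmp/vmstat-m.log vmstat -m; sleep 300; done
```

Comparing the first and the last snapshot in the log should show which malloc types, if any, keep growing.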

--HPS
Received on Sat Oct 15 2016 - 14:21:28 UTC
