Re: Strange behavior after running under high load

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Sun, 28 Mar 2021 18:44:31 +0300
On 28/03/2021 17:39, Stefan Esser wrote:
> After a period of high load, my now idle system needs 4 to 10 seconds to
> run any trivial command - even after 20 minutes of no load ...
> 
> 
> I have run some Monte-Carlo simulations for a few hours, with initially 35
> processes running in parallel for some 10 seconds each.

I saw somewhat similar symptoms with 13-CURRENT some time ago.
It looked to me as if even small kernel memory allocations were taking a very
long time, but that was hard to diagnose properly because my favorite tool,
dtrace, was affected by the same problem.

> The load decreased over time since some parameter sets were faster to process.
> All in all 63000 processes ran within some 3 hours.
> 
> When the system became idle, interactive performance was very bad. Running
> any trivial command (e.g. uptime) takes some 5 to 10 seconds. Since I have
> to have this system working, I plan to reboot it later today, but will keep
> it in this state for some more time to see whether this state persists or
> whether the system recovers from it.
> 
> Any ideas what might cause such a system state???
> 
> 
> The system has a Ryzen 5 3600 CPU (6 cores/12 threads) and 32 GB of RAM.
> 
> The following are a few commands that I have tried on this now practically
> idle system:
> 
> $ time vmstat -n 1
>   procs    memory    page                      disks faults       cpu
>   r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
>   2  0  0  26G 922M 1.2K   1   4   0 1.4K  239   0  482 7.2K  934 11  1 88
> 
> real    0m9,357s
> user    0m0,001s
> sys    0m0,018s
> 
> ---- wait 1 minute ----
> 
> $ time vmstat -n 1
>   procs    memory    page                      disks faults       cpu
>   r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
>   1  0  0  26G 925M 1.2K   1   4   0 1.4K  239   0  482 7.2K  933 11  1 88
> 
> real    0m9,821s
> user    0m0,003s
> sys    0m0,389s
> 
> $ systat -vm
> 
>      4 users    Load  0.10  0.72  3.57                  Mar 28 16:15
>     Mem usage:  97%Phy 55%Kmem                           VN PAGER   SWAP PAGER
> Mem:      REAL           VIRTUAL                         in   out     in  out
>         Tot   Share     Tot    Share     Free   count
> Act  2387M    460K  26481M     460K     923M   pages
> All  2605M    218M  27105M     572M                        ioflt  Interrupts
> Proc:                                                      cow     132 total
>    r   p   d    s   w   Csw  Trp  Sys  Int  Sof  Flt    52 zfod     96 hpet0:t0
>               316       356   39  225  132   21   53       ozfod nvme0:admi
>                                                           %ozfod nvme0:io0
>   0.1%Sys   0.0%Intr  0.0%User  0.0%Nice 99.9%Idle         daefr nvme0:io1
> |    |    |    |    |    |    |    |    |    |    |        prcfr nvme0:io2
>                                                            totfr nvme0:io3
>                                             dtbuf          react nvme0:io4
> Namei      Name-cache   Dir-cache    620370 maxvn          pdwak nvme0:io5
>     Calls    hits   %    hits   %    627486 numvn      168 pdpgs    27 xhci0 66
>        18      14  78                    65 frevn          intrn ahci0 67
>                                                     17539M wire xhci1 68
> Disks  nvd0  ada0  ada1  ada2  ada3  ada4   cd0       430M act       9 re0 69
> KB/t   0.00  0.00  0.00  0.00  0.00  0.00  0.00     12696M inact hdac0 76
> tps       0     0     0     0     0     0     0     54276K laund vgapci0 78
> MB/s   0.00  0.00  0.00  0.00  0.00  0.00  0.00       923M free
> %busy     0     0     0     0     0     0     0          0 buf
> 
> ---- 5 minutes later ----
> 
> $ time vmstat -n 1
>  procs    memory    page                      disks faults       cpu
>  r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
>  1  0  0  26G 922M 1.2K   1   4   0 1.4K  239   0  481 7.2K  931 11  1 88
> 
> real    0m4,270s
> user    0m0,000s
> sys    0m0,019s
> 
> $ time uptime
> 16:20  up 23:23, 4 users, load averages: 0,17 0,39 2,68
> 
> real    0m10,840s
> user    0m0,001s
> sys    0m0,374s
> 
> $ time uptime
> 16:37  up 23:40, 4 users, load averages: 0,29 0,27 0,96
> 
> real    0m9,273s
> user    0m0,000s
> sys    0m0,020s
> 
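To see whether delays like these persist, without retyping `time uptime` by hand, one could time repeated runs of a trivial command. What follows is a minimal sketch. It assumes a POSIX sh and a date(1) that accepts `+%s` (both FreeBSD's and GNU's do, although POSIX does not require it). `time_runs` is a hypothetical helper name, not an existing tool. Whole-second resolution is coarse, but it is enough to expose multi-second stalls like the 4 to 10 seconds reported above.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print the wall-clock
# seconds each run took. Resolution is whole seconds, via date +%s.
# Note: POSIX sh has no 'local', so n, i, start and end are global.
time_runs() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1    # discard the command's own output
        end=$(date +%s)
        echo "run $i: $((end - start)) s"
        i=$((i + 1))
    done
}
```

For example, `time_runs 5 uptime` prints one line per run. On a healthy idle system every line should read 0 or 1 seconds.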


-- 
Andriy Gapon
Received on Sun Mar 28 2021 - 13:44:35 UTC
