Strange behavior after running under high load

From: Stefan Esser <se_at_freebsd.org>
Date: Sun, 28 Mar 2021 16:39:41 +0200
After a period of high load, my now idle system needs 4 to 10 seconds to
run any trivial command - even after 20 minutes of no load ...


I have run some Monte-Carlo simulations for a few hours, with initially 35 
processes running in parallel for some 10 seconds each.

The load decreased over time since some parameter sets were faster to process.
All in all 63000 processes ran within some 3 hours.

When the system became idle, interactive performance was very bad. Running
any trivial command (e.g. uptime) takes some 4 to 10 seconds. Since I need
this system in working order, I plan to reboot it later today, but I will keep
it in this state for a while longer to see whether the problem persists or
whether the system recovers.

Any ideas what might cause such a system state???


The system has a Ryzen 5 3600 CPU (6 cores/12 threads) and 32 GB of RAM.

The following are a few commands that I have tried on this now practically
idle system:

$ time vmstat -n 1
   procs    memory    page                      disks faults       cpu
   r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
   2  0  0  26G 922M 1.2K   1   4   0 1.4K  239   0  482 7.2K  934 11  1 88

real	0m9,357s
user	0m0,001s
sys	0m0,018s

---- wait 1 minute ----

$ time vmstat -n 1
   procs    memory    page                      disks faults       cpu
   r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
   1  0  0  26G 925M 1.2K   1   4   0 1.4K  239   0  482 7.2K  933 11  1 88

real	0m9,821s
user	0m0,003s
sys	0m0,389s

$ systat -vm

      4 users    Load  0.10  0.72  3.57                  Mar 28 16:15
     Mem usage:  97%Phy 55%Kmem                           VN PAGER   SWAP PAGER
Mem:      REAL           VIRTUAL                         in   out     in   out
         Tot   Share     Tot    Share     Free   count
Act  2387M    460K  26481M     460K     923M   pages
All  2605M    218M  27105M     572M                        ioflt  Interrupts
Proc:                                                      cow     132 total
    r   p   d    s   w   Csw  Trp  Sys  Int  Sof  Flt    52 zfod     96 hpet0:t0
               316       356   39  225  132   21   53       ozfod nvme0:admi
                                                           %ozfod nvme0:io0
   0.1%Sys   0.0%Intr  0.0%User  0.0%Nice 99.9%Idle         daefr nvme0:io1
|    |    |    |    |    |    |    |    |    |    |        prcfr nvme0:io2
                                                            totfr nvme0:io3
                                             dtbuf          react nvme0:io4
Namei      Name-cache   Dir-cache    620370 maxvn          pdwak nvme0:io5
     Calls    hits   %    hits   %    627486 numvn      168 pdpgs    27 xhci0 66
        18      14  78                    65 frevn          intrn ahci0 67
                                                     17539M wire xhci1 68
Disks  nvd0  ada0  ada1  ada2  ada3  ada4   cd0       430M act       9 re0 69
KB/t   0.00  0.00  0.00  0.00  0.00  0.00  0.00     12696M inact hdac0 76
tps       0     0     0     0     0     0     0     54276K laund vgapci0 78
MB/s   0.00  0.00  0.00  0.00  0.00  0.00  0.00       923M free
%busy     0     0     0     0     0     0     0          0 buf
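
As a rough sanity check of the systat figures above, the page-queue sizes (wire, act, inact, laund, free) can be added up and compared with the installed 32 GB. The sketch below is only arithmetic on the numbers shown; the helper names sum_mb and share are made up for illustration, laund is rounded from 54276K to 53M, and whether systat derives its "%Phy" and "%Kmem" values this way is an assumption.

```shell
# Sum a list of sizes given in MB (hypothetical helper, not a system tool).
sum_mb() {
    total=0
    for v in "$@"; do
        total=$((total + v))
    done
    echo "$total"
}

# Integer percentage of part relative to whole (rounded down).
share() {
    echo $((100 * $1 / $2))
}

# Values in MB taken from the systat -vm output: wire act inact laund free.
total=$(sum_mb 17539 430 12696 53 923)
echo "accounted for: ${total}M"
echo "wired:         $(share 17539 "$total")%"
echo "not free:      $(share $((total - 923)) "$total")%"
```

With these inputs the queues account for nearly all of the 32 GB, and wired memory is a bit more than half of it.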

---- 5 minutes later ----

$ time vmstat -n 1
  procs    memory    page                      disks faults       cpu
  r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
  1  0  0  26G 922M 1.2K   1   4   0 1.4K  239   0  481 7.2K  931 11  1 88

real	0m4,270s
user	0m0,000s
sys	0m0,019s

$ time uptime
16:20  up 23:23, 4 users, load averages: 0,17 0,39 2,68

real	0m10,840s
user	0m0,001s
sys	0m0,374s

$ time uptime
16:37  up 23:40, 4 users, load averages: 0,29 0,27 0,96

real	0m9,273s
user	0m0,000s
sys	0m0,020s
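
In all of the timings above, user and sys account for only a few milliseconds to a few hundred milliseconds while real is several seconds, so nearly all of the elapsed time is spent waiting rather than executing. A minimal sketch to put a number on that gap from time-style values follows; the function names to_seconds and off_cpu are hypothetical, and the parsing assumes the "0m9,273s" format shown in this post, with either "," or "." as decimal separator.

```shell
# Convert a value such as "0m9,273s" to seconds with three decimals.
to_seconds() {
    printf '%s\n' "$1" | tr ',' '.' |
        LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Elapsed time not accounted for by CPU time: real - (user + sys).
off_cpu() {
    LC_ALL=C awk -v r="$(to_seconds "$1")" \
                 -v u="$(to_seconds "$2")" \
                 -v s="$(to_seconds "$3")" \
        'BEGIN { printf "%.3f\n", r - u - s }'
}

# Values from the last "time uptime" run: real, user, sys.
off_cpu 0m9,273s 0m0,000s 0m0,020s
```

For the last uptime run this yields 9.253 seconds of waiting out of 9.273 seconds elapsed.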


Received on Sun Mar 28 2021 - 12:39:44 UTC
