Re: [SOLVED] Re: Strange behavior after running under high load

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Fri, 2 Apr 2021 23:47:33 +0200
On 4/2/21, Stefan Esser <se_at_freebsd.org> wrote:
> Am 28.03.21 um 16:39 schrieb Stefan Esser:
>> After a period of high load, my now idle system needs 4 to 10 seconds to
>> run any trivial command - even after 20 minutes of no load ...
>>
>>
>> I have run some Monte-Carlo simulations for a few hours, with initially 35
>> processes running in parallel for some 10 seconds each.
>>
>> The load decreased over time since some parameter sets were faster to
>> process.
>> All in all, some 63,000 processes ran within about 3 hours.
>>
>> When the system became idle, interactive performance was very bad.
>> Running any trivial command (e.g. uptime) took some 5 to 10 seconds.
>> Since I need this system in working order, I plan to reboot it later
>> today, but will keep it in this state for some more time to see whether
>> the condition persists or the system recovers on its own.
>>
>> Any ideas what might cause such a system state?
>
> It seems that Mateusz Guzik was right to point to performance issues when
> the system is very low on free vnodes. (Thanks!)
>
> I have been able to reproduce the issue and have checked vnode stats:
>
> kern.maxvnodes: 620370
> kern.minvnodes: 155092
> vm.stats.vm.v_vnodepgsout: 6890171
> vm.stats.vm.v_vnodepgsin: 18475530
> vm.stats.vm.v_vnodeout: 228516
> vm.stats.vm.v_vnodein: 1592444
> vfs.wantfreevnodes: 155092
> vfs.freevnodes: 47	<----- obviously too low ...
> vfs.vnodes_created: 19554702
> vfs.numvnodes: 621284
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 6412
>
> The freevnodes value stayed in this range over several minutes, with
> typical program start times (e.g. for "uptime") of 10 to 15 seconds.
>
> After raising maxvnodes from 600,000 to 2,000,000, the system performance
> is restored and I get:
>
> kern.maxvnodes: 2000000
> kern.minvnodes: 500000
> vm.stats.vm.v_vnodepgsout: 7875198
> vm.stats.vm.v_vnodepgsin: 20788679
> vm.stats.vm.v_vnodeout: 261179
> vm.stats.vm.v_vnodein: 1817599
> vfs.wantfreevnodes: 500000
> vfs.freevnodes: 205988	<----- much higher than before, but still below wantfreevnodes
> vfs.vnodes_created: 19956502
> vfs.numvnodes: 912880
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 20702
>
> I do not know why the performance impact is so high - there are a few
> free vnodes (more than required for the shared libraries needed to start
> e.g. the uptime program). Most probably each attempt to get a vnode
> triggers a clean-up attempt that runs for a significant time, but has no
> chance of ever getting near the goal of 155k or 500k free vnodes.
>

It is high because of this:
                msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);

i.e. it literally sleeps for 1 second.

The vnode limit is probably too conservative, and the behavior when the
limit is reached is rather broken. Probably the thing to do is to let
allocations go through while kicking vnlru to free some stuff up. I'll
have to sleep on it.
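
For the curious, the sleep sits in the vnode allocation slow path in
sys/kern/vfs_subr.c. Roughly paraphrased (simplified and trimmed, not
verbatim; identifiers as in FreeBSD 13):

        /* vn_alloc slow path, simplified: taken once numvnodes hits the limit */
        mtx_lock(&vnode_list_mtx);
        if (numvnodes + 1 <= desiredvnodes)     /* desiredvnodes == kern.maxvnodes */
                goto alloc;                     /* still below the limit */
        if (vnlru_free_locked(1, NULL) > 0)     /* try to reclaim one free vnode */
                goto alloc;
        /*
         * Nothing reclaimable right now: wake up the vnlru process and
         * sleep for up to hz ticks (1 second). This is the stall that
         * every process start runs into once the free list is exhausted.
         */
        vnlru_kick();
        msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);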


> Anyway, kern.maxvnodes can be changed at run time and the problem is thus
> easy to fix. It seems that no message is logged to report this situation.
> A rate-limited hint to raise the limit should help other affected users.
>
> Regards, STefan
>
>
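
For anyone else hitting this, raising the limit at run time is a one-liner
(the value below is the one Stefan used; pick whatever suits your workload):

        # sysctl kern.maxvnodes=2000000
        kern.maxvnodes: 620370 -> 2000000

To keep it across reboots, add kern.maxvnodes=2000000 to /etc/sysctl.conf.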


-- 
Mateusz Guzik <mjguzik gmail.com>