Re: odd -CURRENT performance issue

From: Don Lewis <truckman_at_FreeBSD.org> Date: Tue, 8 Feb 2005 15:29:09 -0800 (PST) · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:27 UTC

On  8 Feb, John Baldwin wrote:
> On Monday 07 February 2005 06:49 pm, Don Lewis wrote:
>> 	time make all-depends-list
>> ten times in a row in /usr/ports/x11/gnome2.  The system was freshly
>> booted, and other than a niced setiathome, the system was idle.  I got
>> the following results:
>>
>>       122.63 real        29.83 user        66.29 sys
>>       117.86 real        29.58 user        66.55 sys
>>       119.89 real        29.80 user        67.27 sys
>>       121.68 real        30.31 user        67.55 sys
>>       122.10 real        30.41 user        67.66 sys
>>       120.90 real        29.67 user        68.24 sys
>>       121.21 real        30.27 user        67.69 sys
>>       219.85 real        30.35 user       148.12 sys
>>       312.54 real        30.49 user       224.79 sys
>>       312.83 real        30.23 user       225.83 sys
>>
>> The last three lines are rather strange ...
>>
>> Kernel and world are 6.0-CURRENT, built from February 3rd sources.
>> WITNESS, INVARIANTS, DEBUG_VFS_LOCKS, and SMP enabled, and it is fairly
>> close to GENERIC.
> 
> WITNESS has especially poor scalability issues and can get much worse as more 
> lock classes and relationships are added to its tree due to how it tries to 
> rebalance the tree every time it adds a new relationship (and each rebalance 
> is O(n) I think).  If you turn off witness via sysctl debug.witness.watch=0,. 
> does the same weirdness persist?

I'm in the middle of a portupgrade run to catch up with the latest perl
upgrade, so it'll be a while before I can try that experiment.  Earlier
today I broke into DDB a bunch of times to get some stack traces to try
to figure out where the kernel was spending most of its time.  Most of
the stack traces had this in common:

critical_exit(c094d12c,eb27bbac,c0643c27,c0906a20,0) at critical_exit+0xb2
_mtx_unlock_spin_flags(c0906a20,0,c083c0bd,325,c28515c0) at _mtx_unlock_spin_flags+0x8d
witness_checkorder(c2dad7bc,9,c0852ab7,9f) at witness_checkorder+0x29f
_mtx_lock_flags(c2dad7bc,0,c0852ab7,9f,22) at _mtx_lock_flags+0x5b

where witness_checkorder+0x29f is here:

        /*
         * If we know that the the lock we are acquiring comes after
         * the lock we most recently acquired in the lock order tree,
         * then there is no need for any further checks.
         */
        if (isitmydescendant(w1, w)) {
                mtx_unlock_spin(&w_mtx);

and by far the majority of the callers were calling VM_OBJECT_LOCK().

I suspect that the tree has gotten rebalanced so that the "vm object"
lock suddenly takes a much longer time to find.  I'll try to get
snapshots of the tree before and after the problem occurs.