Re: [PATCH] Convert the VFS cache lock to an rmlock

From: Ryan Stone <rysto32@gmail.com>
Date: Fri, 13 Mar 2015 11:23:06 -0400
On Thu, Mar 12, 2015 at 1:36 PM, Mateusz Guzik <mjguzik@gmail.com> wrote:

> Workloads like buildworld and the like (i.e. a lot of forks + execs) run
> into very severe contention in vm, which is orders of magnitude bigger
> than anything else.
>
> As such your result seems quite suspicious.
>

You're right, I did mess up the testing somewhere (I have no idea how).  As
you suggested, I switched to using a separate partition for the objdir, and
ran each build with a freshly newfsed filesystem.  I scripted it to be sure
that I was following the same procedure with each run:

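# Note: $objdev (the device for the objdir partition) and $logfile
# (the timing log) are assumed to be set before this point.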
# Build known-working commit from head
git checkout 09be0092bd3285dd33e99bcab593981060e99058 || exit 1
for i in `jot 5`
do
    # Create a fresh fs for objdir
    sudo umount -f /usr/obj 2> /dev/null
    sudo newfs -U -j -L OBJ $objdev || exit 1
    sudo mount $objdev /usr/obj || exit 1
    sudo chmod a+rwx /usr/obj || exit 1

    # Ensure disk cache contains all source files
    git status > /dev/null

    /usr/bin/time -a -o $logfile \
        make -s -j$(sysctl -n hw.ncpu) buildworld buildkernel
done


I tested on the original 12-core machine, as well as a 2-package x 8-core x
2-HTT (32 logical cores) machine that a co-worker was able to lend me.
Unfortunately, the results now show a performance decrease: about 0.5% of
wall-clock time on the 12-core machine, and almost 5% on the 32-core machine:

$ ministat -w 74 -C 1 12core/*
x 12core/orig.log
+ 12core/rmlock.log
+--------------------------------------------------------------------------+
|x             xx    x        x     +               +          +    +     +|
|     |_________A__________|               |_______________A___M__________||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       2478.81       2487.74       2483.45      2483.652     3.2495646
+   5       2489.64       2501.67       2498.26      2496.832     4.7394694
Difference at 95.0% confidence
        13.18 +/- 5.92622
        0.53067% +/- 0.238609%
        (Student's t, pooled s = 4.06339)

$ ministat -w 74 -C 1 32core/*
x 32core/orig.log
+ 32core/rmlock.log
+--------------------------------------------------------------------------+
|x   x                                                            +        |
|x   x x                                             +           ++       +|
||__AM|                                                  |_______AM_____|  |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       1067.97       1072.86       1071.29      1070.314     2.2238997
+   5       1111.22       1129.05        1122.3      1121.324     6.4046569
Difference at 95.0% confidence
        51.01 +/- 6.99181
        4.76589% +/- 0.653249%
        (Student's t, pooled s = 4.79403)


The difference is due to a significant increase in system time.  Write
locks on an rmlock are extremely expensive (acquiring one involves an
smp_rendezvous with the other CPUs), and the cost likely scales with the
number of cores.  Comparing system time (in seconds) on the 32-core machine:

x 32core/orig.log
+ 32core/rmlock.log
+--------------------------------------------------------------------------+
|xxx   x                                                 +   +++          +|
||_MA__|                                                 |____MA______|    |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5       5616.63        5715.7        5641.5       5661.72     48.511545
+   5       6502.51       6781.84        6596.5       6612.39     103.06568
Difference at 95.0% confidence
        950.67 +/- 117.474
        16.7912% +/- 2.07489%
        (Student's t, pooled s = 80.5478)


At this point I'm pretty much at an impasse.  The real-time behaviour is
critical to me, but a 5% performance degradation isn't likely to be
acceptable to many people.  I'll see what I can do with this.