[PATCH] Convert the VFS cache lock to an rmlock

From: Ryan Stone <rysto32_at_gmail.com>
Date: Thu, 12 Mar 2015 11:14:42 -0400
I've just submitted a patch to Differential[1] for review that converts the
VFS cache to use an rmlock in place of the current rwlock.  My main
motivation for the change is to fix a priority inversion problem that I saw
recently.  A real-time priority thread attempted to acquire a write lock on
the VFS cache lock, but there was already a reader holding it.  The reader
was preempted by a normal priority thread, and my real-time thread was
starved.

[1] https://reviews.freebsd.org/D2051
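
For anyone who hasn't opened the review yet, the shape of the change is
roughly the following.  This is just a minimal sketch rather than the actual
diff from D2051, and "example_lock" plus the two functions are hypothetical
stand-ins for the real namecache code: the rwlock(9) calls become the
corresponding rmlock(9) calls, and the read side gains a struct
rm_priotracker, which is what lets the kernel keep track of individual
readers.

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

/*
 * Before (rwlock):                    After (rmlock):
 *    rw_rlock(&example_lock);         rm_rlock(&example_lock, &tracker);
 *    rw_runlock(&example_lock);       rm_runlock(&example_lock, &tracker);
 *    rw_wlock(&example_lock);         rm_wlock(&example_lock);
 *    rw_wunlock(&example_lock);       rm_wunlock(&example_lock);
 */
static struct rmlock example_lock;

static void
example_init(void)
{

	rm_init(&example_lock, "example rmlock");
}

/* Read side: lookups take the lock shared and track the reader. */
static void
example_lookup(void)
{
	struct rm_priotracker tracker;

	rm_rlock(&example_lock, &tracker);
	/* ... read-only access to the shared data ... */
	rm_runlock(&example_lock, &tracker);
}

/* Write side: modifications take the lock exclusive. */
static void
example_update(void)
{

	rm_wlock(&example_lock);
	/* ... modify the shared data ... */
	rm_wunlock(&example_lock);
}

rm_init(), rm_rlock() and friends are documented in rmlock(9).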


I was worried about the performance implications of the change, as I wasn't
sure how common write operations on the VFS cache would be.  I did a -j12
buildworld/buildkernel test on a 12-core Haswell Xeon system, as I figured
that would be a reasonable stress test that both creates lots of small
files and reads a large number of files.  This actually wound up being
about a 10% performance *increase* (the units below are seconds of elapsed
time as measured by /usr/bin/time, so smaller is better):

$ ministat -C 1 orig.log rmlock.log
x orig.log
+ rmlock.log
[ministat histogram omitted: the five + (rmlock) samples cluster well to the
left of the six x (rwlock) samples]
    N           Min           Max        Median           Avg        Stddev
x   6       2710.31       2821.35       2816.75     2798.0617     43.324817
+   5       2488.25       2500.25       2498.04      2495.756     5.0494782
Difference at 95.0% confidence
        -302.306 +/- 44.4709
        -10.8041% +/- 1.58935%
        (Student's t, pooled s = 32.4674)

The one outlier in the rwlock case does confuse me a bit.  My procedure was
to boot a freshly-built image with the rmlock patch applied, do a git
checkout of head, and then run 5 builds in a row.  The git checkout should
have had the effect of priming the disk cache with the source files.  Then
I installed the stock head kernel, rebooted, and ran 5 more builds (and
then 1 more when I noticed the outlier).  The fast outlier was the *first*
run, which should have been running with a cold disk cache, so I really
don't know why it would be 90 seconds faster.  I do see that this run also
had about 500-600 fewer seconds spent in system time:

x orig.log
[ministat histogram omitted: one x sample sits well to the left of the other
five]
    N           Min           Max        Median           Avg        Stddev
x   6       3515.23       4121.84       4105.57       4001.71     239.61362

I'm not sure how much I care, given that the rmlock is universally faster
(but maybe I should try the "cold boot" case anyway).

If anybody had any comments or further testing that they would like to see,
please let me know.