Re: [PATCH] Convert the VFS cache lock to an rmlock

From: Alan Cox <alc@rice.edu>
Date: Thu, 19 Mar 2015 13:02:37 -0500
On 03/18/2015 12:58, Mateusz Guzik wrote:
> On Wed, Mar 18, 2015 at 10:17:22AM -0400, John Baldwin wrote:
>> On Friday, March 13, 2015 06:32:03 AM Mateusz Guzik wrote:
>>> On Thu, Mar 12, 2015 at 06:13:00PM -0500, Alan Cox wrote:
>>>> Below are partial results from a profile of a parallel (-j7) "buildworld" on
>>>> a 6-core machine that I did after the introduction of pmap_advise, so this
>>>> is not a new profile.  The results are sorted by total waiting time and
>>>> only the top 20 entries are listed.
>>>>
>>> Well, I ran stuff on lynx2 in the zoo on fresh -head with debugging
>>> disabled (MALLOC_PRODUCTION included) and got quite different results.
>>>
>>> The machine is Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
>>>  2 package(s) x 10 core(s) x 2 SMT threads
>>>
>>> 32GB of ram
>>>
>>> Stuff was built in a chroot with world hosted on zfs.
>>>
>>>>      max  wait_max       total  wait_total       count    avg wait_avg cnt_hold cnt_lock name
>>>>
>>>>     1027    208500    16292932  1658585700     5297163      3    313  0 3313855 kern/vfs_cache.c:629 (rw:Name Cache)
>>>>   208564    186514 19080891106  1129189627   355575930     53      3  0 1323051 kern/vfs_subr.c:2099 (lockmgr:ufs)
>>>>   169241    148057   193721142   419075449    13819553     14     30  0  110089 kern/vfs_subr.c:2210 (lockmgr:ufs)
>>>>   187092    191775  1923061952   257319238   328416784      5      0  0 5106537 kern/vfs_cache.c:488 (rw:Name Cache)
>>>>
>>> make -j 12 buildworld on a freshly booted system (i.e., the most namecache insertions):
>>>
>>>       32       292     3042019    33400306     8419725      0      3  0 2578026 kern/sys_pipe.c:1438 (sleep mutex:pipe mutex)
>>>   170608    152572   642385744    27054977   202605015      3      0  0 1306662 kern/vfs_subr.c:2176 (lockmgr:zfs)
>> You are using ZFS, while Alan was using UFS.  It would not surprise me if
>> those performed quite differently, nor if UFS turned out to be more
>> efficient in terms of its interactions with the VM.
>>
> This is lots of forks + execs, and the page queue has only one lock.
>
> Fwiw, ufs world backed by a dedicated partition, other conditions unchanged
>
> First build:
>       52       411     3327572    36528102     8508545      0      4  0 2587337 kern/sys_pipe.c:1438 (sleep mutex:pipe mutex)
>   308102    308099   430348158    36163333   203246241      2      0  0 459200 kern/vfs_subr.c:2176 (lockmgr:ufs)
>       78       818    44308162    29968440   165009282      0      0  0 22053854 vm/vm_page.c:1502 (sleep mutex:vm page free queue)
>       48       238    18872405    28456578   164327020      0      0  0 22512151 vm/vm_page.c:2294 (sleep mutex:vm page free queue)
>      208      1486    69780863    17484511   144970085      0      0  0 134429 amd64/amd64/pmap.c:4204 (rw:pmap pv global)
>       52      1263    19390169     8186840   142392234      0      0  0 11151398 vm/vm_page.c:2053 (sleep mutex:vm active pagequeue)
>       27      1801    19754927     8092312   142403798      0      0  0 9993164 vm/vm_page.c:2097 (sleep mutex:vm active pagequeue)
>     1145      1300    19872450     7158951     7690094      2      0  0 269930 vm/vm_fault.c:785 (rw:vm object)
>    25579      1636    10955717     5962336    12620413      0      0  0  96691 vm/vm_map.c:2883 (rw:vm object)
>       39     54994     1428266     5141221      368787      3     13  0  13391 kern/kern_exec.c:590 (lockmgr:ufs)
>   311470    311513    67241105     3971357    30997102      2      0  0   5094 kern/vfs_lookup.c:509 (lockmgr:ufs)
>       55       568    64300400     3821394   214921016      0      0  0 2699822 kern/vfs_cache.c:487 (rw:Name Cache)
>       14      2224       31486     3784823      266762      0     14  0  43516 amd64/amd64/pmap.c:3767 (rw:pmap pv global)


This is pmap_remove_all, which is the only amd64 pmap function that
still acquires the global lock in exclusive mode.  This entry shows that
we're calling pmap_remove_all about 80% more often than we used to, and
that this is causing pmap_enter (pmap.c:4204, above) to wait.

I'm curious to see what the profile looks like if you change this
acquisition to a read acquire.  Such a change hasn't been committed
because of a race involving the completion of TLB shootdown.  However,
it's extremely improbable that the race will come out badly for a
parallel "buildworld".


>       15      1179      184521     2651974     2126398      0      1  0  31955 vm/vm_object.c:646 (rw:vm object)
>    43267     54635    56190811     2228815      368787    152      6  0  10399 kern/imgact_elf.c:829 (lockmgr:ufs)
>     1802      3276    55104622     2042552   142404165      0      0  0 543434 vm/vm_fault.c:997 (sleep mutex:vm page)
>       29        54    50897307     1792095   199350363      0      0  0 2305326 kern/vfs_cache.c:668 (sleep mutex:vnode interlock)
>     1051      1252     3640803     1792074     1030592      3      1  0  18897 vm/vm_object.c:1703 (rw:vm object)
>       17      1389      269455     1778764     2828515      0      0  0 106843 vm/vm_object.c:1222 (rw:vm object)
>      472      1444    14742782     1731247     6851167      2      0  0  20011 amd64/amd64/pmap.c:3620 (rw:pmap pv global)
>
> reset + rm -r /usr/obj/*
>
>   230791    100799   405140208    51092446   203312369      1      0  0 472299 kern/vfs_subr.c:2176 (lockmgr:ufs)
>      188      8046    69993327    41591950   144943809      0      0  0 175226 amd64/amd64/pmap.c:4204 (rw:pmap pv global)
>       76       647    45324624    35204587   164788099      0      0  0 22749884 vm/vm_page.c:1502 (sleep mutex:vm page free queue)
>       40       237    19428445    33209934   164266257      0      0  0 22973810 vm/vm_page.c:2294 (sleep mutex:vm page free queue)
>       52       298     3038837    32813929     8418613      0      3  0 2587957 kern/sys_pipe.c:1438 (sleep mutex:pipe mutex)
>     1939      3216    19524902    14623470     7688013      2      1  0 274392 vm/vm_fault.c:785 (rw:vm object)
>    21696      2861    11773227    13246477    12617163      0      1  0 106342 vm/vm_map.c:2883 (rw:vm object)
>       54      7664     1445280    10079322      368785      3     27  0  15598 kern/kern_exec.c:590 (lockmgr:ufs)
>       22      2736    19935005     9087069   142381462      0      0  0 10214061 vm/vm_page.c:2097 (sleep mutex:vm active pagequeue)
>       36       885    19540659     8825819   142378309      0      0  0 11305041 vm/vm_page.c:2053 (sleep mutex:vm active pagequeue)
>       15      3045       33404     7922947      267066      0     29  0  48667 amd64/amd64/pmap.c:3767 (rw:pmap pv global)
>       13      2605      194977     6294921     2126083      0      2  0  36096 vm/vm_object.c:646 (rw:vm object)
>       82      2269      269861     4506215     2827953      0      1  0 107885 vm/vm_object.c:1222 (rw:vm object)
>     2179      6501    82846723     4123049      368785    224     11  0  13145 kern/imgact_elf.c:829 (lockmgr:ufs)
>       81       722    64614191     4099711   214861181      0      0  0 2763484 kern/vfs_cache.c:487 (rw:Name Cache)
>     3925      2851     3900231     3909417     1031608      3      3  0  22052 vm/vm_object.c:1703 (rw:vm object)
>      483      3335    14875079     3554944     6850151      2      0  0  24182 amd64/amd64/pmap.c:3620 (rw:pmap pv global)
>      187       129   103680324     2146404   250760755      0      0  0 1630451 amd64/amd64/pmap.c:5362 (rw:pmap pv list)
>    30468      2402   126229130     2042062   158088931      0      0  0  71708 vm/vm_object.c:517 (rw:vm object)
>       28        76    51334143     1892757   199327513      0      0  0 2384837 kern/vfs_cache.c:668 (sleep mutex:vnode interlock)
>
>
Received on Thu Mar 19 2015 - 17:02:44 UTC
