OK, David's patch fixes the umtx thundering herd (and seems to give a
4-6% boost).  I also fixed a thundering herd in FILEDESC_UNLOCK (which
was also waking up 2-7 CPUs at once about 30% of the time) by doing
s/wakeup/wakeup_one/.  This did not seem to have a measurable
performance impact on this test, though.

It seems to me that a more useful way to sort the mutex profiling list
is by the ratio of contention to total acquisitions.  Here is the list
re-sorted by cnt_hold/count, keeping only the top 40 values of count
and the mutexes with nonzero contention:

Before:

 max      total    count avg cnt_hold cnt_lock ratio name
 275     507115   166457   3      907     1348  .005 kern/vfs_bio.c:357 (needsbuffer lock)
 310     487209   166460   2     1158      645  .006 kern/vfs_bio.c:315 (needsbuffer lock)
1084    3860336   166507  23     1241     1377  .007 kern/vfs_bio.c:1445 (buf queue lock)
1667   35018604   320038 109     3877        0  .012 kern/uipc_usrreq.c:696 (unp_mtx)
 379    2143505   635740   3    10736    37083  .016 kern/sys_socket.c:176 (so_snd)
1503    4311935   502656   8     8664     9312  .017 kern/kern_lock.c:163 (lockbuilder mtxpool)
 875    3495175   166487  20     3394     4272  .020 kern/vfs_bio.c:2424 (vnode interlock)
2084  121390320  2880081  42    67339    79525  .023 kern/uipc_usrreq.c:581 (so_snd)
 909    1809346   165769  10     4454     9597  .026 kern/vfs_vnops.c:796 (vnode interlock)
 277     518716   166442   3     5034     5172  .030 kern/vfs_bio.c:1464 (vnode interlock)
1565   10515648   282278  37    15760    10821  .055 kern/subr_sleepqueue.c:374 (process lock)
 492    2500241   634835   3    54003    62520  .085 kern/kern_sig.c:1002 (process lock)
 569     335913    30022  11     3262     2176  .108 kern/kern_sx.c:245 (lockbuilder mtxpool)
1378   27840143   320038  86    42183     1453  .131 kern/uipc_usrreq.c:705 (so_rcv)
 300    1011100   320045   3    52423    30742  .163 kern/uipc_socket.c:1101 (so_snd)
 437   10472850  3200213   3   576918   615361  .180 kern/kern_resource.c:1172 (sleep mtxpool)
2052   46242974   320039 144    80690    80729  .252 kern/uipc_usrreq.c:617 (unp_global_mtx)
 546   48160602  3683470  13  1488801   696814  .404 kern/kern_descrip.c:1988 (filedesc structure)
 395   13842967  3683470   3  1568927   685295  .425 kern/kern_descrip.c:1967 (filedesc structure)
 644   16700212   635731  26   606615   278511  .954 kern/kern_descrip.c:420 (filedesc structure)
 384    2863741   635774   4   654035   280340 1.028 kern/kern_descrip.c:368 (filedesc structure)
 604   22164433  2721994   8  5564709  2225496 2.044 kern/kern_synch.c:220 (process lock)

After:

 max      total    count avg cnt_hold cnt_lock ratio name
 168     467413   166364   2     1025     2655  .006 kern/vfs_bio.c:357 (needsbuffer lock)
 264     453972   166364   2     1688       44  .010 kern/vfs_bio.c:315 (needsbuffer lock)
 240    2011519   640106   3    12032    48460  .018 kern/sys_socket.c:176 (so_snd)
 425    5394174   514469  10    12838    15343  .024 kern/kern_lock.c:163 (lockbuilder mtxpool)
 514    5127131   166383  30     4417     5666  .026 kern/vfs_bio.c:1445 (buf queue lock)
 261     199860    38442   5     1405      475  .036 kern/kern_sx.c:245 (lockbuilder mtxpool)
 707  174604101  2880083  60   119723    84566  .041 kern/uipc_usrreq.c:581 (so_snd)
 126     520485   166351   3     7850     8574  .047 kern/vfs_bio.c:1464 (vnode interlock)
 364    1850567   165607  11     8077    22156  .048 kern/vfs_vnops.c:796 (vnode interlock)
 499    3233479   166432  19     9258     8468  .055 kern/vfs_bio.c:2424 (vnode interlock)
 754   42181810   320038 131    21236        0  .066 kern/uipc_usrreq.c:696 (unp_mtx)
 462   21081419  3685605   5   316514   243585  .085 kern/kern_descrip.c:1988 (filedesc structure)
 577   12178436   321182  37    28585    21082  .088 kern/subr_sleepqueue.c:374 (process lock)
 221    2410704   640387   3    75056    77553  .117 kern/kern_sig.c:1002 (process lock)
 309   12026860  3685605   3   468707   331121  .127 kern/kern_descrip.c:1967 (filedesc structure)
 299     973885   320046   3    60629    72506  .189 kern/uipc_socket.c:1101 (so_snd)
 471    6132557   640097   9   125478    98778  .196 kern/kern_descrip.c:420 (filedesc structure)
 737   33114067   320038 103    85243        1  .266 kern/uipc_usrreq.c:705 (so_rcv)
 454    5866777   878113   6   240669   364921  .274 kern/kern_synch.c:220 (process lock)
 365    2308060   640133   3   183152   142569  .286 kern/kern_descrip.c:368 (filedesc structure)
 220   10297249  3200211   3  1117448  1175412  .349 kern/kern_resource.c:1172 (sleep mtxpool)
 947   57806295   320040 180   132456   109179  .413 kern/uipc_usrreq.c:617 (unp_global_mtx)

filedesc contention is down by a factor of 3-4, with a corresponding
reduction in the average hold time.  The process lock contention coming
from the signal delivery wakeup has also gone way down for some reason.
unp contention has risen a bit.  The other big change is to sleep
mtxpool contention, which roughly doubled:

/*
 * Change the total socket buffer size a user has used.
 */
int
chgsbsize(uip, hiwat, to, max)
	struct uidinfo *uip;
	u_int *hiwat;
	u_int to;
	rlim_t max;
{
	rlim_t new;

	UIDINFO_LOCK(uip);

So the next question is: how can that be optimized?

Kris
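[Editorial note: the wakeup()/wakeup_one() distinction has a direct userspace analogue in POSIX condition variables: pthread_cond_broadcast() wakes every waiter (the thundering herd), while pthread_cond_signal() wakes just one. The following is a minimal sketch of the pattern behind the s/wakeup/wakeup_one/ change, written against pthreads rather than the kernel sleep/wakeup API; it is an illustration, not the FILEDESC_UNLOCK code itself.]

#include <pthread.h>
#include <stdio.h>

#define NWAITERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int available;		/* items waiters may consume */
static int consumed;

static void *
waiter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	while (available == 0)	/* re-check: spurious wakeups are legal */
		pthread_cond_wait(&cond, &lock);
	available--;
	consumed++;
	pthread_mutex_unlock(&lock);
	return (NULL);
}

int
main(void)
{
	pthread_t tid[NWAITERS];
	int i;

	for (i = 0; i < NWAITERS; i++)
		pthread_create(&tid[i], NULL, waiter, NULL);

	for (i = 0; i < NWAITERS; i++) {
		pthread_mutex_lock(&lock);
		available++;
		/*
		 * pthread_cond_signal() is the analogue of wakeup_one():
		 * at most one waiter runs per post.  pthread_cond_broadcast()
		 * would be wakeup(): all NWAITERS threads wake, pile onto
		 * the mutex, and all but one go straight back to sleep --
		 * the thundering herd.
		 */
		pthread_cond_signal(&cond);
		pthread_mutex_unlock(&lock);
	}

	for (i = 0; i < NWAITERS; i++)
		pthread_join(tid[i], NULL);
	printf("consumed %d items\n", consumed);
	return (0);
}

Because each waiter re-checks the predicate under the mutex before sleeping, a signal posted before a thread has blocked is not lost; the same predicate re-check is what makes wakeup_one() safe in the kernel case.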
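[Editorial note: the cnt_hold/count re-sorting above is a one-liner once the rows are parsed. A sketch using qsort(3) over three sample rows taken from the "Before" table; struct mrow and its field names are invented for this illustration, not part of the mutex profiling code.]

#include <stdio.h>
#include <stdlib.h>

/* One row of mutex profiling output, reduced to the fields we sort on. */
struct mrow {
	long cnt_hold;		/* acquisitions that found the lock held */
	long count;		/* total acquisitions */
	const char *name;
};

/* Ascending by contention ratio, as in the re-sorted tables above. */
static int
by_ratio(const void *a, const void *b)
{
	const struct mrow *x = a, *y = b;
	double rx = (double)x->cnt_hold / x->count;
	double ry = (double)y->cnt_hold / y->count;

	return (rx < ry) ? -1 : (rx > ry);
}

int
main(void)
{
	/* Three rows from the "Before" table, deliberately out of order. */
	struct mrow rows[] = {
		{ 5564709, 2721994, "kern/kern_synch.c:220 (process lock)" },
		{     907,  166457, "kern/vfs_bio.c:357 (needsbuffer lock)" },
		{  576918, 3200213, "kern/kern_resource.c:1172 (sleep mtxpool)" },
	};
	size_t n = sizeof(rows) / sizeof(rows[0]);
	size_t i;

	qsort(rows, n, sizeof(rows[0]), by_ratio);
	for (i = 0; i < n; i++)
		printf("%.3f %s\n",
		    (double)rows[i].cnt_hold / rows[i].count, rows[i].name);
	return (0);
}

The printed ratios (.005, .180, 2.044) match the ratio column computed in the table above.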
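[Editorial note: one plausible answer to the closing question is to stop taking the per-uid lock in chgsbsize() entirely and maintain the per-uid socket-buffer total with a compare-and-swap loop, which is roughly the direction FreeBSD later took with atomic_cmpset. A userspace sketch using C11 atomics follows; the struct uidinfo here, the ui_sbsize field, and the limit-check details are simplified stand-ins for illustration, not the kernel implementation.]

#include <stdatomic.h>
#include <stdio.h>

typedef long long xrlim_t;	/* stand-in for rlim_t in this sketch */

/* Simplified stand-in for the kernel's per-uid accounting structure. */
struct uidinfo {
	_Atomic long ui_sbsize;	/* socket buffer space in use */
};

/*
 * Lock-free chgsbsize(): charge the difference between the old and new
 * hiwat against the uid's total, failing if growth would exceed 'max'.
 * The CAS retry loop replaces the UIDINFO_LOCK/UNLOCK critical section,
 * so the contended sleep mtxpool lock is never touched on this path.
 */
static int
chgsbsize(struct uidinfo *uip, unsigned int *hiwat, unsigned int to,
    xrlim_t max)
{
	long old, new;

	do {
		old = atomic_load(&uip->ui_sbsize);
		new = old - (long)*hiwat + (long)to;
		if (to > *hiwat && new > max)
			return (0);	/* would exceed the limit */
	} while (!atomic_compare_exchange_weak(&uip->ui_sbsize, &old, new));
	*hiwat = to;
	return (1);
}

int
main(void)
{
	struct uidinfo ui = { 0 };
	unsigned int hiwat = 0;

	printf("grow ok:  %d\n", chgsbsize(&ui, &hiwat, 8192, 16384));
	printf("too big:  %d\n", chgsbsize(&ui, &hiwat, 65536, 16384));
	printf("in use:   %ld\n", atomic_load(&ui.ui_sbsize));
	return (0);
}

The trade-off is that the limit check and the update are no longer atomic with respect to each other as a pair, so concurrent callers can transiently race past the limit check and retry; for a resource-accounting counter like this, that is generally acceptable in exchange for removing the mutex acquisition entirely.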