and replaced by LOCK_PROFILING.

- When LOCK_PROFILING is compiled in and enabled, the kernel now
  profiles hold times for all locks (spin mutex, blocking mutex,
  rwlock, sx lock, and lockmgr).
- We now track the wait-to-acquire time, which I believe to be a more
  useful metric of contention than hold time or the number of times
  contested.
- The overhead of having LOCK_PROFILING compiled in but not enabled
  has been reduced by moving large chunks of code out of line; on the
  T1 the measured overhead is < 1%.
- There is no longer a single mutex serializing updates to the
  profiling hash, which reduces the locking contention of measuring
  lock contention.

Thanks to DES for the MUTEX_PROFILING implementation and Kris Kennaway
for many of the optimizations that made their way into this patch.

Please report to me any issues caused by this change.

I give some examples of its immediate utility below.

I'm running a buildworld that isn't using all the system threads. I
sorted on the third column (wait_total, largest first). The first entry
is due to the idle threads constantly trying to get work. The third and
fourth are from make using select. Looking at kern_select, one sees
that it is clearly fairly single-threaded. Oddly enough, make's Job.c
already has support for kqueue, but it isn't the default. I defined
USE_KQUEUE and select went away as a point of contention during builds.
We see here that the page queue mutex is a major point of contention.
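
For anyone wanting to reproduce that ordering from saved profiling output, here is a minimal Python sketch. It assumes the nine-column layout used in this message (eight integer columns followed by the lock name). The helper names parse_profile and sort_by are made up for illustration and are not part of any FreeBSD tool, and the paths in the sample are shortened from the ones in the real output.

```python
# Column names as they appear in the profiling output header.
COLUMNS = ["max", "total", "wait_total", "count",
           "avg", "wait_avg", "cnt_hold", "cnt_lock"]


def parse_profile(text):
    """Turn raw profiling output into a list of dicts, one per lock site."""
    rows = []
    for line in text.splitlines():
        # Split off the eight numeric fields; the rest is the lock name,
        # which may itself contain spaces, e.g. "(sched lock)".
        fields = line.split(None, len(COLUMNS))
        if len(fields) != len(COLUMNS) + 1 or not fields[0].isdigit():
            continue  # skip the header line and blank lines
        row = {name: int(value) for name, value in zip(COLUMNS, fields)}
        row["name"] = fields[-1].strip()
        rows.append(row)
    return rows


def sort_by(rows, column="wait_total"):
    """Return rows ordered by the given column, largest value first."""
    return sorted(rows, key=lambda r: r[column], reverse=True)


# Three rows taken from the buildworld run, deliberately out of order.
SAMPLE = """\
max total wait_total count avg wait_avg cnt_hold cnt_lock name
2 3013 413907132 8359 0 49516 14603 0 sys/kern/sys_generic.c:812 (sched lock)
24 3566322 1311264691 1358360 2 965 6800925 0 sys/kern/kern_idle.c:121 (sched lock)
253 104799 8583672 204689 0 41 39672 0 sys/vm/vm_fault.c:844 (vm page queue mutex)
"""

if __name__ == "__main__":
    for row in sort_by(parse_profile(SAMPLE)):
        print(row["wait_total"], row["name"])
```

Passing "wait_avg" as the column instead ranks lock sites by average wait per acquisition, which can single out locks that are taken rarely but waited on for a long time.
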

    max     total wait_total   count  avg wait_avg cnt_hold cnt_lock name
     24   3566322 1311264691 1358360    2      965  6800925        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/kern_idle.c:121 (sched lock)
     29   1218447  414601055  172116    7     2408   533196        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/subr_sleepqueue.c:529 (sched lock)
      2      3013  413907132    8359    0    49516    14603        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/sys_generic.c:812 (sched lock)
   1027    242236  413829518   14365   16    28808     4462        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/sys_generic.c:776 (sellck)
1894753 787273038   55823553  726605 1083       76        0        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/vfs_default.c:263 (nfs)
    253    104799    8583672  204689    0       41    39672        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:844 (vm page queue mutex)
    153    264890    3935024  227674    1       17    46885        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:902 (vm page queue mutex)
    316   3238931    2650089  227674   14       11   113827        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/sun4v/sun4v/pmap.c:956 (vm page queue mutex)
     35    101146    1916077   82252    1       23    16275        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:342 (vm page queue mutex)
      4    106600    1665429  285490    0        5   475928        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/subr_sleepqueue.c:318 (sched lock)

Here we do a make -j32 of the kernel, so all CPU threads are in use
(thus no issues with the idle threads). The turnstile lock contention
is likely a result of all the CPU threads contending for the page queue
mutex. This could probably be improved by adaptively spinning if the
current holder of the mutex is running. Many of the page queue mutex
acquisitions are merely to protect setting flags in an individual page.
In the case of a 32-CPU system, having a lock per vm_page would probably
be the way to go; however, this would penalize systems with 4 and fewer
CPUs. Perhaps alc should look into varying the granularity of locking
as a function of the number of CPUs.
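
One way to picture granularity that varies with the CPU count is lock striping: pages map onto a pool of locks whose size depends on the number of CPUs. The Python sketch below is a user-space model of that idea only. It is not FreeBSD VM code and not something proposed in this form above; the class name, the helper set_page_flag, the cutoff of 4 CPUs (borrowed from the remark above) and the factor of 4 stripes per CPU are all assumptions for illustration.

```python
import threading


class StripedPageLocks:
    """Pool of locks shared by pages; pool size depends on the CPU count."""

    def __init__(self, ncpus, small_system_max=4, stripes_per_cpu=4):
        # Hypothetical policy: small systems keep a single global lock,
        # larger systems get several stripes per CPU.
        if ncpus <= small_system_max:
            self.nstripes = 1
        else:
            self.nstripes = ncpus * stripes_per_cpu
        self._locks = [threading.Lock() for _ in range(self.nstripes)]

    def lock_for(self, page_index):
        """Return the lock that covers the given page."""
        return self._locks[page_index % self.nstripes]


def set_page_flag(locks, flags, page_index, flag):
    """Set a flag on one page while holding only that page's stripe."""
    with locks.lock_for(page_index):
        flags[page_index] |= flag


if __name__ == "__main__":
    locks = StripedPageLocks(ncpus=32)
    flags = [0] * 1024
    set_page_flag(locks, flags, 7, 0x1)
    print(locks.nstripes, flags[7])
```

With one stripe the model behaves like the single page queue mutex; with many stripes, flag updates on different pages mostly take different locks, at the cost of extra memory for the lock pool.
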

        max        total wait_total   count    avg wait_avg cnt_hold cnt_lock name
          5      7266196  206805560 7522452      0       27 48266619        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/subr_turnstile.c:487 (turnstile chain)
        457       528521  180592127  550284      0      328  1469872        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:844 (vm page queue mutex)
   15057461   1679582934  117520488   87978  19090     1335        0        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/ufs/ffs/ffs_vnops.c:366 (ufs)
        214      1076256  112489341  559032      1      201  1520471        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:902 (vm page queue mutex)
        424      8250360  105249196  559031     14      188  1767340        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/sun4v/sun4v/pmap.c:956 (vm page queue mutex)
72563452121 218316084315   94216669  452713 482239      208        0        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/vfs_default.c:263 (nfs)
         23      1349030   14049785  280685      4       50   923679        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/kern/kern_idle.c:121 (sched lock)
         73       214117   11078161   63944      3      173     2505        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/nfsclient/nfs_socket.c:1235 (Giant)
         42        92768   10431233   40012      2      260   122966        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_fault.c:342 (vm page queue mutex)
       6429      2155581    6645086   18297    117      363   105550        0 /usr/flatstor/shared/freebsd/kmacy/src/sys/vm/vm_object.c:651 (vm page queue mutex)

Received on Sat Nov 11 2006 - 02:47:57 UTC