I ran a modified version of the thr1 stress test on a quad amd64, which
spawns 256 threads (using libthr), each of which loops doing getpid()
10000 times.  The system spends essentially 100% of the time in the
kernel.

Profiling with hwpmc shows that the system is spending about 60% of its
time in _mtx_lock_spin().  I made an extremely stupid set of changes to
call different functions based on certain different spin lock types (so
that the spinning mutex is visible to profiling), and discovered the
following:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 42.0  200076.00 200076.00        0  100.00%           _turnstile_lock_spin [1]
 20.9  299496.00  99420.00        0  100.00%           sched_lock_spin [2]
  6.2  328880.00  29384.00        0  100.00%           syscall [3]
  5.3  354262.00  25382.00        0  100.00%           Xfast_syscall [4]
  4.4  375267.00  21005.00        0  100.00%           bcopy [5]
  3.1  389831.00  14564.00        0  100.00%           getpid [6]
  2.8  403386.00  13555.00        0  100.00%           spinlock_enter [7]
  ...

_turnstile_lock_spin comes via the following in subr_turnstile.c:

	turnstile_lock_spin(&tc->tc_lock);

TURNSTILE_PROFILING shows

	debug.turnstile.max_depth: 1

so it's not overloading chains.

Even with only 4 threads (on 4 CPUs) it still thrashes:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 30.7  133288.00 133288.00        0  100.00%           _turnstile_lock_spin [1]
 27.0  250336.00 117048.00        0  100.00%           sched_lock_spin [2]

But running the same test with libpthread shows nearly 0 spin lock
contention (and system time is correspondingly much lower):

       61.16 real         2.57 user        60.83 sys   <-- libpthread
       60.27 real         7.57 user       212.28 sys   <-- libthr

# gprof -l /boot/kernel/kernel kernel.gmon | grep spin

(time is in ticks, not seconds)

  0.5   20105.00    100.00        0  100.00%           spinlock_exit [17]
  0.2   20622.00     37.00        0  100.00%           spinlock_enter [26]
  0.2   20692.00     35.00        0  100.00%           sched_lock_spin [28]

What is going on?

Kris
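P.S. In case it helps to reproduce this, the test loop is roughly the
following (a minimal sketch in the spirit of thr1, not the actual test
source; the thread and iteration counts are the ones mentioned above):

/*
 * Minimal sketch of the stress test described above -- not the actual
 * thr1 source.  N_THREADS threads each call getpid() N_ITERS times.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define N_THREADS 256
#define N_ITERS   10000

static void *
worker(void *arg)
{
	int i;

	for (i = 0; i < N_ITERS; i++)
		(void)getpid();
	return (NULL);
}

int
main(void)
{
	pthread_t tid[N_THREADS];
	int i;

	for (i = 0; i < N_THREADS; i++) {
		if (pthread_create(&tid[i], NULL, worker, NULL) != 0) {
			perror("pthread_create");
			exit(1);
		}
	}
	for (i = 0; i < N_THREADS; i++)
		pthread_join(tid[i], NULL);
	return (0);
}

Link it against -lthr or -lpthread and time it to compare the two
thread libraries.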