How to debug what causes too much __mtx_lock_sleep in the system

From: Vitalij Satanivskij <satan_at_ukr.net> Date: Mon, 21 Oct 2013 15:59:49 +0300

Hello.

I have a 10.0-BETA1 #7 r256765 system with a terrible load ("load averages: 23.31, 30.53, 31"),

which keeps getting worse over time.

The kernel is compiled with DTrace support, and using the hotkernel script from DTraceToolkit-0.99 I found some strange statistics:

zfs.ko`lz4_compress                                      5045   0.2%
kernel`0xffffffff80                                      5185   0.2%
kernel`uma_zalloc_arg                                    5302   0.2%
kernel`bcopy                                             5322   0.2%
kernel`_sx_xlock                                         7310   0.3%
kernel`_sx_xunlock                                       7434   0.3%
zfs.ko`l2arc_feed_thread                                 9797   0.4%
zfs.ko`lzjb_compress                                     9912   0.4%
zfs.ko`list_prev                                        17894   0.7%
kernel`__rw_wlock_hard                                  30522   1.2%
kernel`spinlock_exit                                    31310   1.3%
kernel`acpi_cpu_c1                                     103495   4.1%
kernel`_sx_xlock_hard                                  138743   5.5%
kernel`vmem_xalloc                                     175869   7.0%
kernel`cpu_idle                                        371159  14.8%
kernel`__mtx_lock_sleep                               1345815  53.8%

There is another identical machine with similar data and usage, but running an older -CURRENT, r245701,

which has no problems with load:

zfs.ko`fletcher_4_native                                 2366   0.1%
kernel`uma_zfree_arg                                     2387   0.1%
zfs.ko`lzjb_decompress                                   2392   0.1%
kernel`__rw_rlock                                        2477   0.1%
zfs.ko`dmu_zfetch                                        2553   0.1%
kernel`bcopy                                             3035   0.1%
kernel`vm_page_splay                                     3089   0.1%
kernel`_mtx_trylock_flags_                               3346   0.2%
kernel`bzero                                             3411   0.2%
kernel`0xffffffff80                                      3665   0.2%
kernel`_sx_xunlock                                       3818   0.2%
kernel`uma_zalloc_arg                                    4216   0.2%
kernel`vmtotal                                           4702   0.2%
kernel`_sx_xlock                                         5117   0.2%
kernel`free                                              5476   0.2%
zfs.ko`lzjb_compress                                     6674   0.3%
kernel`spinlock_exit                                    21590   1.0%
kernel`__mtx_lock_sleep                                 40819   1.9%
kernel`acpi_cpu_c1                                     311077  14.1%
kernel`cpu_idle                                       1639418  74.6%

Both servers have the same hardware and the same software, except of course for the system version.
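To make the difference between the two listings easier to see, here is a minimal Python sketch that parses hotkernel-style rows and ranks functions by how much their share grew on the loaded machine. The rows are copied from the top of the two listings above; the names LOADED, HEALTHY, parse and compare are only illustrative, and a function missing from one listing is assumed to be 0.0%, which is an approximation because the listings show only the top entries.

```python
import re

# Top rows copied from the hotkernel output of the loaded machine (r256765).
LOADED = """\
kernel`spinlock_exit                                    31310   1.3%
kernel`acpi_cpu_c1                                     103495   4.1%
kernel`_sx_xlock_hard                                  138743   5.5%
kernel`vmem_xalloc                                     175869   7.0%
kernel`cpu_idle                                        371159  14.8%
kernel`__mtx_lock_sleep                               1345815  53.8%
"""

# Top rows copied from the hotkernel output of the healthy machine (r245701).
HEALTHY = """\
kernel`spinlock_exit                                    21590   1.0%
kernel`__mtx_lock_sleep                                 40819   1.9%
kernel`acpi_cpu_c1                                     311077  14.1%
kernel`cpu_idle                                       1639418  74.6%
"""

# One row: symbol name, sample count, percentage of all samples.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+([\d.]+)%$")


def parse(text):
    """Map symbol name -> percentage share for each hotkernel row."""
    shares = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            shares[m.group(1)] = float(m.group(3))
    return shares


def compare(a, b):
    """Return (name, share_a, share_b), largest increase of a over b first.

    Assumption: a symbol absent from a listing counts as 0.0%.
    """
    names = set(a) | set(b)
    rows = [(n, a.get(n, 0.0), b.get(n, 0.0)) for n in names]
    return sorted(rows, key=lambda r: r[1] - r[2], reverse=True)


if __name__ == "__main__":
    for name, loaded, healthy in compare(parse(LOADED), parse(HEALTHY)):
        print(f"{name:28s} {loaded:6.1f}% {healthy:6.1f}% {loaded - healthy:+6.1f}")
```

With these rows, kernel`__mtx_lock_sleep shows the largest increase and kernel`cpu_idle the largest decrease, which matches what the raw listings suggest: time that used to be idle is now spent contending on mutexes.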

So what is the right way to investigate the problem and find a resolution?