On 26/10/2007, Kostik Belousov <kostikbel@gmail.com> wrote:
> On Fri, Oct 26, 2007 at 12:22:28PM -0400, John Baldwin wrote:
> > "sbwait" is waiting for data to come in on a socket and "pfault" is
> > waiting on disk I/O. It is a bit odd that 1187 is holding a lock while
> No, pfault means that the process handles page faults, and sleeps
> waiting for some page to become available (either from cache or free list).
>
> > sleeping though that is permitted with an sx lock. Still, if it's
> > supposed to protect the socket's receive buffer that is odd. Maybe get
> > a trace of the process blocked in "sbwait" (tr <pid>) and bug rwatson@
> > about it.

Thank you Konstantin and John for looking into this. I didn't do alltrace
and friends this time, so here is info from a previous occurrence:

http://debug.files.googlepages.com/Serial-20071022-004528.txt

Some of it is below, as requested:

db> show alllocks
Process 1406 (systat) thread 0xffffff002aa016a0 (100181)
shared sx allproc r = 0 (0xffffffff80a72820) locked @ /usr/src/sys/vm/vm_meter.c:130
exclusive sx sysctl lock r = 0 (0xffffffff80a72f40) locked @ /usr/src/sys/kern/kern_sysctl.c:1396
Process 1129 (rtorrent) thread 0xffffff000a6ae000 (100163)
exclusive sx user map r = 0 (0xffffff0001056890) locked @ /usr/src/sys/vm/vm_map.c:2195
Process 1105 (sshd) thread 0xffffff000a6ae350 (100162)
exclusive sx so_rcv_sx r = 0 (0xffffff000a78a928) locked @ /usr/src/sys/kern/uipc_sockbuf.c:145
Process 1040 (cron) thread 0xffffff0001c61350 (100130)
shared sx proctree r = 0 (0xffffffff80a72860) locked @ /usr/src/sys/kern/kern_fork.c:286
Process 998 (mysqld) thread 0xffffff002aa119f0 (100192)
exclusive sx so_rcv_sx r = 0 (0xffffff0003858670) locked @ /usr/src/sys/kern/uipc_sockbuf.c:145
Process 998 (mysqld) thread 0xffffff002aa12000 (100191)
exclusive sx so_rcv_sx r = 0 (0xffffff000abbb928) locked @ /usr/src/sys/kern/uipc_sockbuf.c:145
Process 932 (smbd) thread 0xffffff000140c000 (100068)
shared sx proctree r = 0 (0xffffffff80a72860) locked @ /usr/src/sys/kern/kern_fork.c:286

db> ps
  pid  ppid  pgrp   uid  state  wmesg     wchan               cmd
 1416  1414  1416  1001  S+     zfs:(&tx  0xffffff000152cd90  ftp
 1414  1111  1414  1001  Ss+    pause     0xffffff000abff520  tcsh
 1406  1127  1406  1001  S+     user map  0xffffff0001056890  systat
 1134   932   932  1001  SL     pfault    0xffffffff80b1359c  smbd
 1133  1017  1017    80  S      zfs       0xffffff000a399288  httpd
 1129  1113  1129  1001  S+     vmwait    0xffffffff80b1359c  rtorrent
 1127  1111  1127  1001  Ss+    pause     0xffffff000a1f90c0  tcsh
 1113  1111  1113  1001  Ss+    pause     0xffffff0001c5b520  tcsh
 1111  1110  1111  1001  SLs    pfault    0xffffffff80b1359c  screen
 1110  1108  1108  1001  S+     pause     0xffffff000a1fa980  screen
 1108  1107  1108  1001  Ss+    pause     0xffffff000a6a9520  tcsh
 1107  1105  1105  1001  S      select    0xffffffff80af79d0  sshd
 1105  1033  1105     0  Ss     sbwait    0xffffff000a78a984  sshd
...
Tracing command sshd pid 1105 tid 100162 td 0xffffff000a6ae350
sched_switch() at sched_switch+0x18c
mi_switch() at mi_switch+0x22f
sleepq_switch() at sleepq_switch+0xc7
sleepq_catch_signals() at sleepq_catch_signals+0x25a
sleepq_wait_sig() at sleepq_wait_sig+0x16
_sleep() at _sleep+0x309
soreceive_generic() at soreceive_generic+0x101c
dofileread() at dofileread+0xa1
kern_readv() at kern_readv+0x4c
read() at read+0x54
syscall() at syscall+0x1ce
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (3, FreeBSD ELF64, read), rip = 0x8016f528c, rsp = 0x7fffffffe558, rbp = 0x4 ---

Tracing command rtorrent pid 1129 tid 100163 td 0xffffff000a6ae000
sched_switch() at sched_switch+0x18c
mi_switch() at mi_switch+0x22f
sleepq_switch() at sleepq_switch+0xc7
sleepq_wait() at sleepq_wait+0x44
_sleep() at _sleep+0x351
kmem_malloc() at kmem_malloc+0x2a2
uma_large_malloc() at uma_large_malloc+0x4a
malloc() at malloc+0x12d
arc_get_data_buf() at arc_get_data_buf+0x36e
arc_buf_alloc() at arc_buf_alloc+0xe4
arc_read() at arc_read+0xfa
dbuf_prefetch() at dbuf_prefetch+0x137
dmu_zfetch_dofetch() at dmu_zfetch_dofetch+0x10b
dmu_zfetch() at dmu_zfetch+0x4aa
dbuf_read() at dbuf_read+0x535
dmu_tx_check_ioerr() at dmu_tx_check_ioerr+0x9a
dmu_tx_count_write() at dmu_tx_count_write+0x178
dmu_tx_hold_write() at dmu_tx_hold_write+0x4a
zfs_freebsd_write() at zfs_freebsd_write+0x399
VOP_WRITE_APV() at VOP_WRITE_APV+0x10b
vnode_pager_generic_putpages() at vnode_pager_generic_putpages+0x19c
vnode_pager_putpages() at vnode_pager_putpages+0x97
vm_pageout_flush() at vm_pageout_flush+0x14b
vm_object_page_collect_flush() at vm_object_page_collect_flush+0x2d1
vm_object_page_clean() at vm_object_page_clean+0x165
vm_object_sync() at vm_object_sync+0x1f0
vm_map_sync() at vm_map_sync+0x11e
msync() at msync+0x62
syscall() at syscall+0x1ce
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (65, FreeBSD ELF64, msync), rip = 0x8016c37ec, rsp = 0x7fffffffe408, rbp = 0x1 ---

(kgdb) print cnt
$1 = {v_swtch = 0, v_trap = 0, v_syscall = 0, v_intr = 0, v_soft = 0,
  v_vm_faults = 0, v_cow_faults = 0, v_cow_optim = 0, v_zfod = 0,
  v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0,
  v_swappgsout = 0, v_vnodein = 0, v_vnodeout = 0, v_vnodepgsin = 0,
  v_vnodepgsout = 0, v_intrans = 0, v_reactivated = 255459,
  v_pdwakeups = 507, v_pdpages = 34896653, v_tcached = 19148245,
  v_dfree = 0, v_pfree = 74993, v_tfree = 0, v_page_size = 4096,
  v_page_count = 246565, v_free_reserved = 371, v_free_target = 6779,
  v_free_min = 1602, v_free_count = 890, v_wire_count = 43890,
  v_active_count = 187459, v_inactive_target = 10168,
  v_inactive_count = 14025, v_cache_count = 0, v_cache_min = 6779,
  v_cache_max = 13558, v_pageout_free_min = 34, v_interrupt_free_min = 2,
  v_free_severe = 986, v_forks = 0, v_vforks = 0, v_rforks = 0,
  v_kthreads = 0, v_forkpages = 0, v_vforkpages = 0, v_rforkpages = 0,
  v_kthreadpages = 0}

I managed to find a kernel message that precedes the freeze:

Approaching the limit on PV entries, consider increasing
sysctl vm.pmap.shpgperproc or vm.pmap.pv_entry_max

rtorrent likes to mmap large files. After increasing (one of) these
sysctls everything seems OK. Setting a memory limit on rtorrent would
probably have helped as well.

Is there a way to know if some process is getting close to that limit?

-- 
With best regards,
Gleb Kozyrev.
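P.S. Partially answering my own question: the counters behind that warning
are exported as system-wide sysctls. A sketch, assuming the stock amd64/i386
pmap of this vintage (as far as I can tell there is no per-process
breakdown):

  # current PV entry usage vs. the configured maximum
  sysctl vm.pmap.pv_entry_count
  sysctl vm.pmap.pv_entry_max
  sysctl vm.pmap.shpgperproc

The kernel prints the "Approaching the limit on PV entries" warning once
the count crosses a high-water mark just below pv_entry_max, so comparing
pv_entry_count against pv_entry_max shows how much headroom is left.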