Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Fri, 22 May 2020 23:46:26 +0000
Konstantin Belousov wrote:
>On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> On Wed, May 20, 2020 at 6:04 PM Rick Macklem <rmacklem_at_uoguelph.ca> wrote:
>> >
>> > Hi,
>> >
>> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> > to bisect this, but r358252 seems to be the culprit.
>> >
>> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> > If I revert to r358251, I cannot reproduce this.
>> >
>> > Any ideas?
>> >
>> > I can easily test any change you might suggest to see if it fixes the
>> > problem.
>> >
>> > If you want more debug info, let me know, since I can easily
>> > reproduce it.
>> >
>> > Thanks, rick
>>
>> Nothing obvious to me.  I can maybe try a repro on a VM...
>>
>> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>>
>> "btalloc" is "We're either out of address space or lost a fill race."
>
>Yes, I would be not surprised to be out of something on 1G i386 machine.
>Please also add 'show alllocks'.
Ok, I used an up to date head kernel and it took longer to reproduce a hang.
This time, none of the processes are stuck on "btalloc".
I'll try and give you most of the above, but since I have to type it in by hand
from the screen, I might not get it all. (I'm no real typist;-)
> show alllocks
exclusive lockmgr ufs (ufs) r = 0 locked _at_ kern/vfs_subr.c: 3259
exclusive lockmgr nfs (nfs) r = 0 locked _at_ kern/vfs_lookup.c:737
exclusive sleep mutex kernel area domain (kernel arena domain) r = 0 locked _at_ kern/subr_vmem.c:1343
exclusive lockmgr bufwait (bufwait) r = 0 locked _at_ kern/vfs_bio.c:1663
exclusive lockmgr ufs (ufs) r = 0 locked _at_ kern/vfs_subr.c:2930
exclusive lockmgr syncer (syncer) r = 0 locked _at_ kern/vfs_subr.c:2474
Process 12 (intr) thread 0x.. (1000008)
exclusive sleep mutex Giant (Giant) r = 0 locked _at_ kern/kern_intr.c:1152

> ps
- Not going to list them all, but here are the ones that seem interesting...
18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
16 0 0 0 DL (threaded)               [bufdaemon]
100069  D  qsleep                      [bufdaemon]
100074  D  -                               [bufspacedaemon-0]
100084  D  sdflush  0x11923284 [/ worker]
- and more of these for the other UFS file systems
9 0 0 0   DL psleep  0x1e2f830  [vmdaemon]
8 0 0 0   DL (threaded)               [pagedaemon]
100067  D   psleep 0x1e2e95c   [dom0]
100072  D   launds 0x1e2e968   [laundry: dom0]
100073  D   umarcl 0x12cc720   [uma]
… a bunch of usb and cam ones
100025  D   -           0x1b2ee40  [doneq0]
…
12 0 0 0 RL  (threaded)               [intr]
100007  I                                     [swi6: task queue]
100008  Run           CPU 0           [swi6: Giant taskq]
…
100000  D   swapin 0x1d96dfc    [swapper]
- and a bunch more in D state.
Does this mean the swapper was trying to swap in?

> acttrace
- just shows the keyboard
kdb_enter() at kdb_enter+0x35/frame
vt_kbdevent() at vt_kdbevent+0x329/frame
kdbmux_intr() at kbdmux_intr+0x19/frame
taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
taskqueue_run() at taskqueue_run+0x44/frame
taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
ithread_loop() at ithread_loop+0x237/frame
fork_exit() at fork_exit+0x6c/frame
fork_trampoline() at 0x../frame

> show all vmem
vmem 0x.. 'transient arena'
  quantum: 4096
  size:  23592960
  inuse: 0
  free: 23592960
  busy tags:   0
  free tags:    2
                     inuse        size       free        size
  16777216   0               0           1            23592960
vmem 0x.. 'buffer arena'
  quantum:  4096
  size:   94683136
  inuse: 94502912
  free: 180224
  busy tags:    1463
  free tags:      3
                       inuse      size         free        size
  16384           2             32768     1             16384
  32768           39           1277952 1              32768
  65536            1422       93192192 0           0
  131072          0             0            1              131072
vmem 0x.. 'i386trampoline'
  quantum:      1
  size:               24576
  inuse:             20860
  free:               3716
  busy tags:     9
  free tags:      3
                       inuse      size      free      size
  32                 1             48        1           52
  64                  2            208       0           0
  128                2            280       0            0
  2048              1            2048     1           3664
  4096              2            8192     0           0
  8192              1            10084   0           0
vmem 0x.. 'kernel rwx arena'
  quantum:    4096
  size: 0
  inuse: 0
  free:   0
  busy tags: 0
  free tags:  0
vmem 0x.. 'kernel area dom'
  quantum:  4096
  size: 56623104
  inuse: 56582144
  free: 40960
  busy tags: 11224
  free tags: 3
                inuse     size          free              size
  4096      11091    45428736 0                 0
  8192       63         516096     0                 0
  16384     12          196608    0                  0
  32768      6           196608    0                  0
  40960      2           81920      1                  40960
  65536      16         1048576   0                 0
  94208      1            94208      0                 0
  110592     1          110592     0                 0
  131072      15       2441216    0                0
  262144      15       3997696    0                 0
  524288      1         524288      0                 0
  1048576     1        1945600     0                0
vmem 0x.. 'kernel arena'
  quantum: 4096
  size: 390070272
  inuse: 386613248
  free: 3457024
  busy tags:  873
  free tags:   3
                   inuse       size         free        size
4096           35           143360    1            4096
8192           2              16384      2           16384
12288         1              12288      0           0
16384         30             491520   0           0
20480         140           2867200  0          0
65536          1              65536     0           0
131072        631          82706432 0         0
1048576      0              0               1         1339392
2097152      27            56623104   1        2097152
8388608      1              13774848    0       0
16777216    3              74883072   0        0
33554432    1              36753408   0        0
67108864    1              118276096  0       0

> alltrace
- I can't face typing too much more, but I'll put a few
  here that look interesting

- for csh
sched_switch()
mi_switch()
kern_yield()
getblkx()
breadn_flags()
ffs_update()
ufs_inactive()
VOP_INACTIVE()
vinactivef()
vput_final()
vm_object_deallocate()
vm_map_process_deferred()
kern_munmap()
sys_munmap()

- For cc
sched_switch()
mi_switch()
sleepq_switch()
sleepq_timedwait()
_sleep()
pause_sbt()
vmem_bt_alloc()
keg_alloc_slab()
zone_import()
cache_alloc()
cache_alloc_retry()
uma_zalloc_arg()
bt_fill()
vmem_xalloc()
vmem_alloc()
kmem_alloc()
kmem_malloc_domainset()
page_alloc()
keg_alloc_slab()
zone_import()
cache_alloc()
cache_alloc_retry()
uma_zalloc_arg()
nfscl_nget()
nfs_create()
vop_sigdefer()
nfs_vnodeops_bypass()
VOP_CREATE_APV()
vn_open_cred()
vn_open()
kern_openat()
sys_openat()

Then there are a bunch of these... for cc, make
sched_switch()
mi_switch()
sleepq_switch()
sleepq_catch_signals()
sleepq_wait_sig()
kern_wait6()
sys_wait4()

- for vnlru
sched_switch()
mi_switch()
sleepq_switch()
sleepq_timedwait()
_sleep()
vnlru_proc()
fork_exit()
fork_trampoline()

- for syncer
sched_switch()
mi_switch()
critical_exit_preempt()
intr_event_handle()
intr_execute_handlers()
lapic_handle_intr()
Xapic_isr1()
- interrupt
memset()
cache_alloc()
cache_alloc_retry()
uma_zalloc_arg()
vmem_xalloc()
vmem_bt_alloc()
keg_alloc_slab()
zone_import()
cache_alloc()
cache_alloc_retry()
uma_zalloc_arg()
bt_fill()
vmem_xalloc()
vmem_alloc()
bufkva_alloc()
getnewbuf()
getblkx()
breadn_flags()
ffs_update()
ffs_sync()
sync_fsync()
VOP_FSYNC_APV()
sched_sync()
fork_exit()
fork_trampoline()

- For bufdaemon (a bunch of these)
sched_switch()
mi_switch()
sleepq_switch()
sleepq_timedwait()
_sleep()
buf_daemon()
fork_exit()
fork_trampoline()

vmdaemon and pagedaemon are basically just like above,
sleeping in
vm_daemon()
or
vm_pageout_worker()
or
vm_pageout_laundry_worker()
or
uma_reclaim_worker()

That's all the typing I can take right now.
I can probably make this happen again if you want more specific stuff.

rick





Received on Fri May 22 2020 - 21:46:30 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:24 UTC