Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Tue, 9 Jun 2020 22:09:53 +0000
Hope you don't mind the top post, but since this is now an update and somewhat
different, I don't think it makes sense to embed this in the message below.

r358098 will hang fairly easily, within 1-3 cycles of the kernel build over NFS.
I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
However, I just got a hang with r358097, but it looks rather different.
The r358097 hang did not have any processes sleeping on btalloc; they
appeared to be waiting on two different locks in the buffer cache.
As such, I think it might be a different problem. (I'll admit I should have
made notes about this one before rebooting, but I was frustrated that
it happened and rebooted before looking at it in much detail.)

Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
(single core i386) with 1.25Gbytes ram when doing kernel builds using
head kernels from this winter. (I also saw one when doing a kernel build
on UFS, so they aren't NFS specific, although easier to reproduce that way.)
After a typical hang, there will be a bunch of processes sleeping on "btalloc"
and several processes holding the following lock:
exclusive sx lock @ vm/vm_map.c:4761
- I have seen hangs where that is the only lock held by any process except
   the interrupt thread.
- I have also seen processes waiting on the following locks:
kern/subr_vmem.c:1343
kern/subr_vmem.c:633

I can't be absolutely sure r358098 is the culprit, but it seems to make the
problem more reproducible.

If anyone has a patch suggestion, I can test it.
Otherwise, I will continue to test r358097 and earlier, to try and see what hangs
occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
doesn't guarantee it isn't broken.)

There is a bunch more of the stuff I got for Kostik and Ryan below.
I can do "ddb" when it is hung, but it is a screen console, so I need to
transcribe the output to email by hand. (i.e., if you need something
specific I can do that, but trying to do everything Kostik and Ryan asked
for isn't easy.)
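(One way around the hand transcription, if the machine can still get a dump out after the hang: ddb scripting plus the capture buffer can run those same commands and save the output as a textdump. A sketch I have not tried on this box; command syntax as per ddb(8) and textdump(4), and it assumes a dump device is configured with dumpon(8):)

```shell
# Sketch: pre-arrange for ddb to run the requested commands on entry
# and save the captured output as a textdump, instead of transcribing
# from the screen console by hand.
ddb script kdb.enter.default="textdump set; capture on; show alllocks; ps; alltrace; show all vmem; capture off; textdump dump; reset"
```

After the reboot, savecore(8) should extract the textdump from the dump device.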

rick



Konstantin Belousov wrote:
>On Fri, May 22, 2020 at 11:46:26PM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem <rmacklem_at_uoguelph.ca> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> >> > to bisect this, but r358252 seems to be the culprit.
No longer true. I succeeded in reproducing the hang today running an
r358251 kernel.

I haven't had much luck so far, but see below for what I have learned.

>> >> >
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> >> > If I revert to r358251, I cannot reproduce this.
As above, this is no longer true.

>> >> >
>> >> > Any ideas?
>> >> >
>> >> > I can easily test any change you might suggest to see if it fixes the
>> >> > problem.
>> >> >
>> >> > If you want more debug info, let me know, since I can easily
>> >> > reproduce it.
>> >> >
>> >> > Thanks, rick
>> >>
>> >> Nothing obvious to me.  I can maybe try a repro on a VM...
>> >>
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>> >>
>> >> "btalloc" is "We're either out of address space or lost a fill race."
From what I see, I think it is "out of address space".
For one of the hangs, when I did "show alllocks", everything except the
intr thread was waiting for the
exclusive sx lock @ vm/vm_map.c:4761
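If it is address-space exhaustion, it should be visible from userland before the hang; something like the following, run periodically during a build cycle (sysctl names as I understand them on head; worth double-checking that they are present on i386):

```shell
# Watch kernel virtual address space and the big consumers while the
# kernel build runs; a steadily shrinking vm.kvm_free would support
# the "out of address space" theory.
sysctl vm.kvm_size vm.kvm_free
vmstat -m | head -20    # malloc(9) types by memory use
vmstat -z | head -20    # UMA zones
```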

>> >
>> >Yes, I would be not surprised to be out of something on 1G i386 machine.
>> >Please also add 'show alllocks'.
>> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
learned. (The time it takes to reproduce one of these varies greatly, but I usually
get one within 3 cycles of a full kernel build over NFS. I have had it happen
once when doing a kernel build over UFS.)
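For reference, the test cycle is nothing exotic, just repeated kernel builds from a src tree on the NFS mount (or on local UFS for the UFS case); roughly this, with the path and kernel config being my setup rather than anything essential:

```shell
# Repro loop: rebuild the kernel until the machine wedges.
# Each pass takes hours on this single-core Pentium 4.
cd /usr/src
while true; do
        make cleandir
        make buildkernel KERNCONF=GENERIC
done
```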

>> This time, none of the processes are stuck on "btalloc".
> I'll try and give you most of the above, but since I have to type it in by hand
> from the screen, I might not get it all. (I'm no real typist;-)
> > show alllocks
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
> exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> exclusive sleep mutex kernel area domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
> exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
> exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
> Process 12 (intr) thread 0x.. (1000008)
> exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152
>
> > ps
> - Not going to list them all, but here are the ones that seem interesting...
> 18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
> 16 0 0 0 DL (threaded)               [bufdaemon]
> 100069  D  qsleep                      [bufdaemon]
> 100074  D  -                               [bufspacedaemon-0]
> 100084  D  sdflush  0x11923284 [/ worker]
> - and more of these for the other UFS file systems
> 9 0 0 0   DL psleep  0x1e2f830  [vmdaemon]
> 8 0 0 0   DL (threaded)               [pagedaemon]
> 100067  D   psleep 0x1e2e95c   [dom0]
> 100072  D   launds 0x1e2e968   [laundry: dom0]
> 100073  D   umarcl 0x12cc720   [uma]
> … a bunch of usb and cam ones
> 100025  D   -           0x1b2ee40  [doneq0]
> …
> 12 0 0 0 RL  (threaded)               [intr]
> 100007  I                                     [swi6: task queue]
> 100008  Run           CPU 0           [swi6: Giant taskq]
> …
> 100000  D   swapin 0x1d96dfc    [swapper]
> - and a bunch more in D state.
> Does this mean the swapper was trying to swap in?
>
> > acttrace
> - just shows the keyboard
> kdb_enter() at kdb_enter+0x35/frame
> vt_kbdevent() at vt_kbdevent+0x329/frame
> kbdmux_intr() at kbdmux_intr+0x19/frame
> taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
> taskqueue_run() at taskqueue_run+0x44/frame
> taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
> ithread_loop() at ithread_loop+0x237/frame
> fork_exit() at fork_exit+0x6c/frame
> fork_trampoline() at 0x../frame
>
> > show all vmem
> vmem 0x.. 'transient arena'
>   quantum: 4096
>   size:  23592960
>   inuse: 0
>   free: 23592960
>   busy tags:   0
>   free tags:    2
>                  inuse       size   free       size
>   16777216           0          0      1   23592960
> vmem 0x.. 'buffer arena'
>   quantum:  4096
>   size:   94683136
>   inuse: 94502912
>   free: 180224
>   busy tags:    1463
>   free tags:      3
>                  inuse       size   free     size
>   16384              2      32768      1    16384
>   32768             39    1277952      1    32768
>   65536           1422   93192192      0        0
>   131072             0          0      1   131072
> vmem 0x.. 'i386trampoline'
>   quantum:      1
>   size:               24576
>   inuse:             20860
>   free:               3716
>   busy tags:     9
>   free tags:      3
>              inuse    size   free   size
>   32             1      48      1     52
>   64             2     208      0      0
>   128            2     280      0      0
>   2048           1    2048      1   3664
>   4096           2    8192      0      0
>   8192           1   10084      0      0
> vmem 0x.. 'kernel rwx arena'
>   quantum:    4096
>   size: 0
>   inuse: 0
>   free:   0
>   busy tags: 0
>   free tags:  0
> vmem 0x.. 'kernel area dom'
>   quantum:  4096
>   size: 56623104
>   inuse: 56582144
>   free: 40960
>   busy tags: 11224
>   free tags: 3
>I think this is the trouble.
>
>Did you tried to reduce kern.maxvnodes ?  What is the default value for
>the knob on your machine ?
The default is 84342.
I have tried 64K, 32K and 128K, and they all hung sooner or later.
For the 32K case, I did see vnodes being recycled for a while before it hung,
so it isn't just when it hits the limit.
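(For anyone following along, the knob is plain sysctl; roughly what I did for each test cycle, with vfs.numvnodes showing how many vnodes are actually allocated:)

```shell
# Check the default, then clamp the vnode limit for a test cycle.
sysctl kern.maxvnodes vfs.numvnodes vfs.freevnodes
sysctl kern.maxvnodes=32768    # also tried 65536 and 131072
# While the build runs, watch whether recycling keeps the count
# near the limit:
sysctl vfs.numvnodes
```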

Although it is much easier for me to reproduce on an NFS mount, I did see
a hang while doing a kernel build on UFS (no NFS mount on the machine at
that time).

So, I now know that the problem pre-dates r358252 and is not NFS specific.

I'm now bisecting back further to try and isolate the commit that causes this.
(Unfortunately, each test cycle can take days. I now know that I have to do
several of these kernel builds, which take hours each, to see if a hang is going
to happen.)

I'll post if/when I have more, rick

We scaled maxvnodes for ZFS and UFS; it might be that NFS is even more
demanding, since its node size is larger.

>                  inuse       size   free    size
>   4096           11091   45428736      0       0
>   8192              63     516096      0       0
>   16384             12     196608      0       0
>   32768              6     196608      0       0
>   40960              2      81920      1   40960
>   65536             16    1048576      0       0
>   94208              1      94208      0       0
>   110592             1     110592      0       0
>   131072            15    2441216      0       0
>   262144            15    3997696      0       0
>   524288             1     524288      0       0
>   1048576            1    1945600      0       0
> vmem 0x.. 'kernel arena'
>   quantum: 4096
>   size: 390070272
>   inuse: 386613248
>   free: 3457024
>   busy tags:  873
>   free tags:   3
>                  inuse        size   free      size
>   4096              35      143360      1      4096
>   8192               2       16384      2     16384
>   12288              1       12288      0         0
>   16384             30      491520      0         0
>   20480            140     2867200      0         0
>   65536              1       65536      0         0
>   131072           631    82706432      0         0
>   1048576            0           0      1   1339392
>   2097152           27    56623104      1   2097152
>   8388608            1    13774848      0         0
>   16777216           3    74883072      0         0
>   33554432           1    36753408      0         0
>   67108864           1   118276096      0         0
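As a sanity check on my hand transcription, the per-row byte columns of the arena Kostik flagged should add up to the totals in its header; a quick check with the rows copied in as "inuse-count inuse-bytes free-count free-bytes":

```shell
# Sum the hand-copied 'kernel area dom' rows: fields 2 and 4 should
# total the reported 56582144 bytes inuse and 40960 bytes free.
awk '{ inuse += $2; free += $4 }
     END { printf "inuse=%d free=%d\n", inuse, free }' <<'EOF'
11091 45428736 0 0
63 516096 0 0
12 196608 0 0
6 196608 0 0
2 81920 1 40960
16 1048576 0 0
1 94208 0 0
1 110592 0 0
15 2441216 0 0
15 3997696 0 0
1 524288 0 0
1 1945600 0 0
EOF
# prints inuse=56582144 free=40960, matching the arena header
```

The numbers do check out, so only about 40 Kbytes of a 56 Mbyte arena being free is not a transcription error, which fits the out-of-address-space theory.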
>
> > alltrace
> - I can't face typing too much more, but I'll put a few
>   here that look interesting
>
> - for csh
> sched_switch()
> mi_switch()
> kern_yield()
> getblkx()
> breadn_flags()
> ffs_update()
> ufs_inactive()
> VOP_INACTIVE()
> vinactivef()
> vput_final()
> vm_object_deallocate()
> vm_map_process_deferred()
> kern_munmap()
> sys_munmap()
>
> - For cc
> sched_switch()
> mi_switch()
> sleepq_switch()
> sleepq_timedwait()
> _sleep()
> pause_sbt()
> vmem_bt_alloc()
> keg_alloc_slab()
> zone_import()
> cache_alloc()
> cache_alloc_retry()
> uma_zalloc_arg()
> bt_fill()
> vmem_xalloc()
> vmem_alloc()
> kmem_alloc()
> kmem_malloc_domainset()
> page_alloc()
> keg_alloc_slab()
> zone_import()
> cache_alloc()
> cache_alloc_retry()
> uma_zalloc_arg()
> nfscl_nget()
> nfs_create()
> vop_sigdefer()
> nfs_vnodeops_bypass()
> VOP_CREATE_APV()
> vn_open_cred()
> vn_open()
> kern_openat()
> sys_openat()
>
> Then there are a bunch of these... for cc, make
> sched_switch()
> mi_switch()
> sleepq_switch()
> sleepq_catch_signals()
> sleepq_wait_sig()
> kern_wait6()
> sys_wait4()
>
> - for vnlru
> sched_switch()
> mi_switch()
> sleepq_switch()
> sleepq_timedwait()
> _sleep()
> vnlru_proc()
> fork_exit()
> fork_trampoline()
>
> - for syncer
> sched_switch()
> mi_switch()
> critical_exit_preempt()
> intr_event_handle()
> intr_execute_handlers()
> lapic_handle_intr()
> Xapic_isr1()
> - interrupt
> memset()
> cache_alloc()
> cache_alloc_retry()
> uma_zalloc_arg()
> vmem_xalloc()
> vmem_bt_alloc()
> keg_alloc_slab()
> zone_import()
> cache_alloc()
> cache_alloc_retry()
> uma_zalloc_arg()
> bt_fill()
> vmem_xalloc()
> vmem_alloc()
> bufkva_alloc()
> getnewbuf()
> getblkx()
> breadn_flags()
> ffs_update()
> ffs_sync()
> sync_fsync()
> VOP_FSYNC_APV()
> sched_sync()
> fork_exit()
> fork_trampoline()
>
> - For bufdaemon (a bunch of these)
> sched_switch()
> mi_switch()
> sleepq_switch()
> sleepq_timedwait()
> _sleep()
> buf_daemon()
> fork_exit()
> fork_trampoline()
>
> vmdaemon and pagedaemon are basically just like above,
> sleeping in
> vm_daemon()
> or
> vm_pageout_worker()
> or
> vm_pageout_laundry_worker()
> or
> uma_reclaim_worker()
>
> That's all the typing I can take right now.
> I can probably make this happen again if you want more specific stuff.
>
> rick
_______________________________________________
freebsd-current_at_freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"

Received on Tue Jun 09 2020 - 20:09:57 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:24 UTC