Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot [backtraces added]

From: Mark Millard <markmi_at_dsl-only.net>
Date: Tue, 12 Dec 2017 17:45:59 -0800
[Just adding back traces. ffs_mount
use of uma_zcreate is involved in the
kernel thread, as is uma_reclaim_worker's
use of uma_reclaim_locked in the uma
thread.]

On 2017-Dec-12, at 4:20 PM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2017-Dec-12, at 3:19 PM, Mark Millard <markmi at dsl-only.net> wrote:
> 
>> On 2017-Dec-12, at 2:02 PM, Andreas Schwarz <freebsd.asc at strcmp.org> wrote:
>> 
>>> On 12.12.17, Mark Millard wrote:
>>> 
>>>> I initially jumped from -r326192 to -r326726 and
>>>> ended up with a rpi2 that would normally hang
>>>> somewhere around release APs being displayed.
>>>> (I have had a couple of completed boots but many
>>>> dozens of hung-up attempts.) Both a debug kernel
>>>> and a non-debug kernel hang the same way.
>>>> 
>>>> Bisecting the kernel (holding world -r326726
>>>> constant) showed:
>>>> 
>>>> -r326346 did not hang (nor did before)
>>>> -r326347 and later hung.
>>> 
>>> JFYI, the latest kernel (and world) running at one of my 
>>> RPI2-B is r326631, without any issues.
>> 
>> Interesting. (By the way: My context
>> is with a V1.1 Cortex-A7 based rpi2,
>> not V1.2 and Cortex-A53.)
>> 
>> I've almost always run the root file
>> system being on a USB SSD instead of
>> on mmcsd0 . I wonder if that is
>> somehow involved since it may be
>> unusual. UFS file system.
>> 
>> The USB SSD is on a powered hub that
>> is in turn plugged into the rpi2.
>> 
>> [I had the hang problem before the
>> following and after.]
>> 
>> The mechanism for holding mmcsd0 in
>> failed recently but the ejection
>> mechanism still works. So I hold
>> in mmcsd0 until after I get a USB
>> SSD boot now. (Interrupt boot, unload,
>> boot/autoboot, picks up the kernel
>> from the USB SSD.)
>> 
>> This means that I effectively can
>> not avoid the USB SSD any more
>> unless I get my hands on a different
>> V1.1 rpi2.
> 
> Looks like I'll get my hands on a different
> rpi2 V1.1 in a few days. So I should then
> be able to do reasonable mmcsd0-only
> experiments. At least once I find the time.
> 
> FYI, in case boot details are involved
> in reproducing the problem. . .
> 
> On the mmcsd0 I have /boot/loader.conf with:
> 
> kern.cam.boot_delay="10000"
> vfs.mountroot.timeout="10"
> 
> and /etc/fstab with:
> 
> /dev/ufs/RPI2rootfs     /               ufs     rw,noatime      1 1
> /dev/label/RPI2Aswap    none            swap    sw              0 0
> /dev/label/RPI2Aboot    /boot/msdos     msdosfs rw,noatime      0 0
> 
> where the /dev/ufs/RPI2rootfs was the USB SSD.
> 
> However, I interrupt the loader and unload and
> then boot or autoboot. (But the hangs started
> before this extra sequence was involved.)
> 
> On the USB SSD I have /boot/loader.conf with:
> 
> kern.cam.boot_delay="10000"
> vfs.mountroot.timeout="10"
> 
> and /etc/fstab with:
> 
> /dev/da0p1      /               ufs     rw,noatime      1 1
> /dev/da0p2      none            swap    sw              0 0
> 
> 
> What db> showed does point to things that
> -r326347 involve:
> 
> chain 32:
> thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK
> thread 100077 (pid 23, uma) is on a run queue
> 
> But for all I know -r326347 could be depending
> on something required to be true but not correct
> elsewhere in the rpi2 support. I'm not claiming
> that -r326347 is wrong, just that it is involved.
> I've way to little knowledge to claim to know
> what is wrong on the evidence that I have.
> 
> I've not yet tried a bpi-m3 Cortex-A7 context or
> a pine64+ 2GB or rpi3 Cortex-A53 context. Nor
> powerpc64 nor powerpc. At some point I'll get the
> time for one or more of these. I've not had amd64
> problems in this area.
> 
> I may not be able to test the bpi-m3: its barrel
> connector for power seems flaky and it is
> difficult to keep the board powered for long
> periods in recent times.

For this new example -r326347 kernel based
boot-hang:

chain 35:
 thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK
 thread 100073 (pid 23, uma) is on a run queue

db> bt
Tracing pid 1 tid 100001 td 0xd429f000
cpu_switch() at cpu_switch+0x18
         pc = 0xc0584f80  lr = 0xc0299ad8 (sched_switch+0x5c0)
         sp = 0xd42a56b8  fp = 0xd42a56e8
sched_switch() at sched_switch+0x5c0
         pc = 0xc0299ad8  lr = 0xc0275f28 (mi_switch+0x258)
         sp = 0xd42a56f0  fp = 0xd42a5718
         r4 = 0x0001f0f5  r5 = 0x00000000
         r6 = 0xd429f000  r7 = 0x0006612d
         r8 = 0x00000000  r9 = 0x00000104
        r10 = 0xc08073f0
mi_switch() at mi_switch+0x258
         pc = 0xc0275f28  lr = 0xc02c1508 (sleepq_switch+0x1c0)
         sp = 0xd42a5720  fp = 0xd42a5748
         r4 = 0xd429f000  r5 = 0xc0947dc4
         r6 = 0xc09a9c68  r7 = 0xc0947dc0
         r8 = 0x00000000  r9 = 0xc09440c0
        r10 = 0x000000f4
sleepq_switch() at sleepq_switch+0x1c0
         pc = 0xc02c1508  lr = 0xc02c1308 (sleepq_wait+0x48)
         sp = 0xd42a5750  fp = 0xd42a5760
         r4 = 0xd429f000  r5 = 0x00000000
         r6 = 0xc09a9c68  r7 = 0xc06b415c
         r8 = 0x00000001  r9 = 0xc09a9c78
        r10 = 0x00000000
sleepq_wait() at sleepq_wait+0x48
         pc = 0xc02c1308  lr = 0xc02748b0 (_sx_slock_hard+0x298)
         sp = 0xd42a5768  fp = 0xd42a57c8
         r4 = 0xc09a9c68  r5 = 0xc0823ac0
         r6 = 0xc06ae146  r7 = 0x00000000
_sx_slock_hard() at _sx_slock_hard+0x298
         pc = 0xc02748b0  lr = 0xc027457c (_sx_slock_int+0x140)
         sp = 0xd42a57d0  fp = 0xd42a57f8
         r4 = 0x00000078  r5 = 0x00000765
         r6 = 0xc09a9c68  r7 = 0x00000765
         r8 = 0xc09a9c78  r9 = 0xd7ad8740
        r10 = 0x00000000
_sx_slock_int() at _sx_slock_int+0x140
         pc = 0xc027457c  lr = 0xc052b024 (uma_zcreate+0x10c)
         sp = 0xd42a5800  fp = 0xd42a5850
         r4 = 0x00000078  r5 = 0xc09a9c68
         r6 = 0xc06df86e  r7 = 0xc06dd153
         r8 = 0x00000000  r9 = 0x00000000
        r10 = 0x00000000
uma_zcreate() at uma_zcreate+0x10c
         pc = 0xc052b024  lr = 0xc050cc90 (ffs_mount+0x80)
         sp = 0xd42a5858  fp = 0xd42a5980
         r4 = 0xd7844000  r5 = 0x00000000
         r6 = 0x00000003  r7 = 0xc09a99c0
         r8 = 0x00000000  r9 = 0xd429f000
        r10 = 0xd828c400
ffs_mount() at ffs_mount+0x80
         pc = 0xc050cc90  lr = 0xc032e1d4 (vfs_donmount+0xeec)
         sp = 0xd42a5988  fp = 0xd42a5b38
         r4 = 0xffffffff  r5 = 0xd74ec120
         r6 = 0xd429f000  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xc425acd0
        r10 = 0xd828c400
vfs_donmount() at vfs_donmount+0xeec
         pc = 0xc032e1d4  lr = 0xc0330ea8 (kernel_mount+0x70)
         sp = 0xd42a5b40  fp = 0xd42a5b78
         r4 = 0xd82a9000  r5 = 0x00000000
         r6 = 0x00004000  r7 = 0x00000000
         r8 = 0xd87e2050  r9 = 0xd87e2040
        r10 = 0x00000000
kernel_mount() at kernel_mount+0x70
         pc = 0xc0330ea8  lr = 0xc03335dc (parse_mount+0x458)
         sp = 0xd42a5b80  fp = 0xd42a5c60
         r4 = 0xd4263500  r5 = 0xd87e2054
         r6 = 0xd82a9000  r7 = 0x00000000
parse_mount() at parse_mount+0x458
         pc = 0xc03335dc  lr = 0xc0331a44 ($a.2+0x28)
         sp = 0xd42a5c68  fp = 0xd42a5dc8
         r4 = 0xd7845018  r5 = 0x00000000
         r6 = 0xd7845018  r7 = 0xd78404da
         r8 = 0xc06bfbd4  r9 = 0xfffffff7
        r10 = 0xd7836380
$a.2() at $a.2+0x28
         pc = 0xc0331a44  lr = 0xc020a82c (start_init+0x5c)
         sp = 0xd42a5dd0  fp = 0xd42a5e20
         r4 = 0xc06a370c  r5 = 0xc0823ad0
         r6 = 0x00000000  r7 = 0x00000000
         r8 = 0xd42a5e48  r9 = 0x00000000
        r10 = 0xd429b000
start_init() at start_init+0x5c
         pc = 0xc020a82c  lr = 0xc0230f10 (fork_exit+0xa0)
         sp = 0xd42a5e28  fp = 0xd42a5e40
         r4 = 0xd429f000  r5 = 0xd429b000
         r6 = 0xc020a7d0  r7 = 0x00000000
         r8 = 0xd42a5e48  r9 = 0x00000000
        r10 = 0x00000000
fork_exit() at fork_exit+0xa0
         pc = 0xc0230f10  lr = 0xc0564c8c (swi_exit)
         sp = 0xd42a5e48  fp = 0x00000000
         r4 = 0xc020a7d0  r5 = 0x00000000
         r6 = 0x00000000  r7 = 0x00000000
         r8 = 0x00000000 r10 = 0x00000000
swi_exit() at swi_exit
         pc = 0xc0564c8c  lr = 0xc0564c8c (swi_exit)
         sp = 0xd42a5e48  fp = 0x00000000

db> bt
Tracing pid 23 tid 100073 td 0xd7ad8740
cpu_switch() at cpu_switch+0x18
         pc = 0xc0584f80  lr = 0xc0299ad8 (sched_switch+0x5c0)
         sp = 0xd87dad58  fp = 0xd87dad88
sched_switch() at sched_switch+0x5c0
         pc = 0xc0299ad8  lr = 0xc0275f28 (mi_switch+0x258)
         sp = 0xd87dad90  fp = 0xd87dadb8
         r4 = 0x00000025  r5 = 0x00000000
         r6 = 0xd7ad8740  r7 = 0x000240ea
         r8 = 0x00000000  r9 = 0x00000100
        r10 = 0xc08073f0
mi_switch() at mi_switch+0x258
         pc = 0xc0275f28  lr = 0xc052d9f8 (uma_reclaim_locked+0x200)
         sp = 0xd87dadc0  fp = 0xd87dade8
         r4 = 0x00000003  r5 = 0xc08073f0
         r6 = 0xc08073f0  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xc0824310
        r10 = 0xc06df86e
uma_reclaim_locked() at uma_reclaim_locked+0x200
         pc = 0xc052d9f8  lr = 0xc052de70 (uma_reclaim_worker+0x4c)
         sp = 0xd87dadf0  fp = 0xd87dae20
         r4 = 0xc09a9c68  r5 = 0xc4242d80
         r6 = 0xc06df86e  r7 = 0x00000000
         r8 = 0x00000100  r9 = 0xc09b3d08
        r10 = 0xc09a9c7c
uma_reclaim_worker() at uma_reclaim_worker+0x4c
         pc = 0xc052de70  lr = 0xc0230f10 (fork_exit+0xa0)
         sp = 0xd87dae28  fp = 0xd87dae40
         r4 = 0xd7ad8740  r5 = 0xd70ce000
         r6 = 0xc052de24  r7 = 0x00000000
         r8 = 0xd87dae48  r9 = 0xd7ada3a0
        r10 = 0xc0824e8c
fork_exit() at fork_exit+0xa0
         pc = 0xc0230f10  lr = 0xc0564c8c (swi_exit)
         sp = 0xd87dae48  fp = 0x00000000
         r4 = 0xc052de24  r5 = 0x00000000
         r6 = 0x7ff6d83f  r7 = 0xd6f583a0
         r8 = 0xc0824dcc r10 = 0xc0824e8c
swi_exit() at swi_exit
         pc = 0xc0564c8c  lr = 0xc0564c8c (swi_exit)
         sp = 0xd87dae48  fp = 0x00000000

db> show thread 100001
Thread 100001 at 0xd429f000:
 proc (pid 1): 0xd429b000
 name: kernel
 stack: 0xd42a4000-0xd42a5fff
 flags: 0x4  pflags: 0x20000000
 state: INHIBITED: {SLEEPING}
 wmesg: umadrain  wchan: 0xc09a9c68 sleeptimo 0. 0 (curr 26. e3fd787f00000000)
 priority: 84
 container lock: sleepq chain (0xc0947dc4)
 last voluntary switch: 26523 ms ago
 last involuntary switch: 26695 ms ago

db> show thread 100073
Thread 100073 at 0xd7ad8740:
 proc (pid 23): 0xd70ce000
 name: uma
 stack: 0xd87d9000-0xd87dafff
 flags: 0x4  pflags: 0x200000
 state: RUNQ
 priority: 84
 container lock: sched lock 3 (0xc0942ec0)
 last voluntary switch: 26694 ms ago

db> show lock 0xc0942ec0
 class: spin mutex
 name: sched lock 3
 flags: {SPIN, RECURSE}
 state: {UNOWNED}


===
Mark Millard
markmi at dsl-only.net
Received on Wed Dec 13 2017 - 00:46:07 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:14 UTC