A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot

From: Mark Millard <markmi_at_dsl-only.net>
Date: Tue, 12 Dec 2017 03:10:34 -0800
I initially jumped from -r326192 to -r326726 and
ended up with a rpi2 that would normally hang
somewhere around release APs being displayed.
(I have had a couple of completed boots but many
dozens of hung-up attempts.) Both a debug kernel
and a non-debug kernel hang the same way.

Bisecting the kernel (holding world -r326726
constant) showed:

-r326346 did not hang (nor did before)
-r326347 and later hung.

Conclusion: -r326347 changes are involved in the
boot hangups.

Context: head -r326726 based (from before I did
the bisect). My knowledge is limited so I may not
have picked optimal material to get from the db>
prompt.

It appears that the messages about "random" occur
during the hang (indefinite wait). As I remember,
I have seen examples where the "Trying to mount"
did not show up --but normally it did.

And example from a hung boot:

. . .
Release APs
Trying to mount root from ufs:/dev/ufs/RPI2rootfs [rw,noatime]...
random: unblocking device.
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache

The hang never seems to time out but just sits
there, even for hours.

It allows the <CR>~^B sequence to get to the db>
prompt. I've looked around a little a couple of
times.

One common point is that show allchains has
everything listed as sleeping except:

chain 32:
thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK
thread 100077 (pid 23, uma) is on a run queue
chain 33:

(Note: uma's thread number has varied, as has the one
for [pagedaemon].)

bd> ps
 pid  ppid  pgrp   uid  state   wmesg   wchan       cmd
  28     0     0     0  DL      syncer  0xc095a31c  [syncer]
  27     0     0     0  DL      vlruwt  0xd7417730  [vnlru]
  26     0     0     0  DL      psleep  0xc0959c00  [bufdaemon]
  25     0     0     0  RL                          [bufspacedaemon]
  24     0     0     0  DL      psleep  0xc095e8f8  [vmdaemon]
  23     0     0     0  RL      (threaded)          [pagedaemon]
100065                   RunQ                        [pagedaemon]
100076                   D       launds  0xc095e804  [laundry: dom0]
100077                   RunQ                        [uma]
. . .
   1     0     0     0  DL      umadrai 0xc095e528  [kernel]
   0     0     0     0  RLs     (threaded)          [kernel]
100000                   D       swapin  0xc09673c8  [swapper]
. . .
100072                   D       -       0xd75b7e80  [if_io_tqg_1]
100073                   RunQ                        [if_io_tqg_2]
100074                   D       -       0xd75b7d80  [if_io_tqg_3]
100075                   D       -       0xd75b7d00  [if_config_tqg_0]
100078                   D       -       0xd83dc100  [softirq_0]
100079                   D       -       0xd75b7c00  [softirq_1]
100080                   RunQ                        [softirq_2]
100081                   D       -       0xd75b7b00  [softirq_3]

(Which if_io_tqg_<?> and softirq_<?> pair has RunQ varies.)

All RunQ's are shown above. One or two [idle: CPU<?>]'s have
state CanRun and the other [idle: CPU<?>]'s have state RUN.
They are the only items with those states. Example from the
same hangup:

  10     0     0     0  RL      (threaded)          [idle]
100002                   Run     CPU 0               [idle: cpu0]
100003                   Run     CPU 1               [idle: cpu1]
100004                   CanRun                      [idle: cpu2]
100005                   CanRun                      [idle: cpu3]


db> show lock 0xc095e528
class: sx
name: umadrain
state: XLOCK: 0xd6cbe740 (tid 100077, pid 23, "uma")
waiters: shared

db> show thread 100001
Thread 100001 at 0xd40a7000:
proc (pid 1): 0xd40a3000
name: kernel
stack: 0xd40ac000-0xd40adfff
flags: 0x4  pflags: 0x20000000
state: INHIBITED: {SLEEPING}
wmesg: umadrain  wchan: 0xc095e528 sleeptimo 0. 0 (curr 51d. 5eac6a0400000000)
priority: 84
container lock: sleepq chain (0xc0957244)
last voluntary switch: 1297717 ms ago
last involuntary switch: 1297809 ms ago

db> show thread 100077
Thread 100077 at 0xd6cbe740:
proc (pid 23): 0xd6cab000
name: uma
stack: 0xd83ca000-0xd83cbfff
flags: 0x4  pflags: 0x200000
state: RUNQ
priority: 84
container lock: sched lock 2 (0xc0952640)
last voluntary switch: 1297815 ms ago
last involuntary switch: 1297815 ms ago

db> show thread 100073
Thread 100073 at 0xd7406740:
proc (pid 0): 0xc09673c8
name: if_io_tqg_2
stack: 0xd742a000-0xd742bfff
flags: 0x4  pflags: 0x200000
state: RUNQ
priority: 24
container lock: sched lock 2 (0xc0952640)
last voluntary switch: 1297818 ms ago

db> show thread 100080
Thread 100080 at 0xd7431ae0:
proc (pid 0): 0xc09673c8
name: softirq_2
stack: 0xd83f5000-0xd83f6fff
flags: 0x4  pflags: 0x200000
state: RUNQ
priority: 24
container lock: sched lock 2 (0xc0952640)
last voluntary switch: 1297816 ms ago

db> show lock 0xc0952640
class: spin mutex
name: sched lock 2
flags: {SPIN, RECURSE}
state: {UNOWNED}

db> show lock 0xc0957244
class: spin mutex
name: sleepq chain
flags: {SPIN, RECURSE}
state: {UNOWNED}

db> show thread 100065
Thread 100065 at 0xd6cbb000:
proc (pid 23): 0xd6cab000
name: pagedaemon
stack: 0xd7403000-0xd7404fff
flags: 0x14  pflags: 0x20200000
state: RUNQ
priority: 84
container lock: sched lock 1 (0xc0951f80)
last voluntary switch: 1029 ms ago
last involuntary switch: 28606 ms ago

db> show lock 0xc0951f80
class: spin mutex
name: sched lock 1
flags: {SPIN, RECURSE}
state: {UNOWNED}



===
Mark Millard
markmi at dsl-only.net
Received on Tue Dec 12 2017 - 10:10:44 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:14 UTC