Re: arm64 panic: reaper-related?

From: Andrew Turner <andrew_at_fubar.geek.nz>
Date: Tue, 14 Jul 2020 20:12:41 +0100
> On 13 Jul 2020, at 15:05, Glen Barber <gjb_at_FreeBSD.org> wrote:
> 
> On Mon, Jul 13, 2020 at 01:58:21PM +0000, Glen Barber wrote:
>> Hi,
>> 
>> This morning, one of our arm64 build machines panicked.  It looks like
>> it is somehow reaper-related, but I am not entirely sure.  Backtrace
>> follows.  Any thoughts?  I'm not quite sure where to go from here...
>> Thanks in advance for any input.
>> 
>> db> set $lines 0
>> db> bt
>> Tracing pid 11 tid 100003 td 0xfffffd0001634000
>> db_trace_self() at db_stack_trace+0xf8
>>         pc = 0xffff00000075fdac  lr = 0xffff000000103e78
>>         sp = 0xffff00011eca89b0  fp = 0xffff00011eca89e0
>> 
>> db_stack_trace() at db_command+0x228
>>         pc = 0xffff000000103e78  lr = 0xffff000000103af0
>>         sp = 0xffff00011eca89f0  fp = 0xffff00011eca8ad0
>> 
>> db_command() at db_command_loop+0x58
>>         pc = 0xffff000000103af0  lr = 0xffff000000103898
>>         sp = 0xffff00011eca8ae0  fp = 0xffff00011eca8b00
>> 
>> db_command_loop() at db_trap+0xf4
>>         pc = 0xffff000000103898  lr = 0xffff000000106c0c
>>         sp = 0xffff00011eca8b10  fp = 0xffff00011eca8d30
>> 
>> db_trap() at kdb
>>         pc = 0xffff000000106c0c  lr = 0xffff000000463b0c
>>         sp = 0xffff00011eca8d40  fp = 0xffff00011eca8df0
>> 
>> kdb_trap() at do_el1h_sync+0xf4
>>         pc = 0xffff000000463b0c  lr = 0xffff00000077b448
>>         sp = 0xffff00011eca8e00  fp = 0xffff00011eca8e30
>> 
>> do_el1h_sync() at handle_el1h_sync+0x78
>>         pc = 0xffff00000077b448  lr = 0xffff000000762878
>>         sp = 0xffff00011eca8e40  fp = 0xffff00011eca8f50
>> 
>> handle_el1h_sync() at kdb_enter+0x34
>>         pc = 0xffff000000762878  lr = 0xffff000000463168
>>         sp = 0xffff00011eca8f60  fp = 0xffff00011eca8ff0
>> 
>> kdb_enter() at vpanic+0x1b0
>>         pc = 0xffff000000463168  lr = 0xffff000000417a74
>>         sp = 0xffff00011eca9000  fp = 0xffff00011eca90b0
>> 
>> vpanic() at panic+0x44
>>         pc = 0xffff000000417a74  lr = 0xffff0000004178c0
>>         sp = 0xffff00011eca90c0  fp = 0xffff00011eca9140
>> 
>> panic() at __stack_chk_fail+0x10
>>         pc = 0xffff0000004178c0  lr = 0xffff00000044ab6c
>>         sp = 0xffff00011eca9150  fp = 0xffff00011eca9150
>> 
>> __stack_chk_fail() at putchar+0x2bc
>>         pc = 0xffff00000044ab6c  lr = 0xffff000000469ce8
>>         sp = 0xffff00011eca9160  fp = 0xffff00011eca91e0
>> 
>> putchar() at 0x106
>>         pc = 0xffff000000469ce8  lr = 0x0000000000000106
>>         sp = 0xffff00011eca91f0  fp = 0x0000000000000000
>> 
>> db> show proc 11
>> Process 11 (idle) at 0xfffffd0001630000:
>> state: NORMAL
>> uid: 0  gids: 0
>> parent: pid 0 at 0xffff0000010fae40
>> ABI: null
>> reaper: 0xffff0000010fae40 reapsubtree: 11
>> sigparent: 20
>> vmspace: 0xffff000001109200
>>   (map 0xffff000001109200)
>>   (map.pmap 0xffff0000011092c0)
>>   (pmap 0xffff000001109320)
>> threads: 48
>> 100003                   Run     CPU -1                      [idle: cpu0]
>> 100004                   Run     CPU 1                       [idle: cpu1]
>> 100005                   Run     CPU 2                       [idle: cpu2]
>> 100006                   Run     CPU 3                       [idle: cpu3]
>> 100007                   Run     CPU 4                       [idle: cpu4]
>> 100008                   Run     CPU 5                       [idle: cpu5]
>> 100009                   Run     CPU 6                       [idle: cpu6]
>> 100010                   Run     CPU 7                       [idle: cpu7]
>> 100011                   Run     CPU 8                       [idle: cpu8]
>> 100012                   CanRun                              [idle: cpu9]
>> 100013                   Run     CPU 10                      [idle: cpu10]
>> 100014                   Run     CPU 11                      [idle: cpu11]
>> 100015                   Run     CPU 12                      [idle: cpu12]
>> 100016                   Run     CPU 13                      [idle: cpu13]
>> 100017                   Run     CPU 14                      [idle: cpu14]
>> 100018                   Run     CPU 15                      [idle: cpu15]
>> 100019                   Run     CPU 16                      [idle: cpu16]
>> 100020                   Run     CPU 17                      [idle: cpu17]
>> 100021                   Run     CPU 18                      [idle: cpu18]
>> 100022                   Run     CPU 19                      [idle: cpu19]
>> 100023                   Run     CPU 20                      [idle: cpu20]
>> 100024                   Run     CPU 21                      [idle: cpu21]
>> 100025                   Run     CPU 22                      [idle: cpu22]
>> 100026                   Run     CPU 23                      [idle: cpu23]
>> 100027                   Run     CPU 24                      [idle: cpu24]
>> 100028                   Run     CPU 25                      [idle: cpu25]
>> 100029                   Run     CPU 26                      [idle: cpu26]
>> 100030                   CanRun                              [idle: cpu27]
>> 100031                   Run     CPU 28                      [idle: cpu28]
>> 100032                   Run     CPU 29                      [idle: cpu29]
>> 100033                   Run     CPU 30                      [idle: cpu30]
>> 100034                   Run     CPU 31                      [idle: cpu31]
>> 100035                   Run     CPU 32                      [idle: cpu32]
>> 100036                   Run     CPU 33                      [idle: cpu33]
>> 100037                   Run     CPU 34                      [idle: cpu34]
>> 100038                   Run     CPU 35                      [idle: cpu35]
>> 100039                   Run     CPU 36                      [idle: cpu36]
>> 100040                   Run     CPU 37                      [idle: cpu37]
>> 100041                   Run     CPU 38                      [idle: cpu38]
>> 100042                   Run     CPU 39                      [idle: cpu39]
>> 100043                   Run     CPU 40                      [idle: cpu40]
>> 100044                   Run     CPU 41                      [idle: cpu41]
>> 100045                   Run     CPU 42                      [idle: cpu42]
>> 100046                   Run     CPU 43                      [idle: cpu43]
>> 100047                   Run     CPU 44                      [idle: cpu44]
>> 100048                   Run     CPU 45                      [idle: cpu45]
>> 100049                   Run     CPU 46                      [idle: cpu46]
>> 100050                   Run     CPU 47                      [idle: cpu47]
>> 
>> 
> 
> I should have included this as well...
> 
> db> show panic
> panic: Misaligned access from kernel space!

How reproducible is this? The backtrace and the panic message don’t line up, but that may be related to __stack_chk_fail being in the trace. __stack_chk_fail is called when the stack protector detects that a stack canary has been overwritten, i.e. a stack buffer overflow.
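To illustrate why a __stack_chk_fail frame hints at stack corruption, here is a minimal, hypothetical model of the canary check that -fstack-protector makes the compiler emit. It is only a sketch: the frame is simulated as a byte array, and guard_value, stack_chk_fail_sim() and guarded_copy() are made-up names standing in for __stack_chk_guard, __stack_chk_fail() and an instrumented function. In the real kernel the failure path panics and never returns, and the compiler, not the programmer, places and checks the canary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a protected frame: a 16-byte local buffer
 * followed directly by the canary slot the compiler would insert. */
#define BUF_LEN 16
#define FRAME_LEN (BUF_LEN + sizeof(uint64_t))

/* Stand-in for __stack_chk_guard (a real kernel randomises this). */
static const uint64_t guard_value = 0x5a5a5a5a00c0ffeeULL;

/* Stand-in for __stack_chk_fail(): the real one panics; this model
 * only counts detections so the behaviour can be observed. */
static int stack_chk_fail_calls = 0;
static void stack_chk_fail_sim(void) { stack_chk_fail_calls++; }

/* Copies len bytes into the buffer WITHOUT a bounds check (the bug),
 * then performs the epilogue check the compiler would emit.
 * Returns 1 if the canary survived, 0 if corruption was detected. */
static int guarded_copy(const unsigned char *src, size_t len)
{
    unsigned char frame[FRAME_LEN];
    uint64_t canary = guard_value;

    /* Prologue: store the canary just past the buffer. */
    memcpy(frame + BUF_LEN, &canary, sizeof canary);

    /* Body: unchecked copy; len > BUF_LEN spills into the canary.
     * Clamp to the simulated frame so the model itself stays defined. */
    if (len > FRAME_LEN)
        len = FRAME_LEN;
    memcpy(frame, src, len);

    /* Epilogue: reload the canary and compare it with the guard. */
    memcpy(&canary, frame + BUF_LEN, sizeof canary);
    if (canary != guard_value) {
        stack_chk_fail_sim();
        return 0;
    }
    return 1;
}
```

A copy of 16 bytes leaves the canary intact, while a copy of 20 bytes overwrites part of it and triggers the failure path. Because the overflow can also clobber saved frame pointers and return addresses, a backtrace taken afterwards (note the lr = 0x106 and fp = 0 frame above) may be unreliable.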

I added more diagnostics to the kernel in r363191. Could you try upgrading the kernel to that revision?

Andrew
Received on Tue Jul 14 2020 - 17:12:51 UTC
