Re: head -r331499 amd64/threadripper panic in vm_page_free_prep during "poudriere bulk -a", after 14h 22m or so.

From: Mark Millard <marklmi26-fbsd_at_yahoo.com>
Date: Sun, 25 Mar 2018 12:32:09 -0700
On 2018-Mar-25, at 11:34 AM, Mark Johnston <markj at FreeBSD.org> wrote:

> On Sun, Mar 25, 2018 at 10:41:38AM -0700, Mark Millard wrote:
>> FreeBSD panic'd while attempting to see if a "poudriere bulk -w -a"
>> would get the "unnecessary swapping" problem in my UFS-only context,
>> -r331499 (non-debug but with symbols), under Hyper-V. This is a
>> Ryzen Threadripper context, but I've no clue if that is important
>> to the problem. This was after 14 hours or so of building:
>> 
>> . . .
>> [14:22:05] [18] [00:01:16] Finished devel/p5-Test-HTML-Tidy | p5-Test-HTML-Tidy-1.00_1: Success
>> [14:22:08] [18] [00:00:00] Building devel/ocaml-camlp5 | ocaml-camlp5-6.16
>> 
>> So I've no clue if or how to repeat this.
>> 
>> Unfortunately dump was unsuccessful. 
> 
> What happened?

It reported:

(da1:storvsc1:0:0:0) WRITE(10). CDB 2a 00 35 24 37 c7 00 00 0 00
(da1:storvsc1:0:0:0) CAM status Command timeout
(da1:storvsc1:0:0:0) Error 5, Retries exhausted
Aborting dump due to I/O error.

** DUMP FAILED (ERROR 5) **
= 0x5

>> So all I have is the
>> backtrace. Hand typed from a screen shot of the console
>> window:
> 
> Do you know what the panic message was? There are multiple calls to
> panic() in vm_page_free_prep().

No. I listed what I could see. The console screen does not have many
lines or rows and I was sleeping when the panic happened.

I redid a buildworld buildkernel installkernel installworld sequence
since then and it looks like the detailed addresses changed (as seen
in objdump now vs. what was on the console). But the relative offset
in vm_page_free_prep seems to be a match, at least for the instruction
after the "callq panic".
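
The comparison amounts to subtracting the function's start address from
the reported address, once for the old kernel and once for the rebuilt
one. A minimal sketch of that arithmetic follows; the helper names are
hypothetical, and the console addresses themselves are not reproduced in
this message:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Offset of an address within a function, given the function's start
 * address. Hypothetical helper, for illustration only.
 */
static uint64_t
func_offset(uint64_t addr, uint64_t func_start)
{
	return (addr - func_start);
}

/*
 * True when two (address, function start) pairs taken from different
 * kernel builds refer to the same offset within the function.
 */
static bool
same_offset(uint64_t addr_a, uint64_t start_a,
    uint64_t addr_b, uint64_t start_b)
{
	return (func_offset(addr_a, start_a) == func_offset(addr_b, start_b));
}
```

A matching offset only suggests the same instruction if the function's
code itself did not change between the two builds.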

Looking at the kernel code I see:

. . .
<vm_page_free_prep+0x10> mov    0xffffffff81843690,%rax
<vm_page_free_prep+0x18> mov    $0xffffffff81d6d880,%rcx
<vm_page_free_prep+0x1f> sub    %rcx,%rax
<vm_page_free_prep+0x22> addq   $0x1,%gs:(%rax)
<vm_page_free_prep+0x27> mov    0x54(%rbx),%eax
<vm_page_free_prep+0x2f> and    $0x1,%eax
<vm_page_free_prep+0x32> jne    <vm_page_free_prep+0x15a>
. . .
(several paths reach +0x106)
<vm_page_free_prep+0x106> movw   $0x0,0x64(%rbx)
<vm_page_free_prep+0x10c> cmpl   $0x0,0x50(%rbx)
<vm_page_free_prep+0x110> jne    <vm_page_free_prep+0x163>
. . .
<vm_page_free_prep+0x15a> mov    $0xffffffff8116628b,%rdi
<vm_page_free_prep+0x161> jmp    <vm_page_free_prep+0x16a>
<vm_page_free_prep+0x163> mov    $0xffffffff8120ca97,%rdi
<vm_page_free_prep+0x16a> xor    %eax,%eax
<vm_page_free_prep+0x16c> mov    %rbx,%rsi
<vm_page_free_prep+0x16f> callq  <panic>
<vm_page_free_prep+0x174> nopw   %cs:0x0(%rax,%rax,1)

No KASSERTs are present (a non-debug build). That leaves:

        if (vm_page_sbusied(m))
                panic("vm_page_free: freeing busy page %p", m);
and:

        if (m->wire_count != 0)
                panic("vm_page_free: freeing wired page %p", m);

I do not have anything that lets me differentiate which
occurred based on the above detail. Sorry.
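
The reason the backtrace cannot tell them apart is visible in the
listing: each path loads its own format-string address into %rdi (at
+0x15a or +0x163) and then both reach the single "callq panic" at
+0x16f, so the return address is +0x174 either way. A minimal model of
that control flow is sketched below. The struct, the macro, and the
mapping of 0x54(%rbx) to the busy test and 0x50(%rbx) to wire_count are
assumptions from reading the disassembly, not taken from the kernel
headers:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two vm_page fields tested above. */
struct page_model {
	uint32_t wire_count;	/* assumed: the field at 0x50(%rbx) */
	uint32_t busy_lock;	/* assumed: the field at 0x54(%rbx) */
};

/* Assumed meaning of the "and $0x1" mask: the shared-busy bit. */
#define	MODEL_BIT_SHARED	0x1

/*
 * Return the format string that would be handed to panic(), or NULL if
 * neither check fires. The checks are in the same order as in the
 * disassembly: the busy test first, the wired test later.
 */
static const char *
which_panic(const struct page_model *m)
{
	if ((m->busy_lock & MODEL_BIT_SHARED) != 0)
		return ("vm_page_free: freeing busy page %p");
	if (m->wire_count != 0)
		return ("vm_page_free: freeing wired page %p");
	return (NULL);
}
```

Only the string passed in %rdi differs between the two paths, which is
why the panic message itself would be needed to tell them apart.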

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Received on Sun Mar 25 2018 - 17:42:22 UTC
