Re: head -r331499 amd64/threadripper panic in vm_page_free_prep during "poudriere bulk -a", after 14h 22m or so.

From: Mark Millard <marklmi26-fbsd_at_yahoo.com>
Date: Sun, 25 Mar 2018 13:15:08 -0700
[Just an added note about where in the sequence panic
messages are sent to the console vs. could potentially
be sent to the console.]

> On 2018-Mar-25, at 12:32 PM, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:
> 
> On 2018-Mar-25, at 11:34 AM, Mark Johnston <markj at FreeBSD.org> wrote:
> 
>> On Sun, Mar 25, 2018 at 10:41:38AM -0700, Mark Millard wrote:
>>> FreeBSD panic'd while attempting to see if a "poudriere bulk -w -a"
>>> would get the "unnecessary swapping" problem in my UFS-only context,
>>> -r331499 (non-debug but with symbols), under Hyper-V. This is a
>>> Ryzen Threadripper context, but I've no clue if that is important
>>> to the problem. This was after 14 hours or so of building:
>>> 
>>> . . .
>>> [14:22:05] [18] [00:01:16] Finished devel/p5-Test-HTML-Tidy | p5-Test-HTML-Tidy-1.00_1: Success
>>> [14:22:08] [18] [00:00:00] Building devel/ocaml-camlp5 | ocaml-camlp5-6.16
>>> 
>>> So I've no clue if or how to repeat this.
>>> 
>>> Unfortunately dump was unsuccessful. 
>> 
>> What happened?
> 
> It reported:
> 
> (da1:storvsc1:0:0:0) WRITE(10). CDB 2a 00 35 24 37 c7 00 00 0 00
> (da1:storvsc1:0:0:0) CAM status Command timeout
> (da1:storvsc1:0:0:0) Error 5, Retries exhausted
> Aborting dump due to I/O error.
> 
> ** DUMP FAILED (ERROR 5) **
> = 0x5
> 
>>> So all I have is the
>>> backtrace. Hand typed from a screen shot of the console
>>> window:
>> 
>> Do you know what the panic message was? There are multiple calls to
>> panic() in vm_page_free_prep().
> 
> No. I listed what I could see. The console screen does not have many
> lines or rows and I was sleeping when the panic happened.

I sometimes wonder if panic should repeat the panic message at the
end of the backtrace in order to deal with keeping it visible in
row-restricted console contexts.
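
As a rough illustration of that idea (a userland sketch only, not the
kernel's panic() code; every mock_* name is made up for the example),
the message could be formatted once into a buffer, printed, and then
printed again after the backtrace:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical userland model; none of these names are FreeBSD kernel APIs. */
#define MOCK_PANIC_MSG_MAX 256

static char mock_panic_msg[MOCK_PANIC_MSG_MAX];

/* Stand-in for the kernel's stack trace printer. */
static void
mock_print_backtrace(FILE *out)
{
	fprintf(out, "#0 mock_frame_a\n#1 mock_frame_b\n");
}

/*
 * Format the message once, print it, print the backtrace, then print the
 * same message again so it is the last thing left on a short console.
 * Returns the saved message so a caller can inspect it.
 */
const char *
mock_panic(FILE *out, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(mock_panic_msg, sizeof(mock_panic_msg), fmt, ap);
	va_end(ap);

	fprintf(out, "panic: %s\n", mock_panic_msg);
	mock_print_backtrace(out);
	fprintf(out, "panic (repeated): %s\n", mock_panic_msg);
	return (mock_panic_msg);
}
```

A call such as mock_panic(stderr, "freeing wired page %d", 7) ends its
output with the repeated message line, which is the part that would
still be visible in a console window with few rows.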

> I redid a buildworld buildkernel installkernel installworld sequence
> since then and it looks like the detailed addresses changed (as seen
> in objdump now vs. what was on the console). But the relative offset
> in vm_page_free_prep seems to be a match, at least for the instruction
> after the "callq panic".
> 
> Looking at the kernel code I see:
> 
> . . .
> <vm_page_free_prep+0x10> mov    0xffffffff81843690,%rax
> <vm_page_free_prep+0x18> mov    $0xffffffff81d6d880,%rcx
> <vm_page_free_prep+0x1f> sub    %rcx,%rax
> <vm_page_free_prep+0x22> addq   $0x1,%gs:(%rax)
> <vm_page_free_prep+0x27> mov    0x54(%rbx),%eax
> <vm_page_free_prep+0x2f> and    $0x1,%eax
> <vm_page_free_prep+0x32> jne    <vm_page_free_prep+0x15a>
> . . .
> (several paths reach +0x106)
> <vm_page_free_prep+0x106> movw   $0x0,0x64(%rbx)
> <vm_page_free_prep+0x10c> cmpl   $0x0,0x50(%rbx)
> <vm_page_free_prep+0x110> jne    <vm_page_free_prep+0x163>
> . . .
> <vm_page_free_prep+0x15a> mov    $0xffffffff8116628b,%rdi
> <vm_page_free_prep+0x161> jmp    <vm_page_free_prep+0x16a>
> <vm_page_free_prep+0x163> mov    $0xffffffff8120ca97,%rdi
> <vm_page_free_prep+0x16a> xor    %eax,%eax
> <vm_page_free_prep+0x16c> mov    %rbx,%rsi
> <vm_page_free_prep+0x16f> callq  <panic>
> <vm_page_free_prep+0x174> nopw   %cs:0x0(%rax,%rax,1)
> 
> No KASSERTs are present (a non-debug build). That leaves:
> 
>        if (vm_page_sbusied(m))
>                panic("vm_page_free: freeing busy page %p", m);
> and:
> 
>        if (m->wire_count != 0)
>                panic("vm_page_free: freeing wired page %p", m);
> 
> I do not have anything that lets me differentiate which
> occurred based on the above detail. Sorry.
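
For reference, the two remaining paths can be modeled as below. This is
only a sketch under assumptions: it assumes that the bit-0 test of
0x54(%rbx) is the vm_page_sbusied() check and that the compare of
0x50(%rbx) is the wire_count check. The struct and the mock_* names are
hypothetical and are not the real struct vm_page. Both paths pass the
same page pointer in %rsi and differ only in the format string address
loaded into %rdi, so the return address after "callq panic" is the same
for both and the backtrace alone cannot tell them apart.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the two non-KASSERT checks. The field roles and
 * the 0x1 mask are inferred from the disassembly (an assumption), not
 * taken from the kernel headers.
 */
struct mock_page {
	uint32_t wire_count;	/* assumed: read by cmpl $0x0,0x50(%rbx) */
	uint32_t busy_lock;	/* assumed: read by mov 0x54(%rbx),%eax */
};

enum mock_panic_path {
	MOCK_NO_PANIC,
	MOCK_BUSY_PANIC,	/* the jne to +0x15a */
	MOCK_WIRED_PANIC	/* the jne to +0x163 */
};

/* Apply the checks in the order the disassembly reaches them. */
enum mock_panic_path
mock_free_prep_path(const struct mock_page *m)
{
	if ((m->busy_lock & 0x1) != 0)
		return (MOCK_BUSY_PANIC);
	if (m->wire_count != 0)
		return (MOCK_WIRED_PANIC);
	return (MOCK_NO_PANIC);
}

/* The format string each path would hand to panic(), or NULL for none. */
const char *
mock_panic_format(enum mock_panic_path path)
{
	switch (path) {
	case MOCK_BUSY_PANIC:
		return ("vm_page_free: freeing busy page %p");
	case MOCK_WIRED_PANIC:
		return ("vm_page_free: freeing wired page %p");
	default:
		return (NULL);
	}
}
```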


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Received on Sun Mar 25 2018 - 18:15:21 UTC
