Re: panic: double fault with 11.0-CURRENT r258504

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Wed, 27 Nov 2013 22:00:50 +0200
On Wed, Nov 27, 2013 at 11:35:19AM -0800, Don Lewis wrote:
> On 27 Nov, Konstantin Belousov wrote:
> > On Wed, Nov 27, 2013 at 11:02:57AM -0800, Don Lewis wrote:
> >> On 27 Nov, Konstantin Belousov wrote:
> >> > On Wed, Nov 27, 2013 at 10:33:30AM -0800, Don Lewis wrote:
> >> >> On 27 Nov, Konstantin Belousov wrote:
> >> >> > On Wed, Nov 27, 2013 at 09:41:36AM -0800, Don Lewis wrote:
> >> >> >> On 27 Nov, Konstantin Belousov wrote:
> >> >> >> > On Wed, Nov 27, 2013 at 02:49:12AM -0800, Don Lewis wrote:
> >> >> >> >> <http://people.freebsd.org/~truckman/doublefault2.JPG>
> >> >> >> > 
> >> >> >> > What is the instruction at cpu_switch+0x9b ?
> >> >> >> 
> >> >> >> movl 0x8(%edx),%eax
> >> >> > So it is line 176 in swtch.s. Is the machine still in ddb, or did you
> >> >> > obtain a core ? If yes, please print out the content of words at
> >> >> > 0xe4f62bb0 + 4, +8 (*), +16. Please print the content of the word at
> >> >> > address (*) + 8.
> >> >> 
> >> >> It is still in ddb.
> >> >> 
> >> >> <http://people.freebsd.org/~truckman/doublefault3.JPG>, though not in
> >> >> the above order.
> >> > Uhm, sorry, I mistyped the last part of the instructions.
> >> > 
> >> > The new thread pointer is 0xd2f4e000, there is nothing incriminating.
> >> > Please print the word at 0xd2f4e000+0x254 == 0xd2f4e254, which would be
> >> > the address of the new thread pcb. It is the load from pcb + 8 that
> >> > faults.
> >> 
> >> 0xf3d44d60
> > Again, the pointer looks fine, and its tail is 0xd60, which is correct for
> > the pcb offset in the last page of the thread stack.
> > 
> > Please do 'show thread 0xd2f4e000' before trying the instructions below.
> 
> Ok, see below:
>  
> > What happens if you try to read word at 0xf3d44d68 ?
> 
> Nothing bad ...
> 
> <http://people.freebsd.org/~truckman/doublefault4.JPG>
> 
So the thread structure looks sane, the stack region is where it is
supposed to be, and all the gathered data looks self-consistent. And
access to the faulting address from ddb does not fault.
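
For reference, the address arithmetic used in the checks above can be
written down as a small sketch. The 0x254 offset, the +8 displacement and
the addresses are the ones quoted in this thread; the 4K page size is the
usual i386 value, and the helper names are hypothetical, made up only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in this thread; valid only for this particular kernel build. */
#define TD_PCB_OFFSET    0x254u   /* offset of the pcb pointer inside the thread structure */
#define PCB_LOAD_DISP    0x8u     /* displacement in "movl 0x8(%edx),%eax" */
#define I386_PAGE_SIZE   0x1000u  /* assumption: 4K pages */

/* Address of the word that holds the new thread's pcb pointer. */
static uint32_t td_pcb_slot(uint32_t td)
{
    return td + TD_PCB_OFFSET;
}

/* Offset of an address within its page ("tail" of the pointer). */
static uint32_t page_tail(uint32_t addr)
{
    return addr & (I386_PAGE_SIZE - 1u);
}

/* Address touched by the faulting load, given the pcb pointer. */
static uint32_t faulting_load_addr(uint32_t pcb)
{
    return pcb + PCB_LOAD_DISP;
}

/* Re-checks the numbers reported in this thread. */
static void check_reported_values(void)
{
    assert(td_pcb_slot(0xd2f4e000u) == 0xd2f4e254u);
    assert(page_tail(0xf3d44d60u) == 0xd60u);
    assert(faulting_load_addr(0xf3d44d60u) == 0xf3d44d68u);
}
```

Calling check_reported_values() passes silently, which only confirms that
the reported pointers are consistent with each other. It says nothing about
why the load faulted.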

Thread stacks can only be invalidated when the process is swapped out and
the kernel stack is written to swap.  Your thread flags indicate that it
is in memory, and TDF_CANSWAP is not set.  I do not believe that our
swapout code would invalidate the stack mapping in such a situation,
otherwise we would have had too many complaints already.
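
The flag test behind that statement looks roughly like the sketch below.
TDF_INMEM and TDF_CANSWAP are real flag names from sys/proc.h, but the
numeric values here are quoted from memory and must be checked against
your tree. The helper name is hypothetical, and the logic is a
simplification of what the swapout code considers, not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; verify against sys/proc.h in the tree you are running. */
#define TDF_INMEM    0x00000004u  /* thread's kernel stack is in memory */
#define TDF_CANSWAP  0x00000040u  /* thread may be swapped out */

/*
 * Hypothetical helper: returns 1 if, judging by td_flags alone, the kernel
 * stack could be unmapped (already swapped out, or eligible for swapout),
 * and 0 if it should be resident and not a swapout candidate.
 */
static int stack_may_be_unmapped(uint32_t td_flags)
{
    if ((td_flags & TDF_INMEM) == 0)
        return 1;                       /* stack is not in memory */
    return (td_flags & TDF_CANSWAP) != 0;
}
```

For a thread with the in-memory flag set and TDF_CANSWAP clear, as in your
'show thread' output, this returns 0.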

Just in case, do you use swap on this box?

And, as a last resort (I do understand that this sounds like giving up),
do you monitor the temperature of the CPUs? BTW, which CPUs are those?
Please show the CPU identification lines from the boot dmesg.

Received on Wed Nov 27 2013 - 19:01:05 UTC
