Re: freebsd-head: suddenly NMI panics lead to ddb being unable to stop CPUs?

From: Julian Elischer <julian_at_freebsd.org>
Date: Fri, 21 Aug 2015 23:30:10 +0800

On 8/21/15 11:25 PM, Adrian Chadd wrote:
> Ah, cool. I'll give it a whirl.
>
> I'm a little worried about having all of the other cores spinning in
> this case (mostly thermal; the machines get VERY LOUD when the CPUs
> are spinning..)
>
Make each one spin with the pause instruction, and only for N seconds
(N being the CPU ID) or something like that.
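
Roughly what I have in mind, as a userland sketch and not kernel code.
The names cpu_relax(), spin_budget_ns(), now_ns() and spin_wait() are
made up for illustration, and a real NMI path could not call
clock_gettime(); it would need some other time source. The point is
only that PAUSE in the loop body cuts power draw while spinning, and
that the per-CPU budget staggers when each core gives up.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: tell the CPU we are in a spin-wait loop. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");  /* compiler barrier only */
#endif
}

/* Assumed policy from the suggestion above: CPU N may spin for N seconds. */
static uint64_t spin_budget_ns(unsigned cpu_id)
{
    return (uint64_t)cpu_id * 1000000000ULL;
}

/* Monotonic clock in nanoseconds (userland stand-in for a kernel timer). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until *release becomes nonzero or the budget runs out.
 * Returns 1 if released, 0 if the budget expired first.
 */
static int spin_wait(atomic_int *release, uint64_t budget_ns)
{
    uint64_t start = now_ns();

    while (!atomic_load_explicit(release, memory_order_acquire)) {
        if (now_ns() - start >= budget_ns)
            return 0;
        cpu_relax();
    }
    return 1;
}
```

Each CPU would call spin_wait(&flag, spin_budget_ns(my_cpu_id)), and
whoever owns the debugger sets the flag when it is done. What a CPU
does after its budget expires (halt, retry, something else) is left
open here.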

> -a
>
>
> On 21 August 2015 at 08:19, Eric van Gyzen <vangyzen_at_freebsd.org> wrote:
>> I mentioned this to Adrian, but I'll mention here for everyone else's benefit.
>>
>> Ryan is exactly right.  There was a thread a while ago, with a proposed patch from Kostik:
>>
>> https://lists.freebsd.org/pipermail/freebsd-arch/2014-July/015584.html
>>
>> As I recall, Scott Long also ran into this a few months ago.
>>
>> It happens for any NMI:  entering the debugger, a PCI Parity or System Error, a hardware watchdog timeout, and probably other sources I'm not remembering.
>>
>> Eric
>>
>> On 08/21/2015 09:23, Ryan Stone wrote:
>>> I have seen similar behaviour before.  The problem is that every CPU
>>> receives an NMI concurrently.  As I recall, one of them gets some kind of
>>> pseudo-spinlock and tries to stop the other CPUs with an NMI.  However,
>>> because they are already in an NMI handler, they don't get the second NMI
>>> and don't stop properly.
>>>
>>> The case that I saw actually had to do with a panic triggered by an NMI,
>>> not entering the debugger, but I believe that both cases use
>>> stop_cpus_hard() under the hood and have a similar issue.
>>>
>>> (I also recall seeing the exact situation that you describe while
>>> originally developing SR-IOV on an alpha version of the Fortville hardware
>>> and firmware with a very buggy SR-IOV implementation.  I've never seen it
>>> on ixgbe before, although I haven't used SR-IOV there very much at all)
>>>
>>>
>>> On Thu, Aug 20, 2015 at 6:15 PM, Adrian Chadd <adrian_at_freebsd.org> wrote:
>>>
>>>> Hi!
>>>>
>>>> This has started happening on -HEAD recently. No, I don't have any
>>>> more details yet than "recently."
>>>>
>>>> Whenever I get an NMI panic (and getting an NMI is a separate issue,
>>>> sigh) I get a slew of "failed to stop cpu" messages, and all CPUs
>>>> enter ddb. This is .. sub-optimal. Has anyone seen this? Does anyone
>>>> have any ideas?
>>>>
>>>>
>>>> -adrian
Received on Fri Aug 21 2015 - 13:30:26 UTC
