Re: NMI on shutdown

From: Sean Bruno <sean_bruno_at_yahoo.com>
Date: Sat, 27 Jul 2013 09:33:14 -0700
On Sat, 2013-07-27 at 13:06 +0900, Hiroki Sato wrote:
> Hi,
> 
>  The following log messages are displayed on a box where I am testing
>  stable/9.  It occurs only when trying to shutdown the box:
> 
>  | Waiting (max 60 seconds) for system process `vnlru' to stop...done
>  | Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
>  | Waiting (max 60 seconds) for system process `syncer' to stop...
>  | Syncing disks, vnodes remaining...2 2 0 0 done
>  | All buffers synced.
>  | Uptime: 8d2h4m58s
>  | NMI ISA 30, EISA 0
>  | NMI ISA 20, EISA 0
>  | NMI ... going to debugger
>  | NMI ... going to debugger
>  | NMI ISA 20, EISA 0
>  | NMI ISA 30, EISA 0
>  | NMI ... going to debugger
>  | NMI ... going to debugger
>  | NMI ISA 20, EISA 0
>  ...
>  | timeout stopping cpus
>  | [ thread pid 11 tid 100016 ]
>  | Stopped at      acpi_cpu_c1+0x6:        leave
>  | db> timeout stopping cpus
>  | timeout stopping cpus
>  | [ thread pid 11 tid 100015 ]
>  | Stopped at      acpi_cpu_c1+0x6:        leave
>  | db> timeout stopping cpus
>  | timeout stopping cpus
> 
>  Once these are displayed, a power cycle is required to recover.
>  Shutdown sometimes works, and sometimes not.  DDB prompt does not
>  work because it does not accept entered characters properly.  And,
>  this symptom seems not specific to stable/9.
> 
>  This may be a hardware specific issue, but where should I start to
>  debug this from?
> 
> -- Hiroki


:-)  I just spent the week looking at something that looks like this on
my Dell machines.  In my testing, the NMI problem seems to come from the
IPMI driver poking at both the ACPI and ISA interfaces to the IPMI
controller, which results in an attempt to create both /dev/ipmi0
and /dev/ipmi1.

Until some point in the recent past (this affects 9 as well), the ACPI
and ISA IPMI device nodes were children of the same parent, so
ipmi_isa.c::ipmi_isa_identify() would see the ACPI attachment and do
nothing.  Now the two interfaces have different parents in the device
tree, so that check no longer finds the ACPI attachment.
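
To illustrate why the parent change matters, here is a rough userland
sketch.  It is not the FreeBSD code and not the committed fix: struct
node, add_child(), has_child_named(), tree_has_driver() and the two
isa_identify_*() functions are all made-up names that only model a
device tree.  It shows that an identify routine which looks only at its
own parent's children misses an "ipmi" node attached under a different
parent, while a search of the whole tree finds it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for a device-tree node; NOT FreeBSD's device_t. */
struct node {
	const char *driver;                  /* driver name, e.g. "ipmi" */
	struct node *parent;
	struct node *children[MAX_CHILDREN];
	size_t nchildren;
};

/* Attach child under parent. */
static void
add_child(struct node *parent, struct node *child)
{
	assert(parent->nchildren < MAX_CHILDREN);
	child->parent = parent;
	parent->children[parent->nchildren++] = child;
}

/* True if a direct child of parent uses the named driver. */
static bool
has_child_named(const struct node *parent, const char *driver)
{
	for (size_t i = 0; i < parent->nchildren; i++)
		if (strcmp(parent->children[i]->driver, driver) == 0)
			return (true);
	return (false);
}

/* True if n or any descendant of n uses the named driver. */
static bool
tree_has_driver(const struct node *n, const char *driver)
{
	if (strcmp(n->driver, driver) == 0)
		return (true);
	for (size_t i = 0; i < n->nchildren; i++)
		if (tree_has_driver(n->children[i], driver))
			return (true);
	return (false);
}

/*
 * Sibling-only check: adds an ISA "ipmi" node unless the ISA bus
 * already has one.  Returns true if a node was added.
 */
static bool
isa_identify_siblings_only(struct node *isa, struct node *slot)
{
	if (has_child_named(isa, "ipmi"))
		return (false);
	slot->driver = "ipmi";
	add_child(isa, slot);
	return (true);
}

/*
 * Whole-tree check: adds an ISA "ipmi" node only if no "ipmi" node
 * exists anywhere under root.  Returns true if a node was added.
 */
static bool
isa_identify_whole_tree(const struct node *root, struct node *isa,
    struct node *slot)
{
	if (tree_has_driver(root, "ipmi"))
		return (false);
	slot->driver = "ipmi";
	add_child(isa, slot);
	return (true);
}
```

With a tree where "acpi" and "isa" both hang off the root and an "ipmi"
node already sits under "acpi", the sibling-only variant adds a second
"ipmi" node under "isa" and the whole-tree variant does not.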

On bce(4) based systems, even if not using IPMI, this seems to
crash/confuse older versions of the management firmware and yield
results similar to what you see.

I've just committed a tested fix from Yahoo for this that Peter and I
worked out.  See svn r253708.

Sean

Received on Sat Jul 27 2013 - 14:33:18 UTC
