Re: Panic with mca trap

From: John Baldwin <jhb_at_freebsd.org>
Date: Thu, 3 Feb 2011 08:05:31 -0500
On Tuesday, February 01, 2011 11:58:12 am mdf_at_freebsd.org wrote:
> While running basic build verification tests on a piece of hardware, we
> got an MCA exception that then panicked the kernel due to a
> WITNESS/INVARIANTS interaction.
> 
> panic @ time 1296563157.510, thread 0xffffff0005540000: blockable
> sleep lock (sleep mutex) 128 @ /build/mnt/src/sys/vm/uma_core.c:1872
> 
> Stack: --------------------------------------------------
> kernel:witness_checkorder+0x7a2
> kernel:_mtx_lock_flags+0x81
> kernel:uma_zalloc_arg+0x256
> kernel:malloc+0xc5
> kernel:mca_record_entry+0x30
> kernel:mca_scan+0xc9
> kernel:mca_intr+0x79
> kernel:trap+0x30b
> kernel:witness_checkorder+0x66
> kernel:_mtx_lock_spin_flags+0xa4
> kernel:witness_checkorder+0x2a8
> kernel:_mtx_lock_spin_flags+0xa4
> kernel:tdq_idled+0xe8
> kernel:sched_idletd+0x5b
> kernel:fork_exit+0x9b
> 
> That's this bit of code in uma_zalloc_arg():
> 
> #ifdef INVARIANTS
>                         ZONE_LOCK(zone);
>                         uma_dbg_alloc(zone, NULL, item);
>                         ZONE_UNLOCK(zone);
> #endif
> 
> 
> I don't know uma(9) well enough to know the best workaround.  Clearly
> there are times we can be in uma_zalloc_arg() and taking a regular
> mutex is not acceptable.  But what to do about the uma_dbg_alloc() call
> so it's happy, and whether to guard taking the zone lock with M_NOWAIT,
> td_critnest > 0, or both, is outside my current knowledge.
> 
> I don't expect we'll see this panic again any time soon, but it would
> be nice to make clear, for WITNESS's purposes, when an M_NOWAIT
> allocation can be done.

Actually, this is more my fault.  The machine check happened while the 
interrupted thread was already in a critical section (hence the WITNESS 
complaint).  However, it really isn't correct to be calling malloc() from an 
arbitrary exception handler, especially one like MC# which can fire pretty 
much anywhere.  I think that we should instead use malloc() when polling the
machine check banks, but keep a pre-allocated pool of structures for use with
MC# exceptions and CMC interrupts, replenishing the pool via an asynchronous
task.
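A minimal user-space sketch of the pool idea follows, under stated assumptions: the record layout and the functions pool_take(), pool_put(), pool_refill() and pool_destroy() are hypothetical illustrations, not the kernel's actual machine-check code. Locking is omitted because the sketch is single-threaded; a kernel version would need something like a spin lock around the free list, since it is touched from exception context. The MC#/CMC path only pops a preallocated record and counts a drop when the pool is empty, while malloc() is called only from pool_refill(), which stands in for the asynchronous replenish task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one machine-check bank entry. */
struct mca_rec {
	uint64_t status;	/* e.g. a copy of the bank's status register */
	struct mca_rec *next;	/* free-list linkage */
};

struct mca_pool {
	struct mca_rec *free;	/* preallocated, unused records */
	int nfree;		/* number of records on the free list */
	int target;		/* level the refill task tops the pool up to */
	unsigned long dropped;	/* events lost because the pool was empty */
};

/*
 * Exception/CMC path: never calls malloc(), only pops a preallocated
 * record.  Returns NULL and counts a drop if the pool is empty.
 */
static struct mca_rec *
pool_take(struct mca_pool *p)
{
	struct mca_rec *rec = p->free;

	if (rec == NULL) {
		p->dropped++;
		return NULL;
	}
	p->free = rec->next;
	p->nfree--;
	rec->next = NULL;
	return rec;
}

/* Push a record (new or no longer needed) onto the free list. */
static void
pool_put(struct mca_pool *p, struct mca_rec *rec)
{
	assert(rec != NULL);
	rec->next = p->free;
	p->free = rec;
	p->nfree++;
}

/*
 * Deferred "task" context: allowed to call malloc(), which may block.
 * Tops the pool back up to its target; returns false on allocation failure.
 */
static bool
pool_refill(struct mca_pool *p)
{
	while (p->nfree < p->target) {
		struct mca_rec *rec = malloc(sizeof(*rec));

		if (rec == NULL)
			return false;
		pool_put(p, rec);
	}
	return true;
}

/* Release every record still on the free list. */
static void
pool_destroy(struct mca_pool *p)
{
	while (p->free != NULL) {
		struct mca_rec *rec = p->free;

		p->free = rec->next;
		free(rec);
	}
	p->nfree = 0;
}
```

Dropping and counting an event when the pool runs dry keeps the exception path allocation-free; the pool's target size is a tuning choice, and any particular value is an assumption of this sketch.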

-- 
John Baldwin
Received on Thu Feb 03 2011 - 12:57:07 UTC
