Re: [PATCH] Machine Check Architecture on amd64

From: Suleiman Souhlal <ssouhlal_at_FreeBSD.org>
Date: Tue, 26 Jun 2007 00:10:37 -0700
On Jun 25, 2007, at 11:55 PM, Ed Schouten wrote:

> * Suleiman Souhlal <ssouhlal_at_FreeBSD.org> wrote:
>>  Hi,
>>
>>  I have a simple patch for amd64 that uses the Machine Check
>>  Architecture/Exceptions on most recent x86 CPUs to detect memory
>>  errors:
>>
>>  http://people.freebsd.org/~ssouhlal/testing/mce-20070621.diff
>>
>>  It will report uncorrected and corrected errors (the latter only
>>  if sysctl machdep.mce.log_corrected=1).
>>  You can ask the kernel to panic if it gets an uncorrected error by
>>  setting machdep.mce.panic_on_uc=1.
>>  All this can be disabled by setting the machdep.mce.enable tunable
>>  to 0. I'm still not sure if I want this enabled by default, as I
>>  don't have any Intel machines to test this on, but I have tested it
>>  on Opteron (both corrected and uncorrected errors).
>>
>>  I would appreciate it if someone would try this, especially if you
>>  have Intel machines with bad RAM.
>>
>>  Comments are welcome.
>
> |	/*
> |	 * Uncorrected MCEs will generate a #MC, while corrected
> |	 * don't, so we have to periodically poll for them.
> |	 */
>
> What about adding an option to only print uncorrected MCEs? That's
> the most interesting data, and we can get that without using a
> kthread, right?

Running sysctl machdep.mce.log_corrected=0 machdep.mce.poll_delay=0
will stop reporting corrected errors and will stop the kthread. It
won't actually kill the kthread, though; I guess I'll fix that before
I commit the patch.

Thanks,
-- Suleiman
Received on Tue Jun 26 2007 - 05:10:43 UTC
