Re: Hot Swapping CPUs?

From: Scott W <wegster_at_mindcore.net>
Date: Fri, 02 Jan 2004 19:42:16 -0500
Andre Guibert de Bruet wrote:

>On Fri, 2 Jan 2004, Tom wrote:
>
>>On Fri, 2 Jan 2004, Oliver Brandmueller wrote:
>>
>>>to the base functionality for CPU hot swapping. One would need (apart
>>>from a motherboard that's able to do that ;-)) some control over CPUs now,
>>>like disabling a physical CPU during runtime (which could also be done
>>>automatically on certain failure conditions).
>>>
>>  sysctl already exposes some variables to control CPUs on an SMP system.
>>
>>  It is pretty hard to detect a CPU failure in software, because the
>>software detection will fail at the same time the CPU does.
>>
>
>Right. A hardware watchdog is what's required to effectively check for
>fail{ed,ing} cpus. With software, you're left either polling or guessing
>that a cpu has gone offline for hardware reasons when it doesn't run
>anything on its queue.
>
>>>Is there any work in progress in this direction? Would be a very neat
>>>feature for high availability systems.
>>>
>
>Find me an x86 motherboard (with specs, preferably) that supports cpu
>failure-monitoring and hot-swapping and I'll volunteer time to hack up
>some code for you. (We have a need for this functionality in our x86-farm
>at work, so I'd get to do it on the clock. :) )
>
Some of the higher-end IBM x86 systems are supposed to be able to do
this, although note that they are all systems equipped with integrated
(or additional) service processors (AKA Remote Supervisor Adapters).
Some of the service processor setups can be accessed via serial or RS-485
management ports, and their sensors (CPU, memory, disk status, temps, fans,
voltages) can also be monitored via IBM Director (software).  I don't
know offhand whether any of the service processor libraries are freely
available in source form. I _believe_ they are, as I think you
can build them from a source RPM for Red Hat and/or SuSE Linux systems.

I can verify that Director does monitor CPUs via the service processor
library.  I'm fairly certain these systems can't hot swap CPUs, but I
can't confirm or deny whether they can actually have a CPU fail and
keep running successfully.
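
For illustration, the software polling approach Andre describes above
could be sketched roughly as follows.  This is only a minimal Python
sketch, not FreeBSD kernel code: record_beat and suspect_cpus are
hypothetical names, and it assumes a small task pinned to each CPU calls
record_beat periodically while a monitor running elsewhere calls
suspect_cpus.  As Tom points out, it can only guess: a CPU that is merely
stalled looks the same as one that has failed, and if the monitor's own
CPU dies nothing gets reported at all.

```python
from typing import Dict, List


def record_beat(last_beat: Dict[int, float], cpu: int, now: float) -> None:
    """Note that the heartbeat task pinned to `cpu` ran at time `now`."""
    last_beat[cpu] = now


def suspect_cpus(last_beat: Dict[int, float], now: float,
                 timeout_s: float) -> List[int]:
    """Return the CPUs whose last heartbeat is older than `timeout_s`.

    This is a guess, not a diagnosis: a silent CPU may be wedged,
    starved, or actually dead, and software alone cannot tell which.
    """
    return sorted(cpu for cpu, seen in last_beat.items()
                  if now - seen > timeout_s)


# Example: CPU 0 keeps reporting in, CPU 1 goes quiet.
beats = {0: 10.0, 1: 10.0}   # timestamps in seconds (assumed monotonic clock)
record_beat(beats, 0, 14.0)
print(suspect_cpus(beats, now=15.0, timeout_s=3.0))
```

A hardware watchdog or service processor avoids the blind spot because
it keeps running independently of the CPUs being watched.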

Some of the systems with this functionality:
x345 (2-way Xeon)
x330 (2-way Xeon)
x445 (4- to 8-way Xeon)

Scott
Received on Fri Jan 02 2004 - 15:42:25 UTC
