RE: Hot Swapping CPUs?

From: Andre Guibert de Bruet <andy_at_siliconlandmark.com>
Date: Fri, 2 Jan 2004 15:14:08 -0500 (EST)

On Fri, 2 Jan 2004, Jeff Jirsa wrote:

> [ I can't send to the list, since this location lacks RDNS, but feel
> free to send followups to the list if you feel they're of use ]
>
> > Find me a x86 motherboard (with specs, preferably) that supports cpu
> > failure-monitoring and hot-swapping and I'll volunteer time to hack up
> > some code for you. (We have a need for this functionality in
> > our x86-farm at work, so I'd get to do it on the clock. :) )
>
> Most (all?) of the IBM eSeries servers have 'Predictive Failure
> Analysis'... It claims to support real-time failure prediction, but I'm
> relatively sure it's not even close to hot-swappable at the CPU level
> (PCI-X cards hot-swap fine, though). You'll just have to figure out how
> to tap into the ISMP the same way IBM Director Agent does to find out
> when a CPU fails, and then a sysctl to disable that CPU would indeed be
> a nice touch.

Powering down a CPU and removing it from the available AP list at the
first sign of a problem would be a very nice start. It would prevent a hard
lockup and let the system run until qualified support staff can arrive on
site with a replacement part. Hot-swapping a CPU (or CPU board), as done on
Sun Enterprise servers, would be really nice but not crucial. As I see it,
the problem that we're trying to address is the downtime between 3AM, when
you realize that a CPU on your production online system has failed, and
7AM, when the system vendor's 4-hour response team shows up. Powering down
a system for a proc replacement causes a 5-minute downtime window, which
will still let you maintain 99.99904% availability (based on a 365.25-day
year).
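
As a rough check on that figure, here is a minimal Python sketch of the
arithmetic. The function name and the 240-minute comparison case (a box
that stays down from 3AM to 7AM) are illustrative assumptions, not
anything taken from a vendor tool.

```python
# Availability over one year, given total downtime in minutes.
# Assumes a 365.25-day year, as in the figure quoted above.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def availability_percent(downtime_minutes, period_minutes=MINUTES_PER_YEAR):
    """Return availability over the period as a percentage."""
    return 100.0 * (1.0 - downtime_minutes / period_minutes)


# Scheduled power-down for a CPU replacement: about 5 minutes.
planned_swap = availability_percent(5)

# Hypothetical comparison: the system is down for the full 4 hours
# between the failure and the response team's arrival.
overnight_outage = availability_percent(240)

print(f"5-minute swap: {planned_swap:.5f}%")
print(f"4-hour outage: {overnight_outage:.5f}%")
```

The 5-minute case works out to about 99.999049%, so 99.99904% is that
value truncated rather than rounded. The 4-hour case alone would already
drop a system below 99.96% for the year, which is the gap that taking the
failing CPU offline is meant to close.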

> Specs? May be available, IBM loves cuddling up to the Linux community.

I'll check IBM's ftp site for details.

Dell's OpenManage Client provides access to a system's health information.
This includes voltages and speeds of processors, fans, memory, etc. I'll
look into the availability of specs from Dell for this.

If anyone has docs on related material from HP or whomever that don't
require me to sign an NDA for access, now would be a good time to share
them. :)

Regards,

> Andre Guibert de Bruet | Enterprise Software Consultant >
> Silicon Landmark, LLC. | http://siliconlandmark.com/    >
Received on Fri Jan 02 2004 - 11:14:45 UTC
