Re: HP DL380 hangs on reboot

From: Peter Jeremy <PeterJeremy_at_optushome.com.au>
Date: Fri, 15 Oct 2004 22:06:45 +1000
[Upon reflection, my previous e-mail was somewhat brusque.  My apologies.]

On Fri, 2004-Oct-15 18:26:18 +1000, Peter Jeremy wrote:
>On Wed, 2004-Oct-13 18:21:56 -0700, Doug White wrote:
>>On Wed, 13 Oct 2004, Peter Jeremy wrote:
>>
>>> On Mon, 2004-Oct-11 19:18:52 -0700, Doug White wrote:
>>> >On Mon, 11 Oct 2004, Peter Jeremy wrote:
>>> >> I have an HP DL380 running 5.3 and it will not reboot from multi-user
>>> >> mode - it hangs after printing "Rebooting..." and needs to be power-
>>> >> cycled (since there's no reset button).
>
>>> I've narrowed it down to loading kernel modules - the problem does not
>
>>How about building them into your kernel instead?
>
>That seems to work.  But it doesn't solve the underlying problem.

Compiling digi(4) into the kernel does appear to solve my immediate
problem.  Thanks for the suggestion, Doug.

My remaining concern is that I have been unable to identify the root
cause of the problem.  These machines will be installed at customer
sites as remote access servers, so needing someone on site to reboot
one is very undesirable.  Since I don't know the real cause of the
problem, I can't be sure that normal activity won't make it recur.

>  (I
>have been kldload'ing digi because it originally didn't work when it
>was compiled into the kernel).

When digi(4) was originally added to the tree, it could not be compiled
into the kernel because its attach routines were not compatible with
the kernel initialisation environment.  This problem was resolved a
couple of years ago, but I have continued to kldload digi because:
- it worked, and saved me from making the (trivial) changes needed to
  build it into the kernel.
- it needs access to a number of Digi BIOS files, which are normally
  loaded/unloaded as KLDs.  If digi is built in, the BIOS file(s)
  need to be built in as well.  (Though in my case, the wasted
  KVA and RAM are irrelevant.)
- to date, all the machines have been running 4.x, and having digi as
  a KLD made it easier to fix back-porting errors.  (Re-compiling a
  module is a lot faster than re-compiling the kernel.)

>>This could be just stale modules...
>
>Nope.  Kernel and modules were compiled and installed together.  Also
>there was no problem running and the hang was when the kernel asked
>the system to reboot - which is well after any modules have been unloaded.

Adding some printf()s shows that execution reaches cpu_reset_real().
By this point, all modules and subsystems have been shut down, and all
that's left to do is ask the hardware for a reset.  The only problem
is that the hardware doesn't want to play ball.  Presumably something
the kernel does (apparently associated with loading kernel modules)
disturbs the hardware state so that reset no longer works.
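To illustrate what "asking the hardware for a reset" typically involves
on a PC, here is a user-space simulation sketch.  It is NOT FreeBSD's
cpu_reset_real(); the names attempt_reset(), mock_outb() and struct
sim_hw are hypothetical.  It assumes two commonly used mechanisms:
writing command 0xFE to the 8042 keyboard controller (port 0x64) to
pulse the CPU reset line, then the Reset Control Register at 0xCF9
found on many Intel-compatible chipsets.  If the hardware honours
neither, the caller is left spinning, which matches the hang described
above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical log of port writes, standing in for real outb(). */
#define MAX_WRITES 8
static struct { uint16_t port; uint8_t val; } io_log[MAX_WRITES];
static int io_count;

/* Record a port write instead of touching hardware. */
static void mock_outb(uint16_t port, uint8_t val)
{
    assert(io_count < MAX_WRITES);
    io_log[io_count].port = port;
    io_log[io_count].val = val;
    io_count++;
}

enum reset_method { RESET_KBD, RESET_PCI_CF9, RESET_NONE };

/* Simulated hardware: which reset mechanisms actually work. */
struct sim_hw { int kbd_reset_works; int cf9_reset_works; };

/* Try each reset mechanism in turn; return the one that "worked". */
static enum reset_method attempt_reset(const struct sim_hw *hw)
{
    io_count = 0;

    /* 1. Ask the 8042 keyboard controller to pulse the reset line. */
    mock_outb(0x64, 0xFE);
    if (hw->kbd_reset_works)
        return RESET_KBD;

    /* 2. Fall back to the chipset reset register (assumed present):
     *    0x02 selects a hard reset, 0x06 then triggers it. */
    mock_outb(0xCF9, 0x02);
    mock_outb(0xCF9, 0x06);
    if (hw->cf9_reset_works)
        return RESET_PCI_CF9;

    /* 3. Nothing worked: a real kernel would now hang here. */
    return RESET_NONE;
}
```

The ordering (keyboard controller first, chipset register second) is
an assumption made for the sketch; real kernels differ in which
mechanisms they try and in what order.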

-- 
Peter Jeremy
Received on Fri Oct 15 2004 - 10:06:54 UTC
