Re: System, diagnose thyself: auto-documentation for crashes

From: Ivan Voras <ivoras_at_freebsd.org>
Date: Sat, 30 Aug 2008 13:27:06 +0200
Kirk Strauser wrote:
> I was having flaky system problems that were driving me to distraction. 
> Yesterday, I finally got a panic message with an instruction pointer,
> used addr2line to see that the failure was in uma_zfree_internal,
> searched Google, and learned that it was probably due to bad RAM.  Half
> an hour later, memtest86 found the defective stick and the problem was
> solved.
> 
> This led me to thinking, though: the OS already had all the information
> needed to figure out where the problem was.  If there had been an
> explanation inside that function definition, FreeBSD could have
> automatically gone to the file, searched for that explanation, and told
> me why my system had probably crashed.

There's a "small" problem here - to validate something like this you
need an AI or at least an expert system. It's purely coincidence that
you found someone else with bad RAM crashing in the same function, and
by itself it doesn't mean anything. The exact same failure could be
caused by almost any serious problem:

* bad CPU or overheating
* bad motherboard/bus
* compiler generating bad code
* simply, a code bug.

The next time someone reads about "crash in uma_zfree_internal" he could
have an overheated CPU and will spend days swapping and testing RAM :)

On the other hand, bad RAM can manifest in practically infinite ways,
as you discovered before you hit uma_zfree_internal.


Received on Sat Aug 30 2008 - 09:27:18 UTC
