Seeing system-lockups on recent current

From: Garance A Drosihn <drosih_at_rpi.edu>
Date: Fri, 10 Oct 2003 14:05:40 -0400

For the past week or so, I have been having a frustrating time
with my freebsd-current/i386 system.  It is a dual Athlon
system.  It has been running -current just fine since December,
with me updating the OS every week or two.  I did not update it
for most of September, and then went to update it to pick up
the recent round of security-related fixes.

My first update run picked up a change which caused system
panics.  Other people were also seeing that panic, and it
wasn't long before updates were committed to current to fix
that problem.  However, ever since then my -current system
has very frequently locked up.  Totally locked.  The only way
to get it back is a hardware reset.

I have rebuilt the system at least a dozen times since then.
I have built it with snapshots of /usr/src from Sept 12th
to Oct 8th (which is what it's running at the moment).  I
have dropped back to a single-CPU kernel.  I turned off X
(in /etc/ttys) so that doesn't start up at all.  All those
attempts to get a reliable 5.x system have not worked.
Sometimes the system will crash in the middle of a buildworld,
other times it will crash while it's basically idle and the
monitor is turned off.  One time it crashed in the middle of
an installworld -- right when it was replacing /lib files.
Boy was that a headache to recover from!

On the same PC, in a different DOS partition, is a 4.x-stable
system.  If I boot into 4.x, I have no problems.  I fire up
all the servers that I run, start buildworlds, run cvsup's,
and even had all the 5.x partitions mounted and was running
an infinite loop that MD5'd every file in the 5.x system.  I
had all of that going on at the same time, and the system is
fine.  While in the 4.x system, I've removed /usr/src on the
5.x system and recreated it, just in case there were some
files corrupted in there.  And once the problems started, I
made a point of always removing all of /usr/obj/usr/src
before starting the buildworld, in case there were corrupted
files in there.

I still have a few things I want to try.  And I know it could
still be a hardware problem (although it bugs me that it fails
so consistently on 5.x and never fails on 4.x).  Perhaps it
is just some disk-corruption problem that occurred during the
first few panics.  But I thought I'd at least mention it, and
see if anyone else has been having similar problems.

-- 
Garance Alistair Drosehn            =   gad_at_gilead.netel.rpi.edu
Senior Systems Programmer           or  gad_at_freebsd.org
Rensselaer Polytechnic Institute    or  drosih_at_rpi.edu
Received on Fri Oct 10 2003 - 09:05:45 UTC
