Re: Seeing system-lockups on recent current

From: Cy Schubert <Cy.Schubert_at_komquats.com>
Date: Sun, 12 Oct 2003 09:32:02 -0700

I'm seeing similar lockups, however they started shortly after the new ATA 
code was committed. The lockups usually occur when there's a lot of ATA 
activity, e.g. filesystem or fsck. At the moment I can only guess as to 
what the problem might be (a missing interrupt is my most educated guess) 
but keeping the amount of ATA I/O to a minimum does help the situation. 
Both machines which have suffered the problem have intel chipsets. One is a 
12 year old P120 (I cannot recall the exact chipset) and the other is a 
PIII with an 815E chipset. On a couple of occasions I had systat running 
and noticed that buffers in use climbed until the system just froze, 
responding only to pings. In all cases the filesystems were generally 
"clean", with only the dirty bit set, except for the filesystem on an ATA drive 
(/var or /export) which required considerable cleanup. Filesystems that 
reside on SCSI devices have yet to exhibit any symptoms, e.g. requiring 
anything more than resetting the dirty bit.

Due to this problem I've yet to complete a portupgrade, something I've been 
trying to complete over the last four weeks, as it usually hangs the system 
within 12 hours.


Cheers,
--
Cy Schubert <Cy.Schubert_at_komquats.com>        http://www.komquats.com/
BC Government                     .                       FreeBSD UNIX
Cy.Schubert_at_osg.gov.bc.ca         .                     cy_at_FreeBSD.org
http://www.gov.bc.ca/             .            http://www.FreeBSD.org/

In message <p06002013bbac9e8160de_at_[128.113.24.47]>, Garance A Drosihn 
writes:
> For the past week or so, I have been having a frustrating time
> with my freebsd-current/i386 system.  It is a dual Athlon
> system.  It has been running -current just fine since December,
> with me updating the OS every week or two.  I did not update it
> for most of September, and then went to update it to pick up
> the recent round of security-related fixes.
> 
> My first update run picked up a change which caused system
> panics.  Other people were also seeing that panic, and it
> wasn't long before updates were committed to current to fix
> that problem.  However, ever since then my -current system
> has very frequently locked up.  Totally locked.  The only way
> to get it back is a hardware reset.
> 
> I have rebuilt the system at least a dozen times since then.
> I have built it with snapshots of /usr/src from Sept 12th
> to Oct 8th (which is what it's running at the moment).  I
> have dropped back to a single-CPU kernel.  I turned off X
> (in /etc/ttys) so that doesn't start up at all.  All those
> attempts to get a reliable 5.x-system have not worked.
> Sometimes the system will crash in the middle of a buildworld,
> other times it will crash while it's basically idle and the
> monitor is turned off.  One time it crashed in the middle of
> an installworld -- right when it was replacing /lib files.
> Boy was that a headache to recover from!
> 
> On the same PC, in a different DOS partition, is a 4.x-stable
> system.  If I boot into 4.x, I have no problems.  I fire up
> all the servers that I run, start buildworlds, run cvsup's,
> and even had all the 5.x partitions mounted and was running
> an infinite loop that MD5'd every file in the 5.x system.  I
> had all of that going on at the same time, and the system is
> fine.  While in the 4.x system, I've removed /usr/src on the
> 5.x system and recreated it, just in case there were some
> files corrupted in there.  And once the problems started, I
> made a point of always removing all of /usr/obj/usr/src
> before starting the buildworld, in case there were corrupted
> files in there.
> 
> I still have a few things I want to try.  And I know it could
> still be a hardware problem (although it bugs me that it fails
> so consistently on 5.x and never fails on 4.x).  Perhaps it
> is just some disk-corruption problem that occurred during the
> first few panics.  But I thought I'd at least mention it, and
> see if anyone else has been having similar problems.
> 
> -- 
> Garance Alistair Drosehn            =   gad_at_gilead.netel.rpi.edu
> Senior Systems Programmer           or  gad_at_freebsd.org
> Rensselaer Polytechnic Institute    or  drosih_at_rpi.edu
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> 
Received on Sun Oct 12 2003 - 12:31:01 UTC
