Re: UFS+J panics on HEAD

From: Lev Serebryakov <lev_at_FreeBSD.org>
Date: Thu, 24 May 2012 01:38:53 +0400
Hello, Konstantin.
You wrote on 23 May 2012, 17:10:46:

KB> This panic is another protective panic caused by on-disk inconsistent
KB> structures. The bitmap indicated that an inode was free, but actual inode
KB> context suggested that the inode is in use.

KB> I would not worry much about ffs code until known hardware problems on
KB> the machine are fixed.
  Konstantin, it is very sad that the official position of one of the
FFS maintainers (judging by mailing list activity) is to blame hardware
for every FFS/SU/SUJ inconsistency and do nothing about the code.

   In my experience, we all live in the real world, where HDDs can
lose their write cache on a power failure or on a kernel panic
unrelated to FFS, where UPSes and PSUs fail (which leads to power
failures), and where HDDs with the write cache turned off are unusable
because they become slow as hell.

  You could call it "broken hardware," but let's face it: all
non-top-server hardware, everything except HBAs with a battery
installed in dual-PSU cases, is "broken" now in this sense.

 My home server, with an almost read-only load, has crashed twice in
the last 2 years due to a burned-out PSU (and I buy "good" desktop
PSUs, in the $150-$200 price range, not Chinese boxes for $30), and
I've had one memory problem in this period (the DIMM was identified
and replaced, but I had two or three panics before I was sure that it
was a memory problem, because memtest86+ didn't find any problems in a
12+ hour run). It is good desktop hardware with a good cooling system,
not something low-end, but not server-grade, of course.

 And after EVERY such crash my main storage area (95% read, 5% write)
had dozens of "unexpected SU inconsistencies," background fsck failed
to create a snapshot, and I had to run foreground fsck for many hours.
It seems that an "async" mount without SU would be no worse than the
SU solution!

 And if you read through the mailing lists, you can find dozens of
such reports. And the answer is almost always "broken hardware".

 Yes, OK, it is broken hardware, all right. But we don't have any
other! We need to live with what we have!

 What I want to say: FFS/SU has become almost unusable on this
hardware. Protective panic, my ass! Any solution (link this inode to
lost+found and mark it as used, mark it as free, etc.) is better than
a protective panic. One mismatched inode is not the end of the world;
it is not even the end of the cylinder group, to say nothing of the
whole FS. The system could (and must) complain about it, but not
panic! Have you heard the term ``self-healing''? It seems that modern
hardware needs a better solution than "just panic and blame hardware."
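
 To make the "complain, repair, continue" idea concrete, here is a
minimal sketch as a toy user-space model in C. It is not FFS code:
`struct toy_cg`, `toy_ialloc` and every other name are hypothetical,
and a real fix would also have to deal with soft updates, journaling
and on-disk writes. The sketch assumes the simplest policy from the
list above: when the bitmap says "free" but the inode looks in use,
trust the inode, mark it as used, log it, and try the next one.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_NINODES 8

/* Toy cylinder group (hypothetical, not the real struct cg). */
struct toy_cg {
    uint8_t  inosused;           /* bit i set => inode i is allocated */
    uint16_t mode[TOY_NINODES];  /* mode 0 => inode really is free    */
    unsigned repaired;           /* mismatches fixed up so far        */
};

static bool toy_isset(const struct toy_cg *cg, int i)
{
    return ((cg->inosused >> i) & 1u) != 0;
}

static void toy_set(struct toy_cg *cg, int i)
{
    cg->inosused |= (uint8_t)(1u << i);
}

/*
 * Allocate the first free inode and give it the (nonzero) mode.
 * Returns the inode number, or -1 if the group is full.
 * A bitmap/inode mismatch is reported and repaired, never fatal.
 */
int toy_ialloc(struct toy_cg *cg, uint16_t mode)
{
    for (int i = 0; i < TOY_NINODES; i++) {
        if (toy_isset(cg, i))
            continue;                   /* bitmap says in use */
        if (cg->mode[i] != 0) {
            /* Bitmap says free, inode says in use: complain, mark used. */
            fprintf(stderr,
                "toy_ialloc: inode %d free in bitmap but mode %o; marking used\n",
                i, (unsigned)cg->mode[i]);
            toy_set(cg, i);
            cg->repaired++;
            continue;                   /* keep looking, do not panic */
        }
        toy_set(cg, i);
        cg->mode[i] = mode;
        return i;
    }
    return -1;
}
```

 With inode 0 allocated and inode 1 carrying a stale nonzero mode
while its bitmap bit is clear, `toy_ialloc()` skips inode 1, records
one repair, and hands out inode 2 instead of stopping the system.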

 Or should we officially declare FFS dead and promote ZFS as the only
usable FS on modern FreeBSD now?
-- 
// Black Lion AKA Lev Serebryakov <lev_at_FreeBSD.org>
Received on Wed May 23 2012 - 19:39:04 UTC
