Re: SU+J systems do not fsck themselves

From: Matthias Andree <matthias.andree_at_gmx.de>
Date: Wed, 28 Dec 2011 17:42:53 +0100
On 27.12.2011 22:53, David Thiel wrote:
> I've had multiple machines now (9.0-RC3, amd64, i386 and earlier 
> 9-CURRENT on ppc) running SU+J that have had unexplained panics and 
> crashes start happening relating to disk I/O. When I end up running a 
> full fsck, it keeps turning out that the disk is dirty and corrupted, 
> but no mechanism is in place with SU+J to detect and fix this. A bgfsck 
> never happens, but a manual fsck in single-user does indeed fix the 
> crashing and weird behavior. Others have tested their SU+J volumes and 
> found them to have errors as well. This makes me super nervous.

The one thing I have figured out is that, in light of power outages or
crashing virtualization hosts, you really, really need to disable disk
write caches. This affects softupdates, journalling, async file
systems, just about everything.

What makes matters worse is that journalling or softupdates allow you
to mount a silently corrupted file system, whereas traditional
UFS/UFS2 sync/async mounts will fsck themselves in the foreground, so
they get fixed before the FS panics.

So can you be sure that:

- your driver, chip set and hard disk execute ordered writes in order,

- your driver, chip set and hard disk actually write data to permanent
storage BEFORE acknowledging a successful write?

Whenever I fixed these issues, I had no more corruptions.

For ATA and SATA, there are loader tunables you will want to set:
hw.ata.wc=0 and kern.cam.ada.write_cache=0.
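
As a rough sketch of how those two tunables could be put in place:
loader tunables are read at boot from /boot/loader.conf as plain
name="value" lines. The helper name set_tunable below is made up for
illustration and is not part of FreeBSD. It only appends a line when no
line for that name exists yet, so an existing setting is left alone.

```sh
# set_tunable FILE NAME VALUE
# Hypothetical helper: append NAME="VALUE" to a loader.conf-style FILE
# unless a line beginning with NAME= is already present.
set_tunable() {
    file=$1
    name=$2
    value=$3
    # index() does a literal prefix match, so the dots in tunable names
    # are not treated as regex wildcards. A missing file counts as
    # "not found" and is created by the append below.
    if awk -v n="$name" 'index($0, n "=") == 1 { found = 1 } END { exit !found }' \
        "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$file"
}
```

Run as root, this would be called as
set_tunable /boot/loader.conf hw.ata.wc 0 and
set_tunable /boot/loader.conf kern.cam.ada.write_cache 0, followed by a
reboot so the loader picks the values up.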

If your drives are under ada, ad, or ahci-related control, try these
settings.  For SCSI, use camcontrol to turn the write cache off.
Softupdates is supposed to make up for most of the performance penalty
incurred.
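
For the SCSI case, the write cache is governed by the WCE (write cache
enable) field in the caching mode page, which is mode page 8. A minimal
sketch of checking it follows. The wrapper name show_wce is made up for
illustration, and the device name you pass (da0, say) is an assumption
to be replaced with your own disk.

```sh
# show_wce DEVICE
# Hypothetical wrapper: print the WCE line from the SCSI caching mode
# page of DEVICE. "WCE: 1" means the drive's write cache is enabled.
show_wce() {
    dev=$1
    if ! command -v camcontrol >/dev/null 2>&1; then
        echo "camcontrol not found (FreeBSD only)" >&2
        return 1
    fi
    # -m 8 selects the caching mode page.
    camcontrol modepage "$dev" -m 8 | grep 'WCE'
}
```

To change the value, camcontrol modepage da0 -m 8 -e opens the same
page in an editor, where WCE can be set to 0. Whether the change
survives a power cycle depends on the drive, so it is worth checking
again after a reboot.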

Note also that you needed to set ahci_load=YES and atapicam_load=YES in
8.X; I've never bothered to check 7.X or 9.X WRT these settings.
Received on Wed Dec 28 2011 - 15:42:56 UTC
