Re: ZFS melting under postgres...

From: Scott Long <scottl_at_samsco.org> Date: Sun, 16 Dec 2007 10:07:23 -0700

Darren Reed wrote:
> Bernd Walter wrote:
> ...
>> One problem is with the data blocks being that big: when writing
>> 512 bytes you effectively do a read-modify-write of a larger physical
>> block.
>> This can be handled quite well with a larger FS block.
>> The much bigger problem is with power loss when writing such a
>> maintenance block.
>> You lose a very large area of logical blocks when this fails,
>> since a 4k maintenance block contains the allocation for several hundred
>> kB of logical data blocks.
>> In other words, you possibly lose data blocks that have not been written
>> for a long time, and the database wouldn't expect a problem with that data.
>> Even for the ZIL it is very questionable if you lose a large data area,
>> since the purpose is to have the data that was already synced readable
>> after a power loss.
> ...
> 
> ZFS doesn't suffer from this problem because the design
> is to always write a new section of data rather than
> overwrite "current" data.
> 
> So if you lose power in the middle of a write to a data
> block, there is no damage to the old data.

... except with disks that write sectors via read-update-write on whole 
tracks at a time (i.e. all SATA/ATA disks and probably more and more 
SAS/SCSI disks as well these days).  The speed and density optimizations
that have been introduced to disks in the past 10 years don't come for
free; they directly impact reliability.  That's why you don't ever, ever
want to lose power to a disk subsystem that you consider critical.
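
To make the failure mode concrete, here is a minimal sketch in Python. It
is a toy model, not a description of any real drive firmware: the class
TrackDisk, the geometry constants, and the power_fails_after parameter are
all made up for illustration. It assumes the disk can only update a sector
by rewriting the whole track, and that a sector not yet rewritten when
power is lost becomes unreadable.

```python
SECTOR = 512
SECTORS_PER_TRACK = 8  # hypothetical geometry, chosen only for the example


class TrackDisk:
    """Toy disk that updates a sector by rewriting its entire track."""

    def __init__(self, tracks):
        self.tracks = [
            [b"\x00" * SECTOR for _ in range(SECTORS_PER_TRACK)]
            for _ in range(tracks)
        ]

    def read(self, lba):
        track, idx = divmod(lba, SECTORS_PER_TRACK)
        return self.tracks[track][idx]  # None means "unreadable"

    def write(self, lba, data, power_fails_after=None):
        """Write one sector via read-update-write of the whole track.

        power_fails_after: number of sectors written back before power is
        lost (None means the write completes). Assumption of this model:
        sectors not yet written back are left unreadable.
        """
        track, idx = divmod(lba, SECTORS_PER_TRACK)
        staged = list(self.tracks[track])      # read the whole track
        staged[idx] = data                     # update one sector in memory
        for i, sector in enumerate(staged):    # write the whole track back
            if power_fails_after is not None and i >= power_fails_after:
                self.tracks[track][i] = None   # torn: never rewritten
            else:
                self.tracks[track][i] = sector


def demo():
    disk = TrackDisk(tracks=1)
    old = b"A" * SECTOR
    disk.write(0, old)                         # old data, safely on disk

    # A copy-on-write filesystem puts the new version in a *different*
    # sector (LBA 1), but in this model it shares a track with LBA 0.
    new = b"B" * SECTOR
    disk.write(1, new, power_fails_after=0)    # power lost during rewrite

    return disk.read(0), disk.read(1)
```

In this model demo() returns (None, None): the old block at LBA 0 is gone
even though the filesystem never asked to overwrite it. Never overwriting
"current" data protects against torn logical writes, but not against a
device that rewrites neighbouring sectors behind the filesystem's back.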

Scott