Re: posix_fallocate on ZFS

From: John Baldwin <jhb_at_freebsd.org>
Date: Mon, 12 Feb 2018 09:04:57 -0800

On Saturday, February 10, 2018 01:46:33 PM Garrett Wollman wrote:
> In article
> <CAOtMX2jZr_kvJgOZWeiB-AZ3-7-uUu+UQ3P0nKhGZ0eNRzwMOQ_at_mail.gmail.com>,
> asomers_at_freebsd.org writes:
> 
> >On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen <wjw_at_digiware.nl>
> >wrote:
> 
> >> Is there any expectation that this is going to be fixed in the near future?
> 
> >No.  It's fundamentally impossible to support posix_fallocate on a COW
> >filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result,
> >since the system call is merely advisory.
> 
> I don't think it's true that this is _fundamentally_ impossible.  What
> the standard requires would in essence be a per-object refreservation.
> ZFS supports refreservation, obviously, but not on a per-object basis.
> Furthermore, there are mechanisms to preallocate blocks for things
> like dumps.  So it *could* be done (as in, the concept is there), but
> it may not be practical.  (And ultimately, there are ways in which the
> administrator might manage the system that would defeat the desired
> effect, but that's out of the standard's scope.)  Given the semantic
> mismatch, though, I suspect it's unreasonable to expect anyone to
> prioritize implementation of such a feature.
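
For reference, the workaround suggested in the quoted text (have the caller
treat EINVAL from posix_fallocate() as non-fatal) might look roughly like the
C sketch below.  The helper names prealloc_error_is_fatal() and
try_preallocate() are hypothetical and are not taken from Ceph; this only
illustrates the idea.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not taken from Ceph: decide whether a
 * posix_fallocate() result should abort the caller.  EINVAL is treated
 * as "this filesystem cannot preallocate" and is not fatal.
 */
int
prealloc_error_is_fatal(int error)
{
    return (error != 0 && error != EINVAL);
}

/*
 * Best-effort preallocation.  posix_fallocate() returns the error number
 * directly instead of setting errno.  The arguments are checked first
 * because EINVAL is also the error for a negative offset or a
 * non-positive length, and that case should not be silently ignored.
 */
int
try_preallocate(int fd, off_t offset, off_t len)
{
    int error;

    if (offset < 0 || len <= 0)
        return (EINVAL);
    error = posix_fallocate(fd, offset, len);
    return (prealloc_error_is_fatal(error) ? error : 0);
}
```

A caller would continue with ordinary writes when try_preallocate() returns 0
and would have to handle ENOSPC at write time, as it does without
preallocation.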

I don't think posix_fallocate() can be compatible with COW.  Suppose you
do reserve a fixed set of blocks.  That guarantees the first write to each
block has somewhere to go, but not a later overwrite, since COW puts the
new data in a fresh block.  Either you'd have to reserve another block
each time you wrote to a block in order to maintain the reservation, or
you'd need a way to mark a file as not COW.  The first case isn't really
any better than not using posix_fallocate() in the first place, as writes
still have to allocate blocks.  The second seems a bit fraught with peril
as well if the application expects the non-COW'd file to stay in sync with
other files in the system, since presumably non-COW'd files couldn't be
snapshotted, etc.
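
To make the overwrite problem concrete, here is a toy model in C.  It is not
ZFS code; the toy_pool and toy_file types and both functions are made up for
illustration, and it assumes the old copy of an overwritten block stays
pinned (by a snapshot, say) so it is never returned to the pool.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_BLOCKS 16

/* Toy model only: a pool is just a count of free blocks. */
struct toy_pool {
    size_t free_blocks;
};

struct toy_file {
    size_t reserved;              /* blocks set aside, not yet consumed */
    int written[TOY_MAX_BLOCKS];  /* nonzero once a logical block has data */
};

/* Set aside n blocks for the file, as a fixed up-front reservation. */
int
toy_reserve(struct toy_pool *pool, struct toy_file *file, size_t n)
{
    if (pool->free_blocks < n)
        return (ENOSPC);
    pool->free_blocks -= n;
    file->reserved += n;
    return (0);
}

/*
 * COW write of one logical block.  The first write to a block consumes a
 * reserved block.  An overwrite needs a fresh block from the pool; the
 * old copy is assumed to be pinned, so nothing is freed.
 */
int
toy_cow_write(struct toy_pool *pool, struct toy_file *file, size_t idx)
{
    if (idx >= TOY_MAX_BLOCKS)
        return (EINVAL);
    if (!file->written[idx] && file->reserved > 0) {
        file->reserved--;
        file->written[idx] = 1;
        return (0);
    }
    if (pool->free_blocks == 0)
        return (ENOSPC);
    pool->free_blocks--;
    file->written[idx] = 1;
    return (0);
}
```

With a pool of 4 free blocks, reserving 4 and writing blocks 0 through 3
succeeds, but overwriting block 0 afterwards fails with ENOSPC in this model
even though the file was fully "preallocated".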

-- 
John Baldwin
Received on Mon Feb 12 2018 - 17:47:21 UTC
