Re: effect of strip(1) on du(1)

From: Alan Somers <asomers_at_freebsd.org>
Date: Fri, 3 Mar 2017 09:39:15 -0700
On Fri, Mar 3, 2017 at 7:11 AM, Rodney W. Grimes
<freebsd-rwg_at_pdx.rh.cn85.dnsmgr.net> wrote:
>> On 2017-Mar-02 22:19:10 -0800, "Rodney W. Grimes" <freebsd-rwg_at_pdx.rh.CN85.dnsmgr.net> wrote:
>> >> du(1) is using fts_read(3), which is based on the stat(2) information.
>> >> The OpenGroup defines st_blocksize as "Number of blocks allocated for
>> >> this object."  In the case of ZFS, a write(2) may return before any
>> >> blocks are actually allocated.  And thanks to compression, gang
>> ...
>> >My gut tells me that this is gonna cause problems.  If it is ONLY
>> >the st_blocksize data that is incorrect then it's not such a big
>> >problem, but are we returning other metadata that is wrong?
>>
>> Note that it's st_blocks, not st_blocksize.
> Yes, I just ignored that digression, as well as the digression into
> fts_read being anything special here, since it just ends up calling
> stat(2) in the end anyway.
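
For anyone who wants to see exactly what du(1) consumes, here is a minimal
sketch of an fts(3) walk that prints st_blocks per file.  fts_read(3) just
hands back stat(2) data in fts_statp, so the number printed is nothing more
than what a plain stat(2) call would report.  This is only a sketch (not
du's actual source), and most error handling is omitted:

#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdint.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    char *paths[] = { argc > 1 ? argv[1] : ".", NULL };
    FTS *fts;
    FTSENT *ent;

    if ((fts = fts_open(paths, FTS_PHYSICAL, NULL)) == NULL) {
        perror("fts_open");
        return (1);
    }
    while ((ent = fts_read(fts)) != NULL) {
        /* fts_statp is the stat(2) result for this entry; st_blocks
         * is in 512-byte units, which is what du(1) adds up. */
        if (ent->fts_info == FTS_F)
            printf("%jd\t%s\n",
                (intmax_t)ent->fts_statp->st_blocks, ent->fts_path);
    }
    fts_close(fts);
    return (0);
}
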
>
>>
>> I did an experiment, writing a (roughly) 113MB file (some data I had
>> lying around), close()ing it and then stat()ing it in a loop.  This is
>> FreeBSD 10.3 with ZFS and lz4 compression.  Over the 26ms following the
>> close(), st_blocks gradually rose from 24169 to 51231.  It then stayed
>> stable until 4.968s after the close, when st_blocks again started
>> increasing until it stabilized after a total of 5.031s at 87483.  Based
>> on this, st_blocks reflects the actual number of blocks physically
>> written to disk.  None of the other fields in the struct stat vary.
>                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Thank you for doing the proper regression test; that satisfies me that
> we don't have a latent bug sitting here and in fact what we have is
> exposure of the kernel caching, which I might not be too thrilled
> about, but is just how it's gonna have to be.
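
For reference, a rough sketch of the kind of polling loop described in the
test above: write out a file, close it, and watch st_blocks change as the
data actually reaches disk.  The file name, buffer sizes and poll interval
here are made up, the filler data is trivially compressible, and error
checking is omitted, so treat it as a starting point rather than the exact
program that was run:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
    static char buf[1024 * 1024];   /* 1MB of (compressible) filler */
    struct stat sb;
    struct timespec ts;
    blkcnt_t last = -1;
    int fd, i;

    memset(buf, 'x', sizeof(buf));
    fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    for (i = 0; i < 113; i++)       /* roughly 113MB, as in the test above */
        (void)write(fd, buf, sizeof(buf));
    close(fd);

    for (i = 0; i < 10000; i++) {   /* poll for ~10s at 1ms intervals */
        stat("testfile", &sb);
        if (sb.st_blocks != last) {
            clock_gettime(CLOCK_MONOTONIC, &ts);
            printf("%jd.%03ld st_blocks=%jd\n", (intmax_t)ts.tv_sec,
                ts.tv_nsec / 1000000, (intmax_t)sb.st_blocks);
            last = sb.st_blocks;
        }
        usleep(1000);
    }
    return (0);
}
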
>
>>
>> The 5s delay is presumably the TXG delay (since this system is basically
>> unloaded).  I'm not sure why it writes roughly half the data immediately
>> and the rest as part of the next TXG write.
>>
>> >My expectation when executing a stat(2) call on a file would
>> >be that the data returned is valid and stable.  I think almost
>> >any program would expect that.
>>
>> I think a case could be made that st_blocks is a valid representation
>> of "the number of blocks allocated for this object" - with the number
>> increasing as the data is physically written to disk.  As for it being
>> stable, consider a (hypothetical) filesystem that can transparently
>> migrate data between different storage media, with different compression
>> algorithms etc (ZFS will be able to do this once the mythical block
>> rewrite code is written).
>
> I could counter-argue that st_blocks is:
> st_blocks   The actual number of blocks allocated for the file in
>                  512-byte units.
>
> Nothing in that says anything about "on disk".  So while this thing is
> sitting in memory on the TXG queue we should return the number of
> 512-byte blocks used by the memory holding the data.  I think that
> would be more correct than exposing to userland the fact that this
> thing is sitting in a write-back cache.
>
> --
> Rod Grimes                                                 rgrimes_at_freebsd.org

"Transparent" does not mean "undetectable".  For example, ZFS's
transparent compression will affect the st_blocks reported for a file.
I think the only sane use of st_blocks is to treat it as advisory.
I've seen a lot of bugs caused by programmers assuming a certain
mathematical relationship between the numbers presented by "df", "zfs
list", etc.
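
To make the "advisory" point concrete, here is a trivial sketch that prints
both the logical size and the allocated size for a file.  On a compressed
ZFS dataset the two won't line up in any fixed ratio, and right after a
write the allocated number may not even have settled yet:

#include <sys/stat.h>
#include <stdint.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    struct stat sb;

    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return (1);
    }
    if (stat(argv[1], &sb) == -1) {
        perror("stat");
        return (1);
    }
    /* st_size is the logical length; st_blocks * 512 is the space the
     * filesystem says is allocated right now.  With compression (or a
     * not-yet-committed TXG) the two need not be related in any fixed
     * way. */
    printf("st_size       = %jd bytes\n", (intmax_t)sb.st_size);
    printf("st_blocks*512 = %jd bytes\n", (intmax_t)sb.st_blocks * 512);
    return (0);
}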

BTW, I've confirmed that ZFS on Illumos has the same behavior.  A
file's st_blocks doesn't stabilize until a few seconds after you write
it.

And it turns out that fsync(1) doesn't help: st_blocks still takes a few
seconds to stabilize even after a sync.  This suggests that ZFS doesn't
consider blocks in the ZIL when it reports st_blocks.
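
A rough sketch of the sort of check I mean (hypothetical file name and
sizes, no error handling): write, fsync(2), stat immediately, then stat
again after the TXGs have had a chance to commit.  If the first number is
much smaller than the second, the blocks sitting in the ZIL aren't being
counted:

#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    static char buf[1024 * 1024];
    struct stat sb;
    int fd, i;

    memset(buf, 'x', sizeof(buf));
    fd = open("synced.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    for (i = 0; i < 100; i++)
        (void)write(fd, buf, sizeof(buf));

    fsync(fd);                      /* data is now durable, via the ZIL */
    fstat(fd, &sb);
    printf("just after fsync: st_blocks=%jd\n", (intmax_t)sb.st_blocks);

    sleep(10);                      /* wait out a TXG commit or two */
    fstat(fd, &sb);
    printf("10s later:        st_blocks=%jd\n", (intmax_t)sb.st_blocks);

    close(fd);
    return (0);
}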

-Alan