Re: effect of strip(1) on du(1)

From: Alan Somers <asomers_at_freebsd.org>
Date: Fri, 3 Mar 2017 10:30:04 -0700
On Fri, Mar 3, 2017 at 10:25 AM, Allan Jude <allanjude_at_freebsd.org> wrote:
> On March 3, 2017 9:11:30 AM EST, "Rodney W. Grimes" <freebsd-rwg_at_pdx.rh.CN85.dnsmgr.net> wrote:
>>> On 2017-Mar-02 22:19:10 -0800, "Rodney W. Grimes"
>><freebsd-rwg_at_pdx.rh.CN85.dnsmgr.net> wrote:
>>> >> du(1) is using fts_read(3), which is based on the stat(2)
>>> >> information.  The OpenGroup defines st_blocks as "Number of
>>> >> blocks allocated for this object."  In the case of ZFS, a
>>> >> write(2) may return before any blocks are actually allocated.
>>> >> And thanks to compression, gang
>>> ...
>>> >My gut tells me that this is going to cause problems.  If it is
>>> >ONLY the st_blocksize data that is incorrect, then it's not such
>>> >a big problem, but are we returning other metadata that is wrong?
>>>
>>> Note that it's st_blocks, not st_blocksize.
>>Yes, I just ignored that digression, as well as the digression into
>>fts_read being anything special here, since it just ends up calling
>>stat(2) in the end anyway.
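
[For illustration, a minimal sketch of the fts_read(3)/stat(2) walk
that du(1) performs, summing st_blocks.  The skip/accumulate logic is
simplified and the KiB conversion is only for display:

#include <sys/types.h>
#include <sys/stat.h>
#include <err.h>
#include <fts.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    char *paths[2] = { argv[1], NULL };
    FTS *fts;
    FTSENT *ent;
    long long blocks = 0;

    if (argc != 2)
        errx(1, "usage: %s path", argv[0]);
    if ((fts = fts_open(paths, FTS_PHYSICAL, NULL)) == NULL)
        err(1, "fts_open");
    while ((ent = fts_read(fts)) != NULL) {
        /* Count regular files and post-order directories; each
         * fts_statp is just the result of a stat(2) on that entry. */
        if (ent->fts_info == FTS_F || ent->fts_info == FTS_DP)
            blocks += ent->fts_statp->st_blocks;
    }
    fts_close(fts);
    printf("%lld KiB allocated\n", blocks / 2); /* 512B units -> KiB */
    return (0);
}
]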
>>
>>>
>>> I did an experiment, writing a (roughly) 113MB file (some data I
>>> had lying around), close()ing it and then stat()ing it in a loop.
>>> This is FreeBSD 10.3 with ZFS and lz4 compression.  Over the 26ms
>>> following the close(), st_blocks gradually rose from 24169 to
>>> 51231.  It then stayed stable until 4.968s after the close, when
>>> st_blocks again started increasing until it stabilized after a
>>> total of 5.031s at 87483.  Based on this, st_blocks reflects the
>>> actual number of blocks physically written to disk.
>>> None of the other fields in the struct stat vary.
>>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>Thank you for doing the proper regression test; that satisfies me
>>that we don't have a latent bug sitting here, and in fact what we
>>have is exposure of the kernel caching.  Though I might not be too
>>thrilled about that, it is just how it's going to have to be.
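
[For reference, the experiment described above can be reproduced with
a small stat(2) polling loop along these lines; the path and sampling
interval are illustrative:

#include <sys/stat.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    const char *path = "testfile";  /* hypothetical freshly written file */
    struct stat sb;
    blkcnt_t prev = -1;
    int i;

    for (i = 0; i < 10000; i++) {   /* roughly 10s at ~1ms per sample */
        if (stat(path, &sb) == -1)
            err(1, "stat");
        if (sb.st_blocks != prev) { /* report each change */
            printf("sample %d: st_blocks = %jd\n", i,
                (intmax_t)sb.st_blocks);
            prev = sb.st_blocks;
        }
        usleep(1000);
    }
    return (0);
}
]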
>>
>>>
>>> The 5s delay is presumably the TXG delay (since this system is
>>> basically unloaded).  I'm not sure why it writes roughly half the
>>> data immediately and the rest as part of the next TXG write.
>>>
>>> >My expectation when executing a stat(2) call on a file would
>>> >be that the data returned is valid and stable.  I think almost
>>> >any program would expect that.
>>>
>>> I think a case could be made that st_blocks is a valid
>>> representation of "the number of blocks allocated for this
>>> object", with the number increasing as the data is physically
>>> written to disk.  As for it being stable, consider a
>>> (hypothetical) filesystem that can transparently migrate data
>>> between different storage media, with different compression
>>> algorithms etc. (ZFS will be able to do this once the mythical
>>> block rewrite code is written).
>>
>>I could counter-argue that st_blocks is:
>>st_blocks   The actual number of blocks allocated for the file in
>>                 512-byte units.
>>
>>Nothing in that says anything about "on disk".  So while this thing
>>is sitting in memory on the TXG queue, we should return the number
>>of 512-byte blocks used by the memory holding the data.  I think
>>that would be more correct than exposing to userland the fact that
>>this thing is sitting in a write-back cache.
>
> Can we compare the results of du with du -A?
>
> du will show compression savings, and -A won't.
>
> ZFS compresses between the write cache and the disk, so the final
> size may not be known for 5+ seconds.
> --
> Allan Jude

"du -A" does what you would expect.  It instantly reports the apparent
size of the file.  For incompressible files, this is actually less
than what "du" reports, because it doesn't take into account the znode
and indirect blocks.
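
[To make the distinction concrete, a quick sketch that prints both
numbers for a single file: apparent size (st_size, what "du -A" uses)
and allocated size (st_blocks * 512, what plain "du" uses):

#include <sys/stat.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    struct stat sb;

    if (argc != 2)
        errx(1, "usage: %s file", argv[0]);
    if (stat(argv[1], &sb) == -1)
        err(1, "stat");
    printf("apparent:  %jd bytes\n", (intmax_t)sb.st_size);
    printf("allocated: %jd bytes\n", (intmax_t)sb.st_blocks * 512);
    return (0);
}

On a compressed ZFS dataset, "allocated" can be smaller than
"apparent"; on an incompressible file it can be larger, because of the
znode and indirect-block overhead mentioned above, and it can keep
changing for several seconds after close(2) while TXGs are synced.]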

-Alan