On Fri, Mar 3, 2017 at 7:11 AM, Rodney W. Grimes
<freebsd-rwg_at_pdx.rh.cn85.dnsmgr.net> wrote:
>> On 2017-Mar-02 22:19:10 -0800, "Rodney W. Grimes" <freebsd-rwg_at_pdx.rh.CN85.dnsmgr.net> wrote:
>> >> du(1) is using fts_read(3), which is based on the stat(2) information.
>> >> The OpenGroup defines st_blocksize as "Number of blocks allocated for
>> >> this object." In the case of ZFS, a write(2) may return before any
>> >> blocks are actually allocated. And thanks to compression, gang
>> ...
>> >My gut tells me that this is going to cause problems. If it is ONLY
>> >the st_blocksize data that is incorrect, then it is not such a big
>> >problem; or are we returning other metadata that is wrong?
>>
>> Note that it's st_blocks, not st_blocksize.
> Yes, I just ignored that digression, as well as the digression into fts_read
> being anything special here, as it just ends up calling stat(2) in
> the end anyway.
>
>>
>> I did an experiment, writing a (roughly) 113MB file (some data I had
>> lying around), close()ing it and then stat()ing it in a loop. This is
>> FreeBSD 10.3 with ZFS and lz4 compression. Over the 26ms following the
>> close(), st_blocks gradually rose from 24169 to 51231. It then stayed
>> stable until 4.968s after the close, when st_blocks again started
>> increasing until it stabilized after a total of 5.031s at 87483. Based
>> on this, st_blocks reflects the actual number of blocks physically
>> written to disk. None of the other fields in the struct stat vary.
>                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Thank you for doing the proper regression test. That satisfies me that
> we don't have a latent bug sitting here, and in fact what we have is
> exposure of the kernel caching, which I might not be too thrilled about,
> but is just how it's going to have to be.
>
>>
>> The 5s delay is presumably the TXG delay (since this system is basically
>> unloaded). I'm not sure why it writes roughly ?
>> the data immediately
>> and the rest as part of the next TXG write.
>>
>> >My expectations of executing a stat(2) call on a file would
>> >be that the data returned is valid and stable. I think almost
>> >any program would expect that.
>>
>> I think a case could be made that st_blocks is a valid representation
>> of "the number of blocks allocated for this object" - with the number
>> increasing as the data is physically written to disk. As for it being
>> stable, consider a (hypothetical) filesystem that can transparently
>> migrate data between different storage media, with different compression
>> algorithms etc (ZFS will be able to do this once the mythical block
>> rewrite code is written).
>
> I could counter argue that st_blocks is:
>     st_blocks  The actual number of blocks allocated for the file in
>                512-byte units.
>
> Nothing in that says anything about "on disk". So while this thing
> is sitting in memory on the TXG queue we should return the number of
> 512 byte blocks used by the memory holding the data.
> I think that would be more correct than exposing to userland the
> fact that this thing is sitting in a write back cache.
>
> --
> Rod Grimes                                 rgrimes_at_freebsd.org

"Transparent" does not mean "undetectable". For example, ZFS's
transparent compression will affect the st_blocks reported for a file.
I think the only sane use of st_blocks is to treat it as advisory.
I've seen a lot of bugs caused by programmers assuming a certain
mathematical relationship between the numbers presented by "df",
"zfs list", etc.

BTW, I've confirmed that ZFS on Illumos has the same behavior. A
file's st_blocks doesn't stabilize until a few seconds after you write
it. And it turns out that fsync(1) doesn't help. This suggests that
ZFS doesn't consider blocks in the ZIL when it reports st_blocks.

-Alan

Received on Fri Mar 03 2017 - 15:39:17 UTC
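
For readers who want to repeat the experiment described in the quoted
text (write a file, close it, then stat it in a loop), the following is
a minimal sketch, not the script that was actually used. The function
names, the 8 MiB default size, the 6 second window and the 50 ms
polling interval are all assumptions chosen for illustration. It writes
random data, which does not compress, so the numbers will not match
the lz4 figures quoted above.

```python
import os
import tempfile
import time


def poll_st_blocks(path, duration=6.0, interval=0.05):
    """Poll stat(2) on path and record each change in st_blocks.

    Returns a list of (elapsed_seconds, st_blocks) pairs. The first
    sample is always recorded; later ones only when the value changes.
    """
    samples = []
    last = None
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        blocks = os.stat(path).st_blocks  # 512-byte units per stat(2)
        if blocks != last:
            samples.append((elapsed, blocks))
            last = blocks
        if elapsed >= duration:
            break
        time.sleep(interval)
    return samples


def write_and_poll(size=8 * 1024 * 1024, duration=6.0, interval=0.05,
                   directory=None):
    """Write size bytes to a temp file, close it, then poll st_blocks.

    directory is a hypothetical knob for pointing the test at a
    particular filesystem (for example a ZFS dataset); None means the
    system default temp directory.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size))  # random, hence incompressible
        # The file is closed here; polling starts right after close().
        return poll_st_blocks(path, duration, interval)
    finally:
        os.unlink(path)
```

Printing the pairs returned by write_and_poll() for a directory on the
filesystem under test shows whether st_blocks keeps changing after the
close. On a filesystem that allocates blocks at write time the list
would presumably hold a single entry.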