On Wed, 21 Jul 2004, Daniel Lang wrote:

DL>Hi,
DL>
DL>Jan Grant wrote on Wed, Jul 21, 2004 at 02:44:42PM +0100:
DL>[..]
DL>> You're correct, in that filesystem semantics don't require an archiver
DL>> to recreate holes. There are storage efficiency gains to be made in
DL>> identifying holes, that's true - particularly in the case of absolutely
DL>> whopping but extremely sparse files. In those cases, a simple
DL>> userland-view-of-the-filesystem-semantics approach to identifying areas
DL>> that _might_ be holes (just for archive efficiency) can still be
DL>> expensive and might involve the scanning of multiple gigabytes of
DL>> "virtual" zeroes.
DL>>
DL>> Solaris offers an fcntl to identify holes (IIRC) for just this purpose.
DL>> If the underlying filesystem can't be made to support it, there's an
DL>> efficiency loss but otherwise it's no great shakes.
DL>
DL>I don't get it.
DL>
DL>I assume that for any consumer it is completely transparent whether
DL>possibly existing chunks of 0-bytes are actually blocks full of
DL>zeroes or just non-allocated blocks, correct?
DL>
DL>Second, it is true that there is a gain in terms of occupied disk
DL>space if chunks of zeroes are not allocated at all, correct?
DL>
DL>So, from my point of view, when a sparse file is archived and then
DL>extracted, it is completely irrelevant whether the areas that contain
DL>zeroes end up as unallocated blocks in exactly the same way or not.
DL>
DL>So, I guess all an archiver must do is:
DL>
DL> - read the file
DL> - scan the file for consecutive blocks of zeroes
DL> - archive these blocks in an efficient way
DL> - on extraction, create a sparse file with the previously
DL>   identified empty blocks, regardless of whether these blocks
DL>   were 'sparse' blocks in the original file or not.
DL>
DL>I do not see why it is important whether the original file was sparse
DL>at all, or maybe sparse in different places.

It may simply be a good deal faster to take existing hole information
(if it exists) than to scan the file.

Also, there is a difference between holes and actual zeroes: it's like
overcommitting memory. You may have a 1TB file consisting of a large
hole on a 10GB disk. As soon as you write something to it you will get
an error at some point, even when writing into the middle of the file,
just because the FS needs to allocate blocks. I could imagine an
application that knows its access pattern on a large sparse file
allocating zeroed blocks in advance, while skipping the blocks it knows
it will never write, just to make sure the blocks are there when it
writes to them later on. But that's a rather hypothetical application.

harti
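
As an illustration of the scan-for-zeroes scheme in the quoted list
above, here is a minimal stand-alone sketch in C. It assumes nothing
beyond plain POSIX read/write/lseek/ftruncate; the 64 kB block size,
the file-copy framing and the names are illustrative choices only, not
anything prescribed in this thread or by any real archiver.

/*
 * Hypothetical example, not from the thread: copy a file and turn runs
 * of zero bytes in the source into holes in the destination by seeking
 * instead of writing.  This is the userland-only approach: every byte
 * of the source is read, whether it is backed by a block or by a hole.
 */
#include <sys/types.h>

#include <err.h>
#include <fcntl.h>
#include <unistd.h>

#define	BLKSZ	65536		/* arbitrary scan granularity */

static int
all_zero(const char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return (0);
	return (1);
}

int
main(int argc, char **argv)
{
	static char buf[BLKSZ];
	ssize_t n;
	off_t size = 0;
	int in, out;

	if (argc != 3)
		errx(1, "usage: %s infile outfile", argv[0]);
	if ((in = open(argv[1], O_RDONLY)) == -1)
		err(1, "%s", argv[1]);
	if ((out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1)
		err(1, "%s", argv[2]);

	while ((n = read(in, buf, sizeof(buf))) > 0) {
		if (all_zero(buf, (size_t)n)) {
			/* Seek past the zero run; no blocks are allocated. */
			if (lseek(out, (off_t)n, SEEK_CUR) == -1)
				err(1, "lseek");
		} else {
			if (write(out, buf, (size_t)n) != n)
				err(1, "write");
		}
		size += n;
	}
	if (n == -1)
		err(1, "read");

	/*
	 * A trailing zero run only moved the file pointer; ftruncate()
	 * extends the file to its full length without allocating blocks.
	 */
	if (ftruncate(out, size) == -1)
		err(1, "ftruncate");

	close(in);
	close(out);
	return (0);
}

Note that this still has to read every "virtual" zero byte, which is
exactly the cost Jan points out; an interface that reports holes
directly, such as the SEEK_HOLE/SEEK_DATA whence values for lseek()
found on later systems, lets an archiver skip that scan where the
filesystem supports it.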