Re: NEW TAR

From: Jan Grant <Jan.Grant_at_bristol.ac.uk>
Date: Wed, 21 Jul 2004 14:44:42 +0100 (BST)
On Tue, 20 Jul 2004, Stefan Bethke wrote:

> Am 20.07.2004 um 10:10 schrieb Peter Jeremy:
>
>> Actually, it's not possible to accurately determine the holes in a
>> file by reading it - you can't differentiate between a hole and an
>> allocated block of zeroes.  What you need is a (new) syscall that
>> invokes a new VOP_... and returns a bitmap of allocated blocks.  This
>> would be non-trivial unfortunately.
>
> This is one point that has been made a number of times in the past, and one I 
> don't understand:
>
> There are no sparse files as far as the userland is concerned; it's an 
> optimization that remains invisible, apart from space and/or performance 
> savings.
>
> For the extraction process, it should be sufficient to seek over any extended 
> range of zeros. When packaging files that might have holes in them, it'll 
> certainly be nice if there was a way to skip reading all those zeros in, but 
> that's just an optimization.
>
> The way you describe it (and others have before), it sounds like the holes 
> were an attribute of the file that should be preserved by tar (or any other 
> archiver); I believe it's not.  Preserving them in the way your post can be 
> read is problematic: what if the block/allocation/cluster/fragment size of 
> the extraction target differs from the source?  How far would you need to 
> ascertain compatible allocation semantics between both filesystems?
>
> Since this has come up multiple times in the past, I feel I'm missing some 
> important detail, and I'd appreciate it if someone would enlighten me.

You're correct, in that filesystem semantics don't require an archiver 
to recreate holes. There are storage efficiency gains to be made in 
identifying holes, that's true - particularly in the case of absolutely 
whopping but extremely sparse files. In those cases, a simple 
userland-view-of-the-filesystem-semantics approach to identifying areas 
that _might_ be holes (just for archive efficiency) can still be 
expensive and might involve the scanning of multiple gigabytes of 
"virtual" zeroes.
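
As a rough illustration of that userland-only approach, here is a minimal 
sketch in Python. The function name find_zero_runs and the 4096-byte block 
size are assumptions made for the example, not anything tar actually uses. 
It reads every byte, which is exactly the cost described above, and it 
can only report ranges that _might_ be holes: an allocated block of 
zeroes looks the same.

```python
def find_zero_runs(stream, block_size=4096):
    """Return (offset, length) pairs for runs of all-zero blocks.

    Hypothetical helper: every block is read and inspected, so a
    multi-gigabyte sparse file still costs a full scan.
    """
    runs = []
    offset = 0
    run_start = None  # start offset of the zero run in progress, if any
    while True:
        block = stream.read(block_size)
        if not block:
            break
        if block.count(0) == len(block):
            # All-zero block: open a run if one is not already open.
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            # Non-zero data ends the current run.
            runs.append((run_start, offset - run_start))
            run_start = None
        offset += len(block)
    if run_start is not None:
        # The stream ended inside a zero run.
        runs.append((run_start, offset - run_start))
    return runs
```

An archiver could record only the ranges outside these runs, and an 
extractor could seek over the runs instead of writing them.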

Solaris offers an fcntl to identify holes (IIRC) for just this purpose. 
If the underlying filesystem can't be made to support it, there's an 
efficiency loss but otherwise it's no great shakes.
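
For illustration only, here is a sketch of how an archiver might use such 
a facility where one exists. It assumes the SEEK_DATA and SEEK_HOLE 
whence values for lseek, which Python exposes as os.SEEK_DATA and 
os.SEEK_HOLE on platforms that provide them; that is not necessarily the 
interface recalled above. The function name data_extents is hypothetical. 
When the platform or filesystem offers no support, the sketch falls back 
to treating the whole file as data, which is the efficiency loss just 
mentioned.

```python
import errno
import os

def data_extents(fd):
    """Return (offset, length) pairs for the regions of fd that hold data.

    Hypothetical helper. Note that lseek moves the file offset of fd.
    """
    size = os.fstat(fd).st_size
    whole_file = [(0, size)] if size else []
    if not (hasattr(os, "SEEK_DATA") and hasattr(os, "SEEK_HOLE")):
        return whole_file  # no platform support: everything is data
    extents = []
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                break  # only a hole remains between pos and end of file
            return whole_file  # not supported here: everything is data
        end = os.lseek(fd, start, os.SEEK_HOLE)
        if end <= start:
            break  # defensive: never loop without making progress
        extents.append((start, end - start))
        pos = end
    return extents
```

The archiver would then read and store only the returned extents; what 
counts as a hole is whatever the filesystem chooses to report.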

-- 
jan grant, ILRT, University of Bristol. http://www.ilrt.bris.ac.uk/
Tel +44(0)117 9287088 Fax +44 (0)117 9287112 http://ioctl.org/jan/
Prolog in JavaScript: http://ioctl.org/logic/prolog-latest
Received on Wed Jul 21 2004 - 11:45:53 UTC