> On Wed, Jul 21, 2004 at 05:14:27PM +0200, Daniel Lang wrote:
> > Hi,
> >
> > Jan Grant wrote on Wed, Jul 21, 2004 at 02:44:42PM +0100:
> > [..]
> > > You're correct, in that filesystem semantics don't require an
> > > archiver to recreate holes. There are storage efficiency gains to
> > > be made in identifying holes, that's true - particularly in the
> > > case of absolutely whopping but extremely sparse files. In those
> > > cases, a simple userland-view-of-the-filesystem-semantics approach
> > > to identifying areas that _might_ be holes (just for archive
> > > efficiency) can still be expensive and might involve scanning
> > > multiple gigabytes of "virtual" zeroes.
> > >
> > > Solaris offers an fcntl to identify holes (IIRC) for just this
> > > purpose. If the underlying filesystem can't be made to support it,
> > > there's an efficiency loss but otherwise it's no great shakes.
> >
> > I don't get it.
> >
> > I assume that for any consumer it is totally transparent whether
> > possibly existing chunks of 0-bytes are actually blocks full of
> > zeroes or just non-allocated blocks, correct?
> >
> > Second, it is true that there is a gain in terms of occupied disk
> > space if chunks of zeroes are not allocated at all, correct?
> >
> > So, from my point of view, if a sparse file is archived and then
> > extracted, it is totally irrelevant whether the areas that contain
> > zeroes end up as unallocated blocks in exactly the same manner
> > or not.
> >
> > So, all I guess an archiver must do is:
> >
> > - read the file
> > - scan the file for consecutive blocks of zeroes
> > - archive these blocks in an efficient way
> > - on extraction, create a sparse file with the previously
> >   identified empty blocks, regardless of whether these blocks
> >   were 'sparse' blocks in the original file or not
> >
> > I do not see why it is important whether the original file was
> > sparse at all, or sparse in different places.
>
> Since sparse files overcommit the disk, they should only be created
> deliberately. Otherwise you can easily get into trouble if you try to
> use reserved space later, since it won't actually be reserved.
> Consider the case of a file system image created with
> "dd if=/dev/zero ...; newfs ...". If your archiver decides to be
> "smart" and restore a copy of that file sparse, and you then use up
> the available blocks on your disk, you're going to be in a world of
> hurt. I wouldn't be surprised if that resulted in a panic.

If the file has 'holes' and they are read as zero, then doesn't
compressing the tar file nicely reduce it?

    dd if=/dev/zero of=junk count=100
    tar czf junk.tar.gz junk
    ls -ls junk*
    50 -rw-r--r--  1 danny  wheel  51200 Jul 22 10:28 junk
     2 -rw-r--r--  1 danny  wheel    170 Jul 22 10:33 junk.tar.gz

danny
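The "fcntl to identify holes" Jan recalls corresponds to what Solaris
eventually shipped as the SEEK_HOLE/SEEK_DATA whence values for
lseek(2), later adopted by FreeBSD and Linux. A minimal sketch of
walking a file's data and hole extents with that interface - assuming
a platform that defines SEEK_HOLE and SEEK_DATA; on a filesystem with
no hole support the whole file simply reports as one data extent:

    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Print the data and hole extents of a file. Sketch only:
     * assumes lseek(2) supports SEEK_DATA and SEEK_HOLE.
     */
    int
    main(int argc, char **argv)
    {
        off_t off, end, data, hole;
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s file\n", argv[0]);
            return 1;
        }
        if ((fd = open(argv[1], O_RDONLY)) < 0) {
            perror("open");
            return 1;
        }
        end = lseek(fd, 0, SEEK_END);

        for (off = 0; off < end; off = hole) {
            data = lseek(fd, off, SEEK_DATA);
            if (data < 0)           /* only a trailing hole remains */
                data = end;
            hole = lseek(fd, data, SEEK_HOLE);
            if (hole < 0)
                hole = end;
            if (data > off)
                printf("hole: %jd-%jd\n", (intmax_t)off, (intmax_t)data);
            if (hole > data)
                printf("data: %jd-%jd\n", (intmax_t)data, (intmax_t)hole);
        }
        close(fd);
        return 0;
    }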
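Daniel's four-step recipe needs no filesystem help at all: read the
file, compare each block against zeroes, and seek instead of write on
extraction. A minimal sketch of that idea as a sparse-preserving copy,
with an arbitrary 4 KB block size chosen for illustration:

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define BLK 4096    /* illustrative block size, not from the thread */

    /*
     * Copy src to dst, seeking over all-zero blocks instead of
     * writing them, so the destination comes out sparse no matter
     * how the source was allocated. Pure userland.
     */
    int
    main(int argc, char **argv)
    {
        static const char zeroes[BLK];  /* implicitly all zero */
        char buf[BLK];
        ssize_t n;
        int in, out;

        if (argc != 3) {
            fprintf(stderr, "usage: %s src dst\n", argv[0]);
            return 1;
        }
        in = open(argv[1], O_RDONLY);
        out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0) {
            perror("open");
            return 1;
        }
        while ((n = read(in, buf, BLK)) > 0) {
            if (n == BLK && memcmp(buf, zeroes, BLK) == 0)
                lseek(out, BLK, SEEK_CUR);  /* leave a hole */
            else if (write(out, buf, n) != n) {
                perror("write");
                return 1;
            }
        }
        /* pin the final size so a trailing hole is preserved */
        ftruncate(out, lseek(out, 0, SEEK_CUR));
        close(in);
        close(out);
        return 0;
    }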
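The overcommit warning is concrete: blocks backing a hole are only
allocated when first written, so a write into a restored-sparse image
can fail with ENOSPC long after the file was created, even though its
size suggested the space was already there. A small sketch of the
failure mode ("image" and the 10 MB offset are hypothetical):

    #include <errno.h>
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Write into a region of a (hypothetical) sparse image file.
     * If the region is a hole and the disk is full, the write
     * fails with ENOSPC at write time, not at create time.
     */
    int
    main(void)
    {
        char block[512] = { 1 };
        int fd = open("image", O_WRONLY);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (pwrite(fd, block, sizeof block, 10 * 1024 * 1024) < 0) {
            if (errno == ENOSPC)
                fprintf(stderr, "no free blocks to back the hole\n");
            else
                perror("pwrite");
            return 1;
        }
        close(fd);
        return 0;
    }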