Re: cvs commit: src/usr.bin/tar Makefile bsdtar.1 bsdtar.c bsdtar.h bsdtar_platform.h matching.c read.c util.c write.c

From: Tim Kientzle <tim_at_kientzle.com>
Date: Tue, 06 Apr 2004 11:14:48 -0700
Ruslan Ermilov wrote:
> On Mon, Apr 05, 2004 at 02:32:18PM -0700, Tim Kientzle wrote:
> 
>>kientzle    2004/04/05 14:32:18 PDT
>>
>>  FreeBSD src repository
>>
>>  Added files:
>>    usr.bin/tar          Makefile bsdtar.1 bsdtar.c bsdtar.h 
>>                         bsdtar_platform.h matching.c read.c 
>>                         util.c write.c 
>>  Log:
>>  Initial commit for bsdtar.
>>  
> 
> Awesome!  Are there some benchmarking results available?

I haven't focused very closely on performance yet, to be honest, though
the internal architecture is pretty clean (minimal data copying;
reuse of internal buffers to avoid heap thrashing).

I did some quick tests early on and the performance (on extraction)
was roughly comparable to gnutar (within about 5-10%).  That will
improve some as I continue to work on it.  However, in general,
I expect it to be a little bit slower because the compression
isn't handled in a separate process (thus there's less overlapping
of I/O and computation).

But, there are a lot of nice new features:

  * Fully automatic format/compression detection.
    In particular, the following commands all work:

       bsdtar -xf file.tgz
       bsdtar -xf file.tbz
       bsdtar -xf file.cpio

    or even

       fetch -o - http://...../file.tgz | bsdtar -xf -

    GNU tar can't do any of these; 'star' fails the last
    one.  To be fair, "Heirloom tar" does support all of these.

  * Ability to interpolate an archive.  The following
    combines the contents of "foo1.tgz" and "foo2.cpio"
    into a single archive called "out.tbz":

      bsdtar -cjf out.tbz @foo1.tgz @foo2.cpio

    Yes, you can mix interpolations and regular files on
    the command line.  You can even interpolate from stdin:

      bsdtar -cjf - -F pax @-

    converts an archive read on stdin into a pax-format,
    bzip2-compressed archive on stdout.  Once I get mtree
    read support, you'll be able to convert an mtree file
    into a shell script, for example:

      bsdtar -cf tree.sh -F shar @tree.mtree

  * Compliance with SUSv2.  SUSv2 (POSIX.1-1997 ?) was
    the last official spec for tar.  GNU tar does not
    comply with the file format specified there, nor does
    it correctly implement the command-line options specified
    there.  By default, bsdtar will create standard ustar
    archives unless it finds a file attribute that is not
    supported by ustar (such as a very long filename or ACL),
    in which case it will use SUSv3 (POSIX.1-2001) extensions
    to carry the additional data.  There are command-line options
    to force straight ustar format or permit SUSv3 ("pax")
    extensions even when not absolutely required.  (The default
    format won't use SUSv3 extensions just to store atime/ctime
    or sub-second timestamps; specifying "pax" format will.)

  * Support for SUSv3 extensions.  The "pax" format extensions
    eliminate essentially all of the historic limitations of
    tar in a way that is easily extensible and compatible with
    standard-compliant "pax" implementations on other platforms
    (as well as some modern tar implementations, notably Joerg
    Schilling's "star").

  * More complete archiving.  With the "pax" format, bsdtar will
    archive ino/dev/nlink, sub-second resolution mtime/ctime/atime,
    ACLs, file flags, etc, etc.  Not all of this can currently be
    restored (ino/dev/nlink/ctime are currently ignored on extract),
    but it's all stored in the archive.

  * Broad format support.  bsdtar reads the usual bevy of tar formats,
    and some cpio archives (only the odc variant at the moment).
    It writes standard tar formats, cpio, and shar.  The
    underlying libarchive library is extensible and I have plans
    for reading mtree files, reading/writing more cpio
    formats, reading ZIP archives, etc.

  * Cleanly factored.  The archive format support is all in a separate
    library.  It should be fairly routine to build "cpio" or "pax"
    command-line interfaces to the same library or use the library for
    "pkg_install" or "pkg_create."  For comparison, right now "bsdtar"
    is ~2,000 lines of C, while "libarchive" is closer to 10,000.
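
To make the automatic format/compression detection from the first
bullet concrete, here is a minimal Python sketch of one way such
detection can work: check the leading bytes for a gzip or bzip2
signature, decompress just enough to obtain one 512-byte block, then
look for the ustar magic at offset 257 or the odc cpio magic at
offset 0.  This is an illustration only, not libarchive's actual C
code; the names identify and BLOCK are invented for the example.
Because it only reads forward, the same approach works on a pipe.

```python
import bz2
import zlib

BLOCK = 512  # size of one tar header block (hypothetical constant name)


def identify(stream):
    """Return (compression, format) for a readable binary stream."""
    chunk = stream.read(4096)

    # Step 1: recognise the compression layer from its signature bytes.
    if chunk.startswith(b"\x1f\x8b"):
        compression = "gzip"
        dec = zlib.decompressobj(16 + zlib.MAX_WBITS)  # gzip framing
    elif chunk.startswith(b"BZh"):
        compression = "bzip2"
        dec = bz2.BZ2Decompressor()
    else:
        compression = "none"
        dec = None

    # Step 2: collect the first BLOCK bytes of uncompressed data.
    block = b""
    while chunk and len(block) < BLOCK:
        if dec is None:
            block += chunk
        else:
            block += dec.decompress(chunk, BLOCK - len(block))
            if dec.eof:
                break
        chunk = stream.read(4096)

    # Step 3: recognise the archive format from that first block.
    if block[257:262] == b"ustar":
        fmt = "tar"
    elif block.startswith(b"070707"):
        fmt = "cpio-odc"
    else:
        fmt = "unknown"  # e.g. pre-POSIX tar, which has no magic string
    return compression, fmt
```

Calling identify(sys.stdin.buffer) on the receiving end of a pipe
would classify the incoming archive without seeking.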

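For readers unfamiliar with the SUSv3 "pax" extensions mentioned in
the list above: the extra attributes travel as text records of the
form "<length> <keyword>=<value>\n", where the decimal length counts
the whole record, including its own digits and the trailing newline.
The Python sketch below shows that encoding and a matching parser.
It is an illustration of the record format, not code from bsdtar or
libarchive, and the function names are invented for the example.

```python
def pax_record(keyword, value):
    """Encode one pax extended-header record as bytes."""
    body = f" {keyword}={value}\n".encode("utf-8")
    # The length field counts itself, so iterate until it is stable.
    length = len(body)
    while True:
        total = len(body) + len(str(length))
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body


def parse_pax_records(data):
    """Decode a sequence of pax records into a dict."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        # Everything between the space and the trailing newline.
        record = data[space + 1 : pos + length - 1]
        keyword, _, value = record.partition(b"=")
        records[keyword.decode("utf-8")] = value.decode("utf-8")
        pos += length
    return records
```

A long filename or a sub-second timestamp, neither of which fits in a
plain ustar header, becomes a record such as pax_record("path", ...)
or pax_record("mtime", "1081275288.5").
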
There is some performance work to be done; I need to build
a uid/gid/uname/gname cache, for example.  Part of my recent rewrite
of the ACL support was to get to the point that there was one
place where all such lookups were handled, regardless of whether
it's a file owner or an ACL that needs the information.
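
As a rough illustration of the kind of cache described here, the
Python sketch below funnels every id-to-name lookup through one
object and remembers both hits and misses, so each uid or gid hits
the system database at most once.  It assumes a Unix system (it uses
Python's pwd and grp modules); the class NameCache and its methods
are hypothetical and are not part of bsdtar or libarchive.

```python
import grp
import pwd


class NameCache:
    """Single place for uid->uname and gid->gname lookups, with caching."""

    def __init__(self, lookup_user=None, lookup_group=None):
        # The lookup functions are injectable so they can be replaced.
        self._lookup_user = lookup_user or (
            lambda uid: pwd.getpwuid(uid).pw_name)
        self._lookup_group = lookup_group or (
            lambda gid: grp.getgrgid(gid).gr_name)
        self._users = {}
        self._groups = {}

    @staticmethod
    def _cached(table, key, lookup):
        if key not in table:
            try:
                table[key] = lookup(key)
            except KeyError:
                table[key] = None  # remember unknown ids as well
        return table[key]

    def uname(self, uid):
        return self._cached(self._users, uid, self._lookup_user)

    def gname(self, gid):
        return self._cached(self._groups, gid, self._lookup_group)
```

The same cache instance can serve both file owners and ACL entries,
which is the point of having a single lookup path.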

There are still a few bugs to iron out and a couple of features that
are a bit incomplete, but it's getting better quickly.  My hope
is that a few adventurous souls will start using it and giving
me feedback so that I can grow it into the system tar
that FreeBSD deserves.

Tim
Received on Tue Apr 06 2004 - 09:15:22 UTC
