Re: bsdtar vs gtar performance

From: Kris Kennaway <kris_at_obsecurity.org> Date: Sun, 24 Sep 2006 00:52:13 -0400

[Moving to current_at_ where it's on-topic]

On Sat, Sep 23, 2006 at 10:26:38AM -0700, Tim Kientzle wrote:
> Kris and Ruslan were recently discussing the performance of bsdtar
> relative to gtar, which prompted me to do some measurements
> of my own.   I used /usr/ports as my test, because it stresses
> file and directory creation over extracting large files.
> 
> Here are some initial results, based on ten runs of each test on a
> quiescent system, comparing results with PHK's "ministat":
> 
> * Creating uncompressed archives:  bsdtar and gtar showed
>    no difference in total time.
> 
> * Extracting gzip-compressed archives:  bsdtar and gtar showed
>    no difference in total time.
> 
> * Extracting uncompressed archives:  gtar is about 13% faster
>    than bsdtar in my test.  Interestingly (to me), this was the same
>    with or without -m.  (I've long suspected dir timestamp restores
>    as a contributor; this shows otherwise.)

With 10 repetitions of an extraction of the ports tree to a
swap-backed md (newfs'ed in between tests, mounted async), I get a
much bigger difference in favour of gtar:

x gtar-data
+ bsdtar-data
+------------------------------------------------------------+
|x                                                        +  |
|x                                                        +  |
|xx                                                       +  |
|xx                                                       ++ |
|xx                                                       ++ |
|xx                                                      ++++|
|A|                                                       A| |
+------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10          34.9          35.2        34.985        35.008   0.095893459
+  11         48.95         49.68         49.21     49.249091    0.19216943
Difference at 95.0% confidence
        14.2411 +/- 0.141059
        40.6795% +/- 0.402932%
        (Student's t, pooled s = 0.154247)
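For anyone who wants to check the arithmetic, the "Difference at 95.0% confidence" lines can be reproduced from the summary rows alone. The following is a minimal sketch of the standard pooled two-sample Student's t calculation, not ministat's actual code. The function name and the hard-coded critical value 2.093 (two-sided 95%, 19 degrees of freedom, taken from a standard t table) are assumptions made for illustration.

```python
import math

def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Difference of means with a pooled-variance confidence half-width.

    t_crit is the two-sided Student's t critical value for
    n1 + n2 - 2 degrees of freedom (supplied by the caller).
    """
    df = n1 + n2 - 2
    # Pooled standard deviation: variances weighted by their degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    diff = mean2 - mean1
    half_width = t_crit * pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return diff, half_width, pooled_sd

# Summary rows from the ministat output above (x = gtar, + = bsdtar).
GTAR = (10, 35.008, 0.095893459)
BSDTAR = (11, 49.249091, 0.19216943)
T_CRIT_95_DF19 = 2.093  # assumption: value from a standard t table

diff, half_width, pooled_sd = pooled_difference(*GTAR, *BSDTAR, T_CRIT_95_DF19)
pct = 100.0 * diff / GTAR[1]            # relative to the gtar mean
pct_half = 100.0 * half_width / GTAR[1]

print(f"{diff:.4f} +/- {half_width:.6f}")
print(f"{pct:.4f}% +/- {pct_half:.6f}%")
print(f"pooled s = {pooled_sd:.6f}")
```

The percentages are taken relative to the gtar mean, which is why a 14.24 second gap shows up as roughly 40.7%.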

I suspect you were measuring extraction to real disk hardware, in
which case disk I/O makes up most of the real time for both tools
and hides most of the difference between them.

Kris