Re: HEADS UP: bzip2(1) compression for manpages, Groff and Texinfo docs

From: Brad Knowles <brad.knowles_at_skynet.be>
Date: Fri, 2 May 2003 22:33:47 +0200
At 7:43 PM +0200 2003/05/02, Matthias Buelow wrote:

>  The two programs, however, only do the same thing if you consider
>  that they're both compressors.  bzip2 eats much more resources than
>  gzip, both space and time.  And the algorithm is rather overkill for
>  small files anyways.

	Granted, the space savings are not that large.  I took 
/usr/share/man/man1 from a 4.6.2-RELEASE box, made three copies of 
it under /tmp/man, uncompressed all the files, and then recompressed 
each copy with `compress`, `gzip -9`, and `bzip2`, respectively.  Here 
are the results:

		% du * | sort -nr
		4646    compress
		3624    gzip
		3422    bzip2
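	For anyone who wants to repeat the per-file test, here is a rough Python sketch.  It assumes a directory of already-uncompressed pages, like the copies under /tmp/man, and totals the size of each file compressed on its own.  The function names are hypothetical.  Python's standard library has no module for compress(1)'s .Z format, so only gzip and bzip2 are measured.  Byte totals will not match du(1) exactly; the optional block size is an assumption used to approximate du's block-based accounting.

```python
import bz2
import gzip
import os


def round_up(n, block):
    """Round a byte count up to a whole number of blocks."""
    return -(-n // block) * block


def per_file_totals(root, block=1):
    """Compress every regular file under root on its own and total the sizes.

    Assumes the files under root are already uncompressed.  block=1
    gives exact byte counts; a larger value (e.g. 512 or 1024, an
    assumption about the filesystem) roughly mimics du(1), which counts
    allocated blocks rather than bytes.
    """
    totals = {"raw": 0, "gzip -9": 0, "bzip2": 0}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            with open(path, "rb") as f:
                data = f.read()
            totals["raw"] += round_up(len(data), block)
            # Level 9 matches `gzip -9`; bzip2's default is already -9.
            totals["gzip -9"] += round_up(
                len(gzip.compress(data, compresslevel=9)), block)
            totals["bzip2"] += round_up(
                len(bz2.compress(data, compresslevel=9)), block)
    return totals
```

	For example, per_file_totals("/tmp/man/plain", block=1024) returns a dict with "raw", "gzip -9", and "bzip2" totals.  Here "/tmp/man/plain" is a hypothetical uncompressed copy of man1.  Rounding each file up to a block also shows why small files gain little from a stronger compressor.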

	So, bzip2 is not much of an improvement over gzip (the gzip 
output is ~6% larger), but it is a fair improvement over compress 
(the compress output is ~35.8% larger).  This is just one section of 
the man pages, and it does not include the cat pages, but I figure 
it's probably fairly representative.
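	These percentages depend on which size is taken as the baseline.  Measured against the bzip2 output, gzip's is about 5.9% larger and compress's about 35.8% larger.  Measured the other way, bzip2 saves about 5.6% relative to gzip and about 26.3% relative to compress.  A few lines of Python using the du totals above show both framings (the helper names are hypothetical):

```python
# du(1) totals reported above for the three copies of man1.
MAN1 = {"compress": 4646, "gzip": 3624, "bzip2": 3422}


def pct_larger(size, baseline):
    """How much bigger size is than baseline, as a percentage."""
    return (size / baseline - 1) * 100


def pct_smaller(size, baseline):
    """How much smaller size is than baseline, as a percentage."""
    return (1 - size / baseline) * 100


for other in ("gzip", "compress"):
    larger = pct_larger(MAN1[other], MAN1["bzip2"])
    smaller = pct_smaller(MAN1["bzip2"], MAN1[other])
    print(f"{other} output is {larger:.1f}% larger than bzip2's; "
          f"bzip2 output is {smaller:.1f}% smaller than {other}'s")
```

	This prints 5.9% and 5.6% for gzip, and 35.8% and 26.3% for compress.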

	I haven't looked at the stuff under /usr/share/info or 
/usr/share/doc.  I'm not sure which of those files would be 
compressed and which ones wouldn't.  These three directories comprise 
~82MB of disk space, of which about 15MB is in /usr/share/man and 
about 64.6MB in /usr/share/doc.  At the moment, it doesn't appear 
that the files in /usr/share/doc are compressed at all, so there 
might be significant storage savings there.
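	One way to find out which of those files are already compressed is to check each file's leading magic bytes instead of trusting its name.  The Python sketch below uses hypothetical function names and leaves the choice of directory to the caller.  It totals apparent file sizes under a directory by detected format, using the standard signatures for gzip (1f 8b), bzip2 ("BZh"), and compress (1f 9d).

```python
import os

# Leading magic bytes of the three formats discussed in this thread.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\x1f\x9d": "compress",
}


def classify(path):
    """Return the compression format of path by its magic bytes, or None."""
    with open(path, "rb") as f:
        head = f.read(3)
    for magic, fmt in MAGIC.items():
        if head.startswith(magic):
            return fmt
    return None


def survey(root):
    """Total apparent bytes under root, grouped by format ("plain" if none)."""
    totals = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            fmt = classify(path) or "plain"
            totals[fmt] = totals.get(fmt, 0) + os.path.getsize(path)
    return totals
```

	For example, survey("/usr/share/doc") would report how many bytes are not in any of these three formats.  Note that "plain" also covers files in other formats, such as images or PDFs, which may not shrink much further.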


	I built a tarball from the /usr/share/doc hierarchy, and tried 
the three different compression programs on it.  I know that 
compression on a tarball is going to be different from compression on 
individual files, but this should at least give us some idea.

	Anyway, here are the results:

		% ls -1s doc* | sort -nr
		 64368 doc.tar
		 22896 doc-compress.tar.Z
		 16080 doc-gzip.tar.gz
		 12032 doc-bzip2.tar.bz2
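	Part of the tarball result comes from compressing across file boundaries.  Similar files inside one archive can share redundancy that per-file compression cannot exploit.  The Python sketch below measures this: it compresses an in-memory tar of a directory as a single stream and compares the result with the sum of per-file results.  The helper names are hypothetical, and compress(1) is again left out because the standard library has no .Z module.

```python
import bz2
import gzip
import io
import os
import tarfile

CODECS = {
    "gzip -9": lambda b: gzip.compress(b, compresslevel=9),
    "bzip2": lambda b: bz2.compress(b, compresslevel=9),
}


def whole_archive(root):
    """Tar root in memory, then compress the single archive with each codec."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(root, arcname=".")
    data = buf.getvalue()
    sizes = {"tar": len(data)}
    for label, codec in CODECS.items():
        sizes[label] = len(codec(data))
    return sizes


def per_file(root):
    """Compress each regular file separately and sum the compressed sizes."""
    sizes = {label: 0 for label in CODECS}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            with open(path, "rb") as f:
                data = f.read()
            for label, codec in CODECS.items():
                sizes[label] += len(codec(data))
    return sizes
```

	On a tree with many near-duplicate files, whole_archive() should come out well below per_file().  On a tree of unrelated files the gap would likely be much smaller.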

	So, bzip2 results in a file about 18.7% of the size of the 
original, gzip about 25.0%, and compress only gets it down to about 
35.6%.  Relatively speaking, the bzip2 file is about 74.8% the size of 
the one produced by `gzip -9`.
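	These ratios can be rechecked from the ls -s figures with a few lines of Python.  The dictionary keys are just labels chosen here:

```python
# ls -s sizes reported above for the /usr/share/doc tarball experiment.
DOC = {
    "doc.tar": 64368,
    "compress": 22896,
    "gzip -9": 16080,
    "bzip2": 12032,
}

original = DOC["doc.tar"]
for name in ("bzip2", "gzip -9", "compress"):
    # Compressed size as a fraction of the uncompressed tarball.
    print(f"{name:>8}: {DOC[name] / original:.1%} of doc.tar")

# bzip2 output relative to the gzip -9 output.
print(f"bzip2 / gzip -9: {DOC['bzip2'] / DOC['gzip -9']:.1%}")
```

	This prints 18.7%, 25.0%, 35.6%, and 74.8%.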


	Seeing as /usr/share/doc and /usr/share/info are not currently 
compressed (in 4.6.2-RELEASE), any compression algorithm would be a 
significant improvement.

-- 
Brad Knowles, <brad.knowles_at_skynet.be>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
     -Benjamin Franklin, Historical Review of Pennsylvania.

GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
!w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
Received on Fri May 02 2003 - 11:34:23 UTC
