Re: using bzip2 to compress man-pages

From: Charles Swiger <cswiger_at_mac.com>
Date: Thu, 22 Sep 2005 16:12:24 -0400
On Sep 22, 2005, at 2:21 PM, Ulrich Spoerlein wrote:
> On Thu, 22.09.2005 at 00:46:11 -0400, Mikhail T. wrote:
>> Hello!
>>
>> How can I attract an interested committer to my:
>>
>>     http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/79607
>
> Changing the default format for manpages has serious bikeshed potential.
> While adding support for reading/writing bz2 compressed manpages is
> certainly useful, I doubt the benefit of switching to bzip2 compressed
> manpages.
>
> There are several points to consider:
>
> 1. I don't want to wait for my manpages to display, they have to be on
> screen instantaneously.

Running "catman" first and saving the "neqn $* | tbl | nroff -man |  
col" output would probably make far more difference than the choice  
between running gzcat and running bzcat.  :-)

> 2. "Desktops" and "Servers" have several GBs of space. No need to
> further squeeze the manpages (heck, I'd even consider not compressing
> them at all, but slight compression should even be faster than no
> compression at all)

This is a more interesting point.  Unless the manpage nroff source or  
a preformatted cat page compresses well enough that the file occupies  
fewer disk sectors, compressing it isn't useful.

Likewise, unless compressing the manpage with bzip2 rather than gzip  
saves enough data to gain a sector, the switchover would not be  
useful.  From what it looks like, the average manpage is about 2000  
bytes uncompressed, or about 1K compressed with gzip.
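
A rough sketch of that arithmetic, using the estimated sizes above.  The
512-byte and 2048-byte allocation units are assumptions for illustration,
not measured values, and units_needed is a hypothetical helper:

```shell
#!/bin/sh
# units_needed BYTES UNIT: allocation units needed to hold BYTES,
# rounding up to a whole unit (hypothetical helper).
units_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# ~2000 bytes uncompressed and ~1024 bytes gzipped are the estimates
# from the text; 512 and 2048 are assumed allocation unit sizes.
for unit in 512 2048; do
    echo "unit=$unit raw=$(units_needed 2000 "$unit") gzip=$(units_needed 1024 "$unit")"
done
```

With 512-byte units, gzip halves the allocation (4 units down to 2); with
2048-byte units both sizes fit in a single unit, so nothing is saved.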

Consider the results of:

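# Record per-file disk usage, smallest first (plain ">" so this runs in sh too).
du -a /usr/share/man/man* | sort -n > /tmp/manpage_sizes
for f in `fgrep .gz /tmp/manpage_sizes | awk '{print $2}'`
do
    # Bytes as stored with gzip, then bytes after recompressing with bzip2.
    wc -c "$f" && gzcat "$f" | bzip2 --best | wc -c
done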

My guess is that roughly 95% of the manpages aren't going to save a  
disk sector by switching.
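
To check that guess rather than eyeball the byte counts, something like
the following could tally the pages that would actually drop to fewer
allocation units.  This is only a sketch: count_savings is a hypothetical
helper, the unit size is whatever you pass in, and it assumes file names
without whitespace:

```shell
#!/bin/sh
# count_savings DIR UNIT: for every *.gz file under DIR, compare the
# number of UNIT-byte allocation units the gzip file needs with the
# number a "bzip2 --best" recompression would need (hypothetical helper).
count_savings() {
    dir=$1
    unit=$2
    total=0
    saved=0
    # Word splitting on find output: assumes no whitespace in file names.
    for f in $(find "$dir" -type f -name '*.gz'); do
        gz=$(wc -c < "$f" | tr -d ' ')
        bz=$(gzip -dc "$f" | bzip2 --best | wc -c | tr -d ' ')
        # Round each size up to whole allocation units.
        gz_units=$(( (gz + unit - 1) / unit ))
        bz_units=$(( (bz + unit - 1) / unit ))
        total=$(( total + 1 ))
        if [ "$bz_units" -lt "$gz_units" ]; then
            saved=$(( saved + 1 ))
        fi
    done
    echo "$saved of $total pages would use fewer ${unit}-byte units with bzip2"
}
```

For example, "count_savings /usr/share/man 512" would report the tally
for 512-byte sectors; pass a larger unit to model a bigger fragment size.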

-- 
-Chuck
Received on Thu Sep 22 2005 - 18:12:45 UTC
