RE: [HEADS-UP] BSD sort is the default sort in -CURRENT

From: Oleg Moskalenko <oleg.moskalenko_at_citrix.com>
Date: Fri, 29 Jun 2012 13:50:30 -0700
Now I am going to provide the second informational 
part about the new BSD sort: the differences between
the new BSD sort and the GNU sort programs.

Below: 
NBSD - new BSD sort;
OGNU - old BSD/GNU sort, version 5.3.0;
NGNU - new GNU sort, version 8.15.

There are several areas:

1) -k option (sort fields/keys), with single byte locales. 

The -k option functionality is described in POSIX standard 
with deceitfully simple wording. Unfortunately, the real meaning 
is rather complex, and sometimes it defies the common sense. 
For example, a sort key can spread across boundaries with 
subsequent sort keys. A common-sense-based implementation 
would fail the POSIX standard. 

The OGNU implementation fails to follow the standard. In our tests,
there are thousands various use cases when it does not work properly.
The OGNU exact algorithm and the failure pattern is a mystery for us.
The NGNU implements the POSIX algorithm properly, and the NBSD is
compatible with NGNU when using -k in single-byte locales.

2) -k option (sort fields/keys), with multi-byte locales.

Here the situation is reverse. During the -k calculation, NGNU basically 
ignores the symbol sizes (but it does not ignore the symbol sizes when 
it compares the fields). The result is that the sort keys are not 
calculated correctly by the NGNU. OGNU, on the other hand, seems to use 
the correct symbol sizes, but it is still using incorrect algorithm. So, 
the overall picture is:

   - OGNU uses wrong algorithm, but it uses correct symbol sizes;
   - NGNU uses correct algorithm, but it uses wrong symbol sizes;
   - NBSD uses NGNU-compatible algorithm with correct symbol sizes.

So, for big majority of users who are using only single-byte 
locales, the NBSD sort key calculation is compatible with NGNU. 
For the multi-byte locales (like zh_HK.Big5HKSCS), we believe 
that only NBSD provides the correct results with -k option.

3) NGNU does not allow options -d and/or -i together with numeric sorting.
OGNU and NBSD allows that.

4) NBSD adds NGNU-compatible options, which are absent in OGNU:

  -C (silent version of -c)
  -h ("human" numeric sort)
  -R (random sort)
  -V (version sort)
  --batch-size
  --compress-program
  --random-source
  --debug
  --files0-from
  
5) NBSD adds several of its own new proprietary options:

  --mergesort
  --qsort
  --heapsort
  --radixsort
  --nthreads=... (multi-threaded build only)

6) NBSD does not support option --parallel of NGNU. 
It has roughly the same meaning as --nthreads in NBSD. 
This difference will be fixed soon.

Thanks
Oleg


> -----Original Message-----
> From: owner-freebsd-current_at_freebsd.org [mailto:owner-freebsd-
> current_at_freebsd.org] On Behalf Of Oleg Moskalenko
> Sent: Wednesday, June 27, 2012 6:45 PM
> To: FreeBSD Current
> Subject: RE: [HEADS-UP] BSD sort is the default sort in -CURRENT
> 
> Hi
> 
> As promised, I am supplying an example of comparison between several
> sort programs.
> 
> The test file is a randomly generated 1,000,000 lines, each line
> contain a single floating point number.
> 
> We are going to sort it three ways - as text, as -n numeric sort, and
> as -g numeric sort, with 4 programs:
> 1) Old BSD/GNU sort 5.3.0
> 2) New GNU sort 8.15
> 3) New BSD sort, single threaded
> 4) New BSD sort, multi-threaded
> 
> The system is a 3-CPUs system, 1.5Gb of RAM, FreeBSD version 8.2. All
> times are in seconds. Locale C.
> 
> ==============================================
> 
>      TEXT SORT
> 
>                      sys     user      real
> Old BSD/GNU sort:    0.0     1.692     2.008
> New GNU sort:        0.0     2.279     1.605
> New BSD sort, st:    0.0     1.964     2.300
> New BSD sort, mt:    0.0     2.385     1.897
> 
> ==============================================
> 
>      NUMERIC SORT -n
> 
>                      sys     user      real
> Old BSD/GNU sort:    0.0     4.357     4.674
> New GNU sort:        0.0     8.839     5.134
> New BSD sort, st:    0.0     5.308     5.592
> New BSD sort, mt:    0.0     4.581     2.489
> 
> ==============================================
> 
>      NUMERIC SORT -g
> 
>                      sys     user      real
> Old BSD/GNU sort:    0.0     45.378    45.630
> New GNU sort:       ~450    ~121      ~300
> New BSD sort, st:    0.33    4.334     5.992
> New BSD sort, mt:    11.140  4.624     8.983
> 
> ===============================================
> 
> Thanks
> Oleg
> 
> 
> 
> 
> 
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-
> unsubscribe_at_freebsd.org"
Received on Fri Jun 29 2012 - 18:50:37 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:28 UTC