Re: [HEADS-UP] BSD sort is the default sort in -CURRENT

From: Doug Barton <dougb_at_FreeBSD.org>
Date: Wed, 27 Jun 2012 01:34:57 -0700

On 06/26/2012 11:48 PM, Oleg Moskalenko wrote:
> 
> 
>> -----Original Message-----
>> From: Doug Barton [mailto:dougb_at_FreeBSD.org]
>> Sent: Tuesday, June 26, 2012 11:18 PM
>> To: Gabor Kovesdan
>> Cc: FreeBSD Current; Oleg Moskalenko
>> Subject: Re: [HEADS-UP] BSD sort is the default sort in -CURRENT
>> 
>> On 06/26/2012 11:04 PM, Gabor Kovesdan wrote:
>>> Hi Folks,
>>> 
>>> as I announced before, the default sort in -CURRENT has been
>>> changed to BSD sort.
>> 
>> Has this been performance tested vs. the old one? If so, where are
>> the results?
> 
> Of course it was performance-tested.

Great, can you post the results somewhere? I understand what you're
saying below that there are situations where worse performance may need
explanation, but it would be helpful if we had the data to look at.

> As this is a totally different
> program with different algorithms, it has a totally different
> performance profile on different tests compared to the old sort. In
> the default compilation mode (single-threaded sort) the performance is
> on the same level as the old sort (sometimes faster, sometimes
> slower). The new sort is often significantly faster in numeric sort
> tests. In "experimental" multi-threading mode, the new sort is much
> faster than the old sort on multi-CPU systems.

This sounds encouraging. Is there a knob to enable the threaded build?

> The sort speed comparison is not actually fair because the old sort
> cuts some corners and has a number of bugs.

Understood, but the existing sort is what we're changing away from, so
that's what we have to test against. What we don't want is a situation
where we are switching to the new sort by default without understanding
what the tradeoffs are. (IOW, we don't want a repeat of the situation
with grep.)

> The concrete figures do not make much sense because you change the
> sort file and you get a totally different performance ratio.

I'm assuming that you'd run the performance tests on a variety of
input files, and report the results for each.
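
To make that concrete, here is a rough sketch of the kind of harness I
mean. It is only an illustration under stated assumptions, not an
existing tool: the binary paths, input file names and option lists in
the usage comment are hypothetical placeholders, and best-of-N
wall-clock time is just one reasonable way to measure.

```python
import subprocess
import time


def time_sort(sort_cmd, args, input_path, runs=3):
    """Return the best wall-clock time in seconds over `runs` runs.

    sort_cmd is an argv prefix (a list), e.g. ["/path/to/sort"].
    Output is discarded; a non-zero exit status raises an exception.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            [*sort_cmd, *args, input_path],
            stdout=subprocess.DEVNULL,
            check=True,
        )
        best = min(best, time.perf_counter() - start)
    return best


def compare_timings(old_sort, new_sort, cases, runs=3):
    """Time both binaries on every (options, input file) case.

    Returns a list of (options, input_path, old_seconds, new_seconds).
    """
    rows = []
    for args, path in cases:
        old_t = time_sort(old_sort, args, path, runs)
        new_t = time_sort(new_sort, args, path, runs)
        rows.append((" ".join(args), path, old_t, new_t))
    return rows


# Hypothetical usage; every path and file name here is a placeholder:
#
#   cases = [([], "words.txt"), (["-n"], "numbers.txt"),
#            (["-r"], "words.txt")]
#   for opts, path, old_t, new_t in compare_timings(
#           ["/path/to/old/sort"], ["/path/to/new/sort"], cases):
#       print(f"{opts or '(none)':8} {path:14} "
#             f"old={old_t:.3f}s new={new_t:.3f}s")
```

A table like that, one row per option set and input file, is the sort
of data that would let people see where the ratio changes.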

>> Has this been thoroughly regression-tested against the old
>> version, option by option?
> 
> Of course we have the regression tests. We have an overnight test
> that runs through probably 17 million different sort option
> combinations.

This sounds great, but ...

> But we actually had to compare the new sort against a
> fresh GNU sort implementation (version 8.15), because the old BSD GNU
> sort is very buggy and testing against the old GNU sort makes no
> sense.

... this not so much. The existing sort is what people have now, and
what they rely on, particularly for scripts. Comparing apples to oranges
doesn't help us understand how things are going to be different with the
new version.
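
As an illustration of the comparison I'm asking about, a minimal
sketch follows. It assumes both binaries can be run side by side; the
paths, option sets and input files in the usage comment are
hypothetical, and comparing only exit status and stdout is a
simplification. A mismatch is not necessarily a bug in the new sort
(the old one may be the one that's wrong), but each one is a behavior
change that script authors would want to know about.

```python
import subprocess


def run_sort(sort_cmd, args, input_path):
    """Run one sort invocation; return (exit status, stdout bytes)."""
    proc = subprocess.run(
        [*sort_cmd, *args, input_path],
        capture_output=True,
    )
    return proc.returncode, proc.stdout


def find_mismatches(old_sort, new_sort, option_sets, input_paths):
    """List every (options, input file) where the two binaries differ.

    old_sort and new_sort are argv prefixes (lists). A case counts as
    a mismatch if either the exit status or the output differs.
    """
    mismatches = []
    for args in option_sets:
        for path in input_paths:
            old_result = run_sort(old_sort, args, path)
            new_result = run_sort(new_sort, args, path)
            if old_result != new_result:
                mismatches.append((tuple(args), path))
    return mismatches


# Hypothetical usage; every path and file name here is a placeholder:
#
#   diffs = find_mismatches(
#       ["/path/to/old/sort"], ["/path/to/new/sort"],
#       option_sets=[[], ["-n"], ["-r"], ["-u"], ["-n", "-r"]],
#       input_paths=["words.txt", "numbers.txt"],
#   )
#   for opts, path in diffs:
#       print("differs:", " ".join(opts) or "(no options)", path)
```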

Doug