Re: strange tr behaviour

From: Jon Noack <noackjr_at_alumni.rice.edu>
Date: Fri, 26 Mar 2004 04:44:28 -0600
On 3/26/2004 4:09 AM, Michael Reifenberger wrote:
> On Fri, 26 Mar 2004, Jon Noack wrote:
>> Short version:
>> tr(1) was modified to be POSIX compliant for 5.x.  You are seeing
>> correct behavior.  See the solution below.
> 
> Thanks all for the hints.
> 
> Only that tr(1) states:
> ...
> COMPATIBILITY
>      System V has historically implemented character ranges using the syntax
>      ``[c-c]'' instead of the ``c-c'' used by historic BSD implementations and
>      standardized by POSIX.  System V shell scripts should work under this
>      implementation as long as the range is intended to map in another range,
>      i.e. the command ``tr [a-z] [A-Z]'' will work as it will map the ``[''
>      character in string1 to the ``['' character in string2.  However, if the
> ...
> 
> So I just expected the historic behaviour so that [a-z] map to [A-Z]
> as before :-(

From tr(1):
c-c       For non-octal range endpoints represents the range of
          characters between the range endpoints, inclusive, in ascending
          order, as defined by the collation sequence.

It's translating _ranges_.  To help understand this:
$ echo abcdef | tr a-z A-D
ABCDDD

The first range (a-z) is larger than the second (A-D), so tr maps
characters one-to-one until the second range runs out.  From then on,
every remaining character in the first range maps to the final character
of the second range (D here).

In your locale, the range a-z is smaller than the range A-Z.  Thus, the 
one-to-one mappings won't result in proper case conversion.
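
For what it's worth, here is a rough sketch of the usual ways around this.
The character classes [:lower:] and [:upper:] are specified by POSIX for
tr and do not depend on how the locale orders a range.  Pinning LC_ALL=C
for a single command is a workaround I am assuming is acceptable for
ASCII-only input; it is not something the man page excerpt above suggests.

```sh
# Padding behaviour from the example above, with the locale pinned to C
# so that a-z means the 26 ASCII lowercase letters.
short=$(echo abcdef | LC_ALL=C tr a-z A-D)
echo "$short"     # ABCDDD

# Locale-independent case conversion using POSIX character classes.
# Quote the classes so the shell does not treat them as glob patterns.
upper=$(echo abcdef | tr '[:lower:]' '[:upper:]')
echo "$upper"     # ABCDEF

# Workaround (assumption: input is plain ASCII): force the C locale for
# this one command so both ranges have the same length and order.
ranged=$(echo abcdef | LC_ALL=C tr a-z A-Z)
echo "$ranged"    # ABCDEF
```

The character-class form is the one I would reach for in scripts, since it
keeps working whatever locale the user happens to have set.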

Perhaps tr(1) should be updated to say something about this.

Jon
Received on Fri Mar 26 2004 - 01:44:17 UTC
