Re: Ctype patch for review

From: Andrey Chernov <ache_at_nagual.pp.ru>
Date: Wed, 19 Sep 2007 16:10:24 +0400
On Wed, Sep 19, 2007 at 09:18:30AM +0400, Andrey Chernov wrote:
> I change my mind again, now I use new __mb_bit8_override flag specific to 
> UTF-8 encoding (other bit8 overriding encodings could use it too). New 
> patch attached.

Improved version. It introduces a general __mb_sch_limit parameter instead, 
for all locales, specifying the upper limit of the single char range. It also 
allows fixing the bug when ctype(3) functions are called with arg > 0xFF for 
wide character locales, and it simplifies all checks. New patch is attached. 
Here is the updated rationale again:

-------------------------------------------------------------------------
The problem is: currently our single byte ctype(3) functions are broken 
for wide character locales in the argument range >= 0x80 - they may 
return false positives.

Example 1: for the UTF-8 locale we currently have:
iswspace(0xA0)==1 and isspace(0xA0)==1
(because iswspace() and isspace() are the same code)
but must have
iswspace(0xA0)==1 and isspace(0xA0)==0
(because in the UTF-8 locale there is no single byte character 0xA0, nor
any other in the range 0x80..0xff; only ASCII is kept in the single byte
range, because our internal wchar_t representation for UTF-8 is UCS-4).
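
To illustrate why 0xA0 is not a single byte character in UTF-8, here is a
minimal sketch of a UTF-8 encoder for code points up to 0x7FF. The function
name sketch_utf8_encode is hypothetical and is not part of the patch; it
only shows that code point 0xA0 occupies two bytes (0xC2 0xA0), while ASCII
code points stay single byte.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not from the patch: encode a code point below
 * 0x800 as UTF-8. Returns the number of bytes written to out, or 0 if
 * the code point is outside the range this sketch handles.
 */
static size_t
sketch_utf8_encode(unsigned int cp, unsigned char out[2])
{
	if (cp < 0x80) {
		/* ASCII: one byte, identical to the code point. */
		out[0] = (unsigned char)cp;
		return (1);
	}
	if (cp < 0x800) {
		/* Two bytes: 110xxxxx 10xxxxxx. */
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return (2);
	}
	return (0);	/* longer sequences are out of scope here */
}
```

So a lone byte 0xA0 handed to isspace() is only a continuation byte of some
multibyte sequence, not a character of its own.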

Example 2: for all wide character locales isalpha(arg) with arg > 0xFF may 
return false positives (it must return 0),
because iswalpha() and isalpha() are the same code.
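
The range check implied by both examples can be sketched roughly as
follows. This is not the code from the patch: struct sketch_locale,
its sch_limit field and sketch_isspace() are hypothetical stand-ins for
the __mb_sch_limit idea, and the limit values used here (0x80 for UTF-8,
0x100 for a single byte locale) are assumptions based on the rationale
above.

```c
#include <wctype.h>

/*
 * Hypothetical per-locale data: sch_limit is the first value that is NOT
 * a valid single char in this locale (assumed 0x80 for UTF-8 and 0x100
 * for an 8-bit locale).
 */
struct sketch_locale {
	int sch_limit;
};

/*
 * Single byte classification that shares the wide character code but
 * rejects any argument outside the locale's single char range first.
 */
static int
sketch_isspace(const struct sketch_locale *loc, int c)
{
	if (c < 0 || c >= loc->sch_limit)
		return (0);
	return (iswspace((wint_t)c) != 0);
}
```

With a limit of 0x80, sketch_isspace() returns 0 for 0xA0 whatever the
shared wide character code says, and with a limit of 0x100 it returns 0 for
any argument > 0xFF.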

The attached patch addresses this issue and also fixes iswascii()
(currently iswascii() is broken for arguments > 0xFF).
This patch is 100% binary compatible with old binaries.

-- 
http://ache.pp.ru/

Received on Wed Sep 19 2007 - 10:10:30 UTC
