On Wed, Sep 19, 2007 at 09:18:30AM +0400, Andrey Chernov wrote:
> I change my mind again, now I use new __mb_bit8_override flag specific to
> UTF-8 encoding (other bit8 overriding encodings could use it too). New
> patch attached.

Improved version. Instead, introduce a general __mb_sch_limit parameter for all locales, specifying the upper limit of the single-char range. It also allows fixing the bug where ctype(3) functions are called with an argument > 0xFF in wide character locales, and it simplifies all checks. New patch is attached.

Here is the updated rationale again:

-------------------------------------------------------------------------

The problem is: currently our single-byte ctype(3) functions are broken for wide character locales in the argument range >= 0x80; they may return false positives.

Example 1: for the UTF-8 locale we currently have

iswspace(0xA0)==1 and isspace(0xA0)==1

(because iswspace() and isspace() are the same code), but we must have

iswspace(0xA0)==1 and isspace(0xA0)==0

(because there is no such character, nor any other in the range 0x80..0xff, for the UTF-8 locale; it keeps ASCII only in the single-byte range because our internal wchar_t representation for UTF-8 is UCS-4).

Example 2: for all wide character locales, isalpha(arg) with arg > 0xFF may return false positives (it must return 0), because iswalpha() and isalpha() are the same code.

The attached patch addresses this issue and also fixes iswascii() (currently iswascii() is broken for arguments > 0xFF).

This patch is 100% binary compatible with old binaries.

--
http://ache.pp.ru/
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:17 UTC