On Mon, 17 Sep 2007 13:21:30 +0400
Andrey Chernov <ache_at_nagual.pp.ru> wrote:

> On Mon, Sep 17, 2007 at 10:29:21AM +0200, Petr Hroudný wrote:
> > 2007/9/16, Andrey Chernov <ache_at_nagual.pp.ru>:
> > > The problem is: currently our single byte ctype functions are broken for
> > > wide characters locales in the argument range >= 0x80 - they may return
> > > false positives.
> > >
> > > For example, for UTF-8 locale we currently have:
> > > iswspace(0xA0)==1 and isspace(0xA0)==1
> > > (because iswspace() and isspace() are the same code)
> > > but must have
> > > isspace(0xA0)==0
> >
> > This is exactly what happens on other OSes and I agree this is the
> > right behaviour for UTF-8. However, we must ensure, that:
> >
> > for C locale: isspace(0xA0)==0
> > for ISO8859-* locales: isspace(0xA0)==1
> > for UTF-8 locales: isspace(0xA0)==0
>
> The patch test for wide char locale presence first (__mb_cur_max > 1), so
> does not affect single byte locales like ISO8859-*
>

Checking for __mb_cur_max is not enough for certain locales.
For example, SJIS has the following range for JIS X0201 (a.k.a. HALFWIDTH KANA).

/*
 * JIS X201
 */
PUNCT 0xa1-0xa5
SPACE 0xa0
BLANK 0xa0
SPECIAL 0xa1-0xdf
PHONOGRAM 0xa6-0xdf
SWIDTH1 0xa0-0xdf

-- 
-|-__ YAMAMOTO, Taku
 | __ < <taku_at_tackymt.homeip.net>

- A chicken is an egg's way of producing more eggs. -

Received on Mon Sep 17 2007 - 15:01:09 UTC
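
The classifications argued over in this thread can be checked from userland with a small C sketch. It is not the patch under discussion, only a probe, and the helper names probe_isspace() and probe_multibyte() are made up for illustration. MB_CUR_MAX is used here as the public counterpart of the internal __mb_cur_max mentioned above. Locale names such as "en_US.UTF-8", "en_US.ISO8859-1" or "ja_JP.SJIS" are assumptions about what is installed on a given system, so both helpers return -1 when setlocale() rejects the name.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>

/*
 * Report how isspace() classifies the byte `ch` under the named locale.
 * Returns 1 or 0 for the classification, or -1 if the locale is not
 * available. The previous LC_CTYPE setting is not restored, to keep
 * the sketch short.
 */
int probe_isspace(const char *locale_name, int ch)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    /* Cast so that values >= 0x80 are passed in the valid argument range. */
    return isspace((unsigned char)ch) != 0;
}

/*
 * Report whether the named locale is multibyte (MB_CUR_MAX > 1).
 * Returns 1 or 0, or -1 if the locale is not available.
 */
int probe_multibyte(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return MB_CUR_MAX > 1;
}
```

Calling probe_isspace(name, 0xA0) and probe_multibyte(name) for a UTF-8, an ISO8859-* and an SJIS locale shows the case raised above: if a locale reports itself as multibyte and still defines 0xA0 as SPACE in its single byte range, a test on the multibyte flag alone cannot decide the answer.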