On Sat, 2014-04-05 at 05:35 +0400, Andrey Chernov wrote:
> On 04.04.2014 16:46, Gleb Smirnoff wrote:
> > On Thu, Apr 03, 2014 at 01:34:33AM +0400, Andrey Chernov wrote:
> > A> On 02.04.2014 21:15, Gleb Smirnoff wrote:
> > A> > S> +     :lang=en_US.UTF-8:\
> > A> > S> +     :charset=UTF-8:
> > A> >
> > A> > And I'd like to make the same change for the 'russian' login class
> > A> > in /etc/login.conf.
> > A>
> > A> Please, everybody, remember that we don't have UTF-8 collation
> > A> implemented, just a fallback to bytewise comparison.
> >
> > Any objections to checking in the FreeBSD-compatible[1] UTF-8 collation
> > implementation from Alex Tutubalin?
> >
> > http://blog.lexa.ru/2008/03/03/freebsd_utf8_russian_collate_vtoraja_popitka.html
> >
>
> I have objections even to his "version 2". I already replied to Alex
> about this in 2008. In short:
> 1) There is an error: almost all single chars above ASCII should be
> "chains", i.e. two bytes minimum, since there are almost no
> intersections with ISO8859-1 as a UTF-8 subset.
> 2) The table itself is very incomplete, e.g. it covers neither the whole
> of KOI8-R, nor ISO8859-5, nor CP866. It is made from CP1251 with all its
> restrictions. So switching from e.g. KOI8-R to UTF-8 will cause a
> sorting regression. Russian UTF-8 collation should be able to sort all
> the major Russian charsets mentioned, i.e. we need a combined table.
> 3) The "charmap map.ISO8859-1" declaration is missing (needed mainly for
> using the mnemonic names of pure ASCII chars).
>
> Even if the above-mentioned errors are fixed and the code is committed
> afterwards, we should understand that this approach (implementing
> multibyte collation via the single-byte one), while possible, is a
> big hack and slows sorting down by up to 10 times.
>
> The proper "Unicode collation algorithm" is already implemented by ICU
> and other projects. See
> http://unicode.org/reports/tr10/
> It would be better if someone adopted it instead of hacks.
>

If you have a different patch, I'd appreciate seeing it.

Sean
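
For readers following the thread, the effect of falling back to bytewise comparison can be shown with a small sketch. It is not taken from any of the patches discussed: the word list, the RUSSIAN_LOWER alphabet string and the toy_key helper are assumptions made up for illustration, and the toy key is neither the FreeBSD collation table format nor the Unicode collation algorithm. It only shows that non-ASCII Cyrillic letters take two bytes each in UTF-8 and that byte order puts "ё" after "я".

```python
# Illustration only: the word list below is invented for this example.
words = ["яблоко", "ёлка", "ель"]

# Each of these Cyrillic letters is encoded as two bytes in UTF-8.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in "еёя"}

# Bytewise comparison of the UTF-8 encodings, i.e. the kind of ordering
# a "no collation table" fallback produces.
bytewise = sorted(words, key=lambda w: w.encode("utf-8"))

# Hypothetical hand-written alphabet used as a toy collation table.
RUSSIAN_LOWER = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
RANK = {ch: i for i, ch in enumerate(RUSSIAN_LOWER)}


def toy_key(word):
    """Map each letter to its alphabet position; unknown chars sort last."""
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]


alphabetical = sorted(words, key=toy_key)

print(byte_lengths)
print("bytewise:    ", bytewise)
print("alphabetical:", alphabetical)
```

In the bytewise result "ёлка" lands after "яблоко", because "ё" (U+0451) encodes to a higher byte sequence than "я" (U+044F), while the alphabet-based key places it right after "ель".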