I saw that this issue was on the todo list for 6.1R, so I decided to take a look at it.

http://www.freebsd.org/cgi/query-pr.cgi?pr=93629

As it says in the report, you can recreate the abort by doing the following:

    setenv LANG uk_UA.KOI8-U
    setenv LC_CTYPE ja_JP.UTF-8
    /usr/bin/sort

This is quite a weird problem. The cause is that sort tries to handle the LC_TIME values in inittables_mb() as if they were UTF-8 encoded, because that is what LC_CTYPE says. The LC_TIME values for uk_UA.KOI8-U are not UTF-8 encoded; that locale uses NONE as its encoding.

Normally this wouldn't be a problem, since the multibyte routines handle plain ASCII values <= 7f just fine. That's why sort works when LANG is set to C, for example (Jan-Dec contain no bytes > 7f). The thing about uk_UA.KOI8-U (and some others) is that it uses byte values > 7f to represent the Ukrainian alphabet. For example, Jan in uk_UA.KOI8-U's LC_TIME is d3 a6 de 00. When you parse that string as UTF-8, d3 says that it starts a multibyte character of length 2, and that one works fine (it does not trigger the assertion). But then de also says that it starts a multibyte character of length 2, and there is no second byte for it, so mbrtowc() returns -2 (see man mbrtowc). That is what makes the assertion go off and abort.

I don't know what the best way to solve this is, but I think that something should be done so that sort does not abort and dump core. One solution is of course to make sort check that LC_CTYPE and LC_TIME are the same (or C), but maybe some people want to have them set differently (although I don't see why).

Do you have any ideas on how this can be solved in a nice way, or do you think that the fix "set LC_CTYPE and LC_TIME to the same value" is enough?

/Tobias Svehagen

Received on Sun Apr 02 2006 - 10:32:06 UTC
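
To make the byte walk-through above concrete, here is a minimal sketch in C. utf8_step() is a hypothetical helper, not something from sort or libc; it only mimics the return convention documented for mbrtowc() on UTF-8 input, and it skips the overlong and surrogate checks a real decoder performs. It also assumes the caller passes only the three non-NUL bytes of the month name, which is what the -2 return suggests but which I have not verified against sort's source.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of sort or libc): classify the UTF-8
 * sequence starting at s, looking at no more than n bytes, using the
 * same return convention as mbrtowc():
 *   > 0         number of bytes in one complete character
 *   0           s points at a NUL byte
 *   (size_t)-1  invalid sequence
 *   (size_t)-2  incomplete sequence (input ended too early)
 * Simplified: overlong and surrogate encodings are not rejected.
 */
static size_t
utf8_step(const unsigned char *s, size_t n)
{
	size_t need, i;

	assert(s != NULL);

	if (n == 0)
		return ((size_t)-2);
	if (s[0] == 0x00)
		return (0);
	if (s[0] < 0x80)
		return (1);		/* plain ASCII, always fine */

	/* The lead byte announces the length of the sequence. */
	if ((s[0] & 0xe0) == 0xc0)
		need = 2;		/* 110xxxxx, e.g. d3 and de */
	else if ((s[0] & 0xf0) == 0xe0)
		need = 3;
	else if ((s[0] & 0xf8) == 0xf0)
		need = 4;
	else
		return ((size_t)-1);	/* stray continuation or bad lead */

	for (i = 1; i < need; i++) {
		if (i == n)
			return ((size_t)-2);	/* ran out of input */
		if ((s[i] & 0xc0) != 0x80)
			return ((size_t)-1);	/* not a 10xxxxxx byte */
	}
	return (need);
}
```

Under those assumptions, calling utf8_step() on d3 a6 de with n = 3 gives 2 for the first character, and calling it again on the remaining single byte de gives (size_t)-2. If the terminating NUL were included in the count, the second call would give (size_t)-1 instead, since 00 is not a continuation byte.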