Re: Unicode-based FreeBSD

From: Alexander Churanov <alexanderchuranov_at_gmail.com>
Date: Mon, 25 Aug 2008 17:21:36 +0400

2008/8/25 Svavar Lúthersson <admin_at_stuff.is>

> I am not an expert in Unicode but I am Icelandic and need to manage
> filenames which have some "special characters" in the Latin alphabet. Like
> á, ð, é, í, ó, ú, þ, æ and ö. Even though these characters are defined in
> ISO-8859-1 and -15, they cannot be directly typed (by default) in the
> console in FreeBSD (also applies to Debian Linux).
>
> On to my point... My suggestion is to go as far as possible with the
> proposed solution. There should be UTF-32, UTF-16 and UTF-8 support and the
> first-mentioned should be the primary charset with the others as fallback. I
> think only enabling UTF-8 is not going far enough and therefore I do not
> support Churanov's ideas to obscure non-displayable characters with other
> symbols.
>
Svavar,

You need to type "special characters" that are high-bit characters of
ISO-8859-1 and -15. I need to type Cyrillic characters that are high-bit
characters of KOI8-R, and I am able to do this. Did you try the "keymap" and
"scrnmap" settings in "rc.conf"? I am not sure, but your issue looks like a
misconfiguration.

Then, about the UTFs. All three forms encode THE SAME set of code points, and
from the user's perspective there is no great difference. However, UTF-8 is
interoperable with ASCII, and this fact makes many old applications work
without modification. I have already posted to the list about my experience
of using vipw with UTF-8 on FreeBSD 6.2 with LANG=ru_RU.KOI8-R.
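To illustrate the point about the three UTF forms, here is a minimal Python sketch. The helper names (CODECS, encoded_forms, is_ascii_transparent) and the sample strings are made up for this example. It only shows that the three encodings carry the same code points and that only UTF-8 leaves ASCII bytes untouched.

```python
# Hypothetical helpers for illustration; the big-endian variants are used
# so that no byte-order mark is added to the output.
CODECS = ("utf-8", "utf-16-be", "utf-32-be")


def encoded_forms(text):
    """Return the bytes of the same text in each of the three UTF forms."""
    return {codec: text.encode(codec) for codec in CODECS}


def is_ascii_transparent(text, codec):
    """True if ASCII-only text encodes to exactly its plain ASCII bytes."""
    return text.encode(codec) == text.encode("ascii")


if __name__ == "__main__":
    sample = "\u00fe\u0416"  # Icelandic thorn and Cyrillic Zhe
    for codec, data in encoded_forms(sample).items():
        # Each form decodes back to the very same code points.
        print(codec, data.hex(), data.decode(codec) == sample)

    line = "root:*:0:0"  # an ASCII-only line, as an old tool would expect
    for codec in CODECS:
        print(codec, is_ascii_transparent(line, codec))
```

Only the UTF-8 check succeeds for the ASCII-only line, which is why programs that know nothing about Unicode often keep working on UTF-8 data.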

The actual drawback of my solution is that a person will not be able to read
and type Icelandic and Russian text simultaneously in the syscons console. The
ideas of obscuring output are attempts to provide some way to manipulate
files with, say, Russian names on a PC tuned for Icelandic text.

Please note that I DO NOT advocate the syscons character mode as a device
for working correctly with multilingual texts. For some scripts, for
example Devanagari, syscons will NEVER work unless it is extended to
something like X, freetype, freebidi and many other tools working together.
Please also note that you can start working in a true multilingual
environment right now, using, for example, X+KDE (kate and konsole) and
switching them to UTF-8. This will work.

What I am trying to discuss is just making syscons work correctly when the
whole system is switched to UTF-8. Today such a switch does not affect X and
KDE, but the standard FreeBSD syscons console fails to work correctly. The
main ideas are:

1) Make switching everything to UTF-8 possible.

2) Either map non-ASCII characters to a 128-character subset of the full
Unicode range

    Or encode them as sequences of ASCII characters.

    Or mix these approaches.
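The mixed approach could look roughly like the following Python sketch. It is not how syscons is implemented. The function names, the slot assignment (first character gets 0x80, the next 0x81, and so on) and the "#<decimal code point>;" fallback are all assumptions made for this example.

```python
def build_screen_map(chars):
    """Give each of up to 128 non-ASCII characters a byte in 0x80..0xFF.

    Hypothetical helper: slots are simply handed out in the order given.
    """
    unique = list(dict.fromkeys(chars))  # keep order, drop duplicates
    if len(unique) > 128:
        raise ValueError("only 128 high-bit slots are available")
    if any(ord(ch) < 0x80 for ch in unique):
        raise ValueError("ASCII characters need no mapping")
    return {ch: 0x80 + slot for slot, ch in enumerate(unique)}


def to_console_bytes(text, screen_map):
    """Turn text into the bytes an 8-bit console would be asked to show."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))          # ASCII passes through unchanged
        elif ch in screen_map:
            out.append(screen_map[ch])   # displayable with the 8-bit font
        else:
            # Assumed fallback: decimal code point written as ASCII
            out += f"#{ord(ch)};".encode("ascii")
    return bytes(out)


if __name__ == "__main__":
    icelandic = build_screen_map("áðéíóúþæö")
    print(to_console_bytes("þorn.txt", icelandic))
    print(to_console_bytes("файл", icelandic))
```

With a map tuned for Icelandic, an Icelandic name comes out as single high-bit bytes, while a Russian name comes out as ASCII-only escapes that can still be read and typed.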

To my mind, this should result in the following abilities:

1) To work in a graphical environment without restrictions. (This is what you
have right now.)

2) To read and type some filenames (those that contain only characters
mappable to the 8-bit font) in a natural way. (This is also possible now, but
with an 8-bit LANG, not UTF-8.)

3) To read and type filenames that contain characters that do not fit in
the current 8-bit screenmap, possibly in an unnatural way.

The latter would help if you are in Iceland and see a Chinese filename. I
want engineers who do technical support of systems to be able to delete or
rename such files even in single-user mode. I think that typing something
like "#1234;#4321;" instead of the actual character is an affordable price.
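A rough Python sketch of how such typed sequences could be turned back into real filenames, and the reverse. The exact syntax is not fixed anywhere in this proposal, so the following are assumptions of the example: the number is a decimal code point, and a literal "#" in a name is itself written as "#35;" so that the round trip stays unambiguous. The function names are hypothetical.

```python
import re

# Assumed syntax: '#', decimal digits, ';'
_ESCAPE = re.compile(r"#([0-9]+);")


def escape_name(name):
    """Write a filename using ASCII only.

    A literal '#' is escaped as well (assumption of this sketch), so every
    '#<digits>;' in the result is guaranteed to be an escape.
    """
    parts = []
    for ch in name:
        if ch == "#" or ord(ch) > 0x7F:
            parts.append(f"#{ord(ch)};")
        else:
            parts.append(ch)
    return "".join(parts)


def unescape_name(typed):
    """Turn what the operator typed back into the real filename.

    Raises ValueError if a number is not a valid code point.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1))), typed)


if __name__ == "__main__":
    real = "\u6587\u4ef6#1.txt"       # a Chinese name containing a '#'
    shown = escape_name(real)
    print(shown)                       # ASCII-only form for the console
    print(unescape_name(shown) == real)
```

An operator in single-user mode would then type the escaped form to rm or mv, and the console layer (or a small wrapper) would apply unescape_name before the name reaches the file system.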

I'm just trying to be realistic and provide a doable solution. I leave plans
of rewriting every bit of software to others. I even think that the latter
is not required, since the syscons console is probably not heavily used now.

Alexander Churanov