Re: Unicode-based FreeBSD

From: Tim Kientzle <kientzle_at_freebsd.org> Date: Mon, 25 Aug 2008 21:37:41 -0700

> Going to UTF-8 might fix some of the character issues
> but we would be in the same shoes when it comes to characters
> which are in -16 and -32 but not in -8.

You need to read the Unicode/ISO 10646 standards again;
you do not understand them.

There are no characters in UTF-32 that are not in UTF-8.

UTF-32, UTF-16, and UTF-8 all encode exactly the same set of characters.
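One way to convince yourself is to push every Unicode scalar value through each encoding and back. The Python sketch below is only an illustration using the standard `str.encode`/`bytes.decode` codecs. It skips U+D800 to U+DFFF because that block is reserved for UTF-16 surrogate pairs and contains no characters.

```python
# Every Unicode scalar value: U+0000..U+10FFFF minus the surrogate block
# U+D800..U+DFFF, which is reserved for UTF-16 surrogate pairs.
SCALAR_VALUES = [cp for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF]


def round_trips(encoding: str) -> bool:
    """Encode all scalar values with `encoding`, decode, and compare."""
    text = "".join(map(chr, SCALAR_VALUES))
    return text.encode(encoding).decode(encoding) == text


if __name__ == "__main__":
    for enc in ("utf-8", "utf-16", "utf-32"):
        # Each line should report True: no encoding loses any character.
        print(enc, round_trips(enc))
```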

UTF-8 encodes Unicode characters from U+0000 to U+10FFFF, using 1 to 4
bytes per character.

UTF-16 encodes Unicode characters from U+0000 to U+10FFFF, using 2 or
4 bytes per character.

UTF-32 encodes Unicode characters from U+0000 to U+10FFFF, using 4
bytes per character.
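The byte counts above are easy to check. This Python sketch (an illustration, not from the original post) uses the little-endian codec variants so that no byte-order mark is added to the count:

```python
def encoded_lengths(ch: str):
    """Return (UTF-8, UTF-16, UTF-32) byte lengths for one character."""
    # The -le codecs omit the BOM that plain "utf-16"/"utf-32" would prepend.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


if __name__ == "__main__":
    samples = ["A", "\u00e9", "\u20ac", "\U0001F600", "\U0010FFFF"]
    for ch in samples:
        print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

"A" (U+0041) takes 1/2/4 bytes, U+20AC (euro sign) takes 3/2/4, and anything beyond U+FFFF takes 4 bytes in all three encodings.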

Practically speaking, UTF-8 is a bit more convenient for file
storage and transmission (including terminal support), while UTF-16
or UTF-32 can be slightly more convenient for internal
string manipulation.  But all three encodings represent exactly
the same characters.
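A rough illustration of the "internal manipulation" point: in UTF-32 every character is 4 bytes, so character i starts at byte 4*i, whereas UTF-8 has to be scanned from the start. (UTF-16 is still variable width because of surrogate pairs, so strictly fixed-width indexing applies only to UTF-32.) The helper names below are hypothetical, chosen just for this sketch:

```python
def nth_char_utf32(data: bytes, i: int) -> str:
    """Character i of UTF-32LE data: fixed width, so compute the offset."""
    return data[4 * i : 4 * i + 4].decode("utf-32-le")


def utf8_seq_len(lead: int) -> int:
    """Length of a UTF-8 sequence, determined from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def nth_char_utf8(data: bytes, i: int) -> str:
    """Character i of UTF-8 data: variable width, so walk from the start."""
    pos = 0
    for _ in range(i):
        pos += utf8_seq_len(data[pos])
    return data[pos : pos + utf8_seq_len(data[pos])].decode("utf-8")


if __name__ == "__main__":
    s = "A\u00e9\u20ac\U0001F600"
    print(nth_char_utf32(s.encode("utf-32-le"), 3) == nth_char_utf8(s.encode("utf-8"), 3))
```

Both functions return the same character; the difference is only how much work it takes to find it.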

Tim Kientzle