> Going to UTF-8 might fix some of the character issues
> but we would be in the same shoes when it comes to characters
> which are in -16 and -32 but not in -8.

You need to read the Unicode/ISO 10646 standards again; you do not understand them.

There are no characters in UTF-32 that are not in UTF-8. UTF-32, UTF-16, and UTF-8 all use exactly the same characters.

UTF-8 encodes Unicode characters from U+0000 to U+10FFFF, using 1 to 4 bytes per character.

UTF-16 encodes Unicode characters from U+0000 to U+10FFFF, using 2 or 4 bytes per character.

UTF-32 encodes Unicode characters from U+0000 to U+10FFFF, using 4 bytes per character.

Practically speaking, UTF-8 is a bit more convenient for file storage and transmission (including terminal support), while UTF-16 or UTF-32 can be slightly more convenient for internal string manipulation. But all three encodings use exactly the same characters.

Tim Kientzle

Received on Tue Aug 26 2008 - 03:03:15 UTC
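
The byte counts described in the message above can be checked with a short Python sketch. This is only an illustration, not something from the original post: the helper name `encoded_sizes` and the sample characters are assumptions chosen for the example. The little-endian codec variants are used so that Python does not prepend a byte order mark, which would otherwise inflate the UTF-16 and UTF-32 counts.

```python
# Illustrative sketch: encoded_sizes and the sample characters are
# hypothetical choices for this example, not part of the original message.

def encoded_sizes(ch: str) -> dict:
    """Return how many bytes one character occupies in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le: no 2-byte BOM added
        "utf-32": len(ch.encode("utf-32-le")),  # -le: no 4-byte BOM added
    }

# Code points from the low, middle, and top of the Unicode range.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600", "\U0010FFFF"]

for ch in samples:
    # Each encoding can represent the character and decode it back unchanged.
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        assert ch.encode(enc).decode(enc) == ch
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

Every sample round-trips through all three encodings; only the number of bytes differs, which is the point made above.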