
Re: Switching to UTF-8: msg#00028

internationalization.linux

Subject: Re: Switching to UTF-8




On Thu, 2 May 2002, Tomohiro KUBOTA wrote:

> At Wed, 01 May 2002 20:02:57 +0100,
> Markus Kuhn wrote:
>
> > I have for some time now been using UTF-8 more frequently than
> > ISO 8859-1. The three critical milestones that still keep me from
> > moving entirely to UTF-8 are

> How about bash? Do you know any improvement?

> Please note that tcsh have already supported east Asian EUC-like
> multibyte encodings. I don't know it also supports UTF-8.

It doesn't seem to support UTF-8 locales as of tcsh 6.10.0
(2000-11-19), and I can't find anything about UTF-8 at http://www.tcsh.org.
The newest release is 6.11.0. The same is true of zsh
(http://www.zsh.org).

> combining characters? bidi? Arab shaping? Indic scripts?
and Hangul :-)
> Mongol (which needs vertical direction)? How about wcwidth()?

Pango and ST should certainly help here.

> * input methods
> Any way to input complex languages which cannot be supported
> by xkb mechanism (i.e., CJK) ? XIM? IIIMP? (How about Gnome2?)

You mean IIIMF, don't you? If there's any actual implementation,
I'd love to try it out. We need a Windows 2k/XP or MacOS 9/X
style keyboard/IM switching mechanism and UI, so that keyboard/IM modules
targeted at or customized for each language can coexist and be brought up
as necessary. IIIMF appears to be the only way, unless somebody
writes a gigantic one-fits-all XIM server for UTF-8 locale(s).

How about just running your favorite XIM under ja_JP.EUC-JP while
all other applications are launched under ja_JP.UTF-8? As you well know,
it works fine, although the character repertoire you can enter
is limited to that of EUC-JP. Of course, this is not full-blown UTF-8
support, but it should at least give you the same degree of Japanese
input support under ja_JP.UTF-8 as under ja_JP.EUC-JP. You might then
ask what the point of moving to UTF-8 is. Well, you can at least
display more characters under UTF-8 than under EUC-JP, can't you? :-)
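
To make the "repertoire is limited to that of EUC-JP" point concrete,
here is a minimal sketch in Python. It assumes that Python's built-in
euc_jp codec is a fair stand-in for what an XIM running under
ja_JP.EUC-JP can hand over; the helper names are hypothetical and have
nothing to do with any real XIM server.

```python
def in_euc_jp(ch: str) -> bool:
    """Return True if ch is encodable with Python's euc_jp codec.

    Assumption: the codec's repertoire approximates what an input
    method running in an EUC-JP locale could deliver.
    """
    try:
        ch.encode("euc_jp")
        return True
    except UnicodeEncodeError:
        return False


def outside_euc_jp(text: str) -> list:
    """List the characters of text that fall outside EUC-JP."""
    return [ch for ch in text if not in_euc_jp(ch)]


if __name__ == "__main__":
    # Kanji can be entered; Hangul can only be displayed, not entered.
    print(outside_euc_jp("日本語 한글"))
```

Anything the second helper reports is something you could display under
ja_JP.UTF-8 but not type through an EUC-JP XIM.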

In the Korean case, as I wrote a couple of days ago, I had to
modify Ami (a popular Korean XIM) to make it run under ko_KR.UTF-8.
Otherwise, even though my applications were running under and
fully aware of UTF-8 (e.g. vim in a UTF-8 xterm), I couldn't enter
the more than 8,000 Hangul syllables that are available in UTF-8 but
not in EUC-KR. Moreover, under ko_KR.UTF-8, Xterm-16x and Vim 6.1 with
a single-line patch work almost flawlessly with U+1100 Hangul Jamos.
Markus, can you update your UTF-8 FAQ on this issue? Xterm has been
supporting Thai script, and that brought in Middle Korean support
almost automagically as a by-product.
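
For readers who want to see where the "over 8,000" figure and the
U+1100 jamo come from, here is a rough sketch in Python. The constants
follow the Hangul syllable arithmetic of the Unicode Standard; the
figure of 2,350 precomposed syllables in KS X 1001 (the set behind
EUC-KR) is entered by hand as a known count, not computed, and the
function name is only illustrative.

```python
import unicodedata

# Constants of the Unicode Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT   # precomposed syllables in Unicode

# Assumption stated by hand: KS X 1001 defines 2,350 precomposed syllables.
KS_X_1001_SYLLABLES = 2350


def decompose(syllable: str) -> str:
    """Split one precomposed Hangul syllable into U+1100-block jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, trail = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if trail:                            # trail == 0 means no final consonant
        jamo += chr(T_BASE + trail)
    return jamo


if __name__ == "__main__":
    print(S_COUNT - KS_X_1001_SYLLABLES)            # syllables beyond EUC-KR
    print([hex(ord(j)) for j in decompose("한")])
    # Cross-check against the library's canonical decomposition.
    print(decompose("한") == unicodedata.normalize("NFD", "한"))
```

Middle Korean text has no precomposed syllables to fall back on, so it
is written directly as such jamo sequences, which is why terminal and
editor support for combining them matters.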

BTW, Xkb may work for Korean Hangul too, and we don't need
XIM if we use a 'three-set keyboard' instead of a 'two-set keyboard' and
can live without Hanja. I need to learn more about Xkb to be certain, though.

> Or, any software-specific input methods (like Emacs or Yudit)?

Yudit supports Indic, Thai, and Arabic pretty well as far as I know.
And, judging from what Gaspar wrote to me, Middle Korean support with
U+1100 jamo is not far away. Most of what's necessary is firmly in
place, because Gaspar has written very generic complex-script support
routines that can hopefully be used for Middle Korean without much
effort.

Jungshik Shin


