
ASCII and JIS X 0201 Roman - the backslash problem: msg#00070

internationalization.linux

Subject: ASCII and JIS X 0201 Roman - the backslash problem

Hi all,

Tomohiro Kubota, in
http://www.debian.or.jp/~kubota/unicode-symbols-yen.html, explains
the YEN SIGN versus REVERSE SOLIDUS problem. He writes:

"Solution is very simple. Just regard YEN SIGN and REVERSE SOLIDUS
as a different glyphs of the same character. Then, distinction
between ASCII and JIS X 0201 Roman can be neglected."

I don't think this is a good solution. It would never allow Japanese users
to use the same fonts for ASCII as users elsewhere.

Making it possible for Japanese users to work in a UTF-8 locale requires
the following:

1) Admit that YEN SIGN and REVERSE SOLIDUS are different things.

2) Never use backslash as a directory separator.

3) For programs that interpret backslash as some kind of escape character
and use Unicode internally, but must also work with text in Shift_JIS
encoding, treat the single-byte character 0x5C of the multibyte encoding
as the escape trigger, not [only] the Unicode character U+005C. This is
already done in bash and gettext. For example, in GNU gettext, we have
the code

static bool
mb_iseq (mbc, sc)
     const mbchar_t mbc;
     char sc;
{
  /* Note: It is wrong to compare only mbc->uc, because when the encoding is
     SHIFT_JIS, mbc->buf[0] == '\\' corresponds to mbc->uc == 0x00A5, but we
     want to treat it as an escape character, although it looks like a Yen
     sign.  */
#if HAVE_ICONV && 0
  if (mbc->uc_valid)
    return (mbc->uc == sc); /* wrong! */
  else
#endif
    return (mbc->bytes == 1 && mbc->buf[0] == sc);
}

4) When people convert files from Shift_JIS to Unicode, they need to
disambiguate the two uses of the character that Tomohiro mentions:
"When a Japanese person is a writer, it means YEN SIGN in most cases.
When a non-Japanese person is a writer, it always means REVERSE SOLIDUS."
These "most cases" need to be distinguished: in a financial text the
use is likely different from that in a shell script. The iconv program
cannot make this distinction, because it depends on the content of each
document.
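
To illustrate points 3 and 4, here is a minimal sketch, not taken from
bash or gettext. It scans raw Shift_JIS bytes and counts only those 0x5C
bytes that form a single-byte character, skipping any 0x5C that is the
second byte of a double-byte character. It also shows a caller-supplied
policy for mapping that character to Unicode. All names here
(sjis_is_lead_byte, sjis_count_escape_triggers, sjis_5c_policy,
sjis_map_5c) are hypothetical, and the lead-byte ranges are an assumption
that includes the common vendor extensions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: lead bytes of Shift_JIS double-byte characters lie in
   0x81..0x9F and 0xE0..0xFC (the latter including vendor extensions).  */
static int
sjis_is_lead_byte (unsigned char c)
{
  return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Hypothetical helper: count the single-byte 0x5C characters in
   s[0..n-1], i.e. the real escape triggers.  A 0x5C that is the second
   byte of a double-byte character is stepped over, not counted.  */
static size_t
sjis_count_escape_triggers (const unsigned char *s, size_t n)
{
  size_t count = 0;
  size_t i = 0;

  while (i < n)
    {
      if (sjis_is_lead_byte (s[i]) && i + 1 < n)
        i += 2;                 /* skip both bytes of the character */
      else
        {
          if (s[i] == 0x5C)
            count++;
          i++;
        }
    }
  return count;
}

/* Hypothetical policy for point 4: the caller, who knows what kind of
   document is being converted, decides what a single-byte 0x5C means.  */
enum sjis_5c_policy
{
  SJIS_5C_AS_REVERSE_SOLIDUS,   /* e.g. shell scripts, source code */
  SJIS_5C_AS_YEN_SIGN           /* e.g. financial text */
};

static uint32_t
sjis_map_5c (enum sjis_5c_policy policy)
{
  return policy == SJIS_5C_AS_YEN_SIGN ? 0x00A5 : 0x005C;
}
```

For instance, the bytes 0x95 0x5C encode one double-byte character whose
second byte happens to be 0x5C, so the scan above does not count it as an
escape trigger. The policy argument makes explicit that the choice
between U+005C and U+00A5 has to come from outside the converter.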

Bruno