
Re: ASCII and JIS X 0201 Roman - the backslash problem: msg#00075

internationalization.linux

Subject: Re: ASCII and JIS X 0201 Roman - the backslash problem

Tomohiro KUBOTA writes:

> > 3) For programs that interpret backslash as some kind of escape character
> > and use Unicode internally but should work with text in Shift_JIS
> > encoding, consider the multibyte character 0x5C as being the escape
> > trigger, not [only] the Unicode character U+005C. This is already done
> > in bash and gettext. For example, in GNU gettext, we have the code
>
> I think interpretation of
> U+00A5 as an additional escape character doesn't always work, because
> Unicode texts don't have information on their origin (converted from
> Shift_JIS or not).

These are particular kinds of text files that are fed to programs which
interpret backslashes: shell scripts, awk scripts, gettext PO files, etc.
And yes, if a Yen sign is to appear literally in such a file, it needs to
be doubled.
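The quoted proposal treats the Shift_JIS *single-byte* character 0x5C as the escape trigger. That qualifier matters because 0x5C can also occur as the trail byte of a double-byte character (for example 表 is 0x95 0x5C), where it must not trigger anything. The sketch below is not the bash or gettext code. It assumes the CP932-style lead-byte ranges 0x81-0x9F and 0xE0-0xFC, treats half-width katakana (0xA1-0xDF) as single bytes, and uses a hypothetical function name `escape_positions`.

```python
def escape_positions(data: bytes) -> list[int]:
    """Return offsets of 0x5C bytes that are standalone characters
    in Shift_JIS text, i.e. candidate escape triggers.

    Assumption: CP932-style lead bytes 0x81-0x9F and 0xE0-0xFC; every
    other byte (ASCII/JIS X 0201 Roman, half-width katakana) is a
    single-byte character.
    """
    positions = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            # Lead byte: skip it together with its trail byte, which
            # may itself be 0x5C and must not count as an escape.
            i += 2
            continue
        if b == 0x5C:
            positions.append(i)
        i += 1
    return positions
```

For the bytes `95 5C 5C 6E` (表 followed by the two characters `\n`), only offset 2 is reported; the 0x5C inside 表 is skipped.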

> If U+00A5 were always an escape character,
> it would be harmful to a lot of software.

Why is it more harmful if U+00A5 is an escape character than if U+005C
is an escape character? In both cases you just double it to get the
original character.
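To make the doubling argument concrete, here is a minimal sketch of an escape interpreter that accepts both U+005C and U+00A5 as escape characters. The function name `unescape` and its small escape table are assumptions for illustration only; real shells, awk and PO parsers each define their own escape sets. A doubled escape character yields that character literally, so a literal Yen sign costs exactly what a literal backslash costs.

```python
# Escape characters: U+005C REVERSE SOLIDUS and, as discussed for text
# that originated in Shift_JIS, U+00A5 YEN SIGN.
ESCAPE_CHARS = {"\u005c", "\u00a5"}

# Hypothetical, deliberately tiny escape table for illustration.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", '"': '"'}


def unescape(text: str) -> str:
    """Interpret backslash-style escapes, with either escape character
    acting as the trigger. A doubled escape character produces that
    character itself; unknown escapes produce the escaped character."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in ESCAPE_CHARS and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == ch:
                out.append(ch)  # doubled escape -> the original character
            else:
                out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            # Ordinary character, or a lone escape at the very end.
            out.append(ch)
            i += 1
    return "".join(out)
```

With this rule, `100¥¥` unescapes to `100¥` in the same way that a doubled backslash unescapes to a single backslash.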

> I am interested in how European people succeeded in migrating from ISO 646
> variants to ISO 8859. The Yen Sign Problem is exactly a problem of ISO 646,
> because "0x5C = YEN SIGN" comes from JIS X 0201 Roman, which is the Japanese
> variant of ISO 646.

For me, the migration occurred when I switched to a different
computer with a different OS and a different character set (from
ISO646-DE to CP437 at that time). Few files were carried over: there
are usually a lot of text files that you can simply throw away once
every three years. Among the remaining ones, disambiguation was usually
easy, depending on the type of file: in letters I used only umlauts and
no brackets, whereas in programs I mostly used brackets and no umlauts.
Only a few programs contained both brackets and umlauts, and I had to do
the fixup by hand, usually the next time I needed the particular
program.

So it is a minor annoyance spread over a few months, but nowhere near
the costs that you are estimating.
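The file-type heuristic described above can be sketched as follows. ISO646-DE (DIN 66003) reuses eight ASCII code points for §, Ä, Ö, Ü, ä, ö, ü and ß; the sketch decodes to Unicode rather than to CP437, and the function name `decode_iso646_de` and its `is_program` flag are assumptions for illustration. Files that mix brackets and umlauts still need manual fixup, exactly as in the message.

```python
# ISO646-DE (DIN 66003): the eight ASCII code points it redefines.
ISO646_DE_NATIONAL = {
    0x40: "\u00a7",  # @ -> §
    0x5B: "\u00c4",  # [ -> Ä
    0x5C: "\u00d6",  # \ -> Ö
    0x5D: "\u00dc",  # ] -> Ü
    0x7B: "\u00e4",  # { -> ä
    0x7C: "\u00f6",  # | -> ö
    0x7D: "\u00fc",  # } -> ü
    0x7E: "\u00df",  # ~ -> ß
}


def decode_iso646_de(data: bytes, is_program: bool) -> str:
    """Decode a 7-bit ISO646-DE file to Unicode, resolving the ambiguous
    bytes by file type: programs keep ASCII brackets and braces,
    letters get the German national characters."""
    if any(b > 0x7F for b in data):
        raise ValueError("not a 7-bit ISO646 file")
    if is_program:
        # Programs: read the ambiguous bytes as plain ASCII.
        return data.decode("ascii")
    # Letters: map the redefined code points to their German characters.
    return "".join(ISO646_DE_NATIONAL.get(b, chr(b)) for b in data)
```

For example, the letter text `Gr}~e` decodes to `Grüße`, while the program text `a[0] = {1};` is left unchanged.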

Bruno

