Subject: Re: gettext - was: Re: ASCII and Unicode Quotation Marks - msg#00063
List: internationalization.linux
Previous Message by Date:
Re: w3m with UTF-8 support available
Christian Weisgerber <naddy@xxxxxxxxxxxxxxxxxxxx>:
> The latest widely released version of w3m is 0.1.6. The second
> release of a comprehensive i18n patch is now available. This adds
> support for a wide range of document encodings, display character
> sets, and Unicode-based conversion between those. In particular,
> w3m-0.1.6-i18n-2 adds UTF-8 as display encoding.
I hope to look at the source myself, but can you tell us more about
UTF-8 as display encoding? How does it compare with Lynx?
Lynx tries to use UTF-8 despite a non-UTF-8-aware curses library. The
result is quite reasonable for a page that is mostly US-ASCII, but not
really satisfactory for a page mostly in Russian, say.
To make Mutt work in UTF-8 I modified slang. The modifications are
supposed to implement a subset of the UNIX 98 curses specification,
i.e. the functions I needed for Mutt. My slang modifications are
independent of mainstream slang development (I modified a rather old
version), but the patch might be useful to someone else until
mainstream slang supports UTF-8. See www.rano.org/mutt.html
Edmund
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/
Next Message by Date:
Re: gettext - was: Re: ASCII and Unicode Quotation Marks
This seems to be a recipe for getting what Ulrich was referring to:
$ cvs -z 9 -d :pserver:anoncvs@xxxxxxxxxxxxxxxxxx:/cvs/glibc login
{enter "anoncvs" as the password}
$ cvs -z 9 -d :pserver:anoncvs@xxxxxxxxxxxxxxxxxx:/cvs/glibc co libc/manual
$ cd libc/manual
{"make" fails because of missing libm-err-tab.pl, move-if-change,
libm-err.texi, ... but then the following incantation does something:}
$ makeinfo --force libc.texinfo
Searching for a discussion of catgets vs. gettext, I eventually read, in
the section "Translation with gettext":
> The `gettext' approach has some advantages but also some
> disadvantages. Please see the GNU `gettext' manual for a detailed
> discussion of the pros and cons.
Is this it?
On the subject of gettext and UTF-8, I suggested something on the
gnupg-i18n list that people here might have an opinion on (or might
tell me to go and consult an anoncvs server about :-). I'll append my
e-mail ...
Edmund
--- Begin Message ---
Sorry to reply to myself like this ...
> But I suppose we really ought to be thinking about wide characters and
> charset support, too; a Russian user might be using koi8-r or utf-8.
> The same problem affects lots of programs, not just gnupg ...
I've heard that soon gettext will automatically convert message
strings to the charset of the user's current locale. A single
non-ASCII character becomes two or more octets when converted into
UTF-8 (which will be the most widely used charset soon). So any code
that wants to look at the third character of a translated string by
just doing ans[2], say, will break horribly: ans[2] is the third
octet, which may be only part of a character.
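The failure mode is easy to see in code. The following C sketch walks a UTF-8 string to the start of its n-th character by skipping continuation bytes instead of indexing octets directly. utf8_nth is a hypothetical helper, not a gettext or glibc function, and it assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of gettext or glibc): return a pointer to
   the start of the n-th character (counting from 0) of the UTF-8 string s,
   or NULL if s has fewer than n + 1 characters.  Assumes s is valid UTF-8,
   where bytes of the form 10xxxxxx only ever continue a character and
   every other byte starts one. */
const char *utf8_nth(const char *s, size_t n)
{
    assert(s != NULL); /* caller must pass a valid string */

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) { /* lead byte: new character */
            if (n == 0)
                return s;
            n--;
        }
    }
    return NULL; /* ran out of characters */
}
```

For the Russian word "дом" (six octets in UTF-8), ans[2] is the first octet of "о", the second character, whereas utf8_nth(ans, 2) points at octet 4, where "м" begins. Code that has to cope with whatever charset the locale uses, not just UTF-8, would rely on the C library's multibyte functions such as mbrlen instead of this bit test.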
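Perhaps it would be useful to have a function that searches for an
exact match or an unambiguous prefix in a set of commands, where ';'
separates groups of equivalent commands and ',' separates the
alternatives within a group. For example: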
int f(char *cmd, char *cmds, int *x);
f("a", "a,cw;bx,by;cz", &x) = 0, x = 1 /* exact match in group 1 */
f("b", "a,cw;bx,by;cz", &x) = 1, x = 2 /* unambiguous prefix in group 2 */
f("c", "a,cw;bx,by;cz", &x) = 2, x = ? /* ambiguous prefix (1 or 3) */
f("d", "a,cw;bx,by;cz", &x) = 3, x = ? /* no match */
The cmds string would be translated, of course, and it shouldn't
matter if charset conversion changes the number of bytes. If a system
like this were in general use in GNU software, translators would soon
learn the format.
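A minimal C sketch of the f() proposed above follows. The const qualifiers, the rule that an exact match wins over prefix matches, and the treatment of two prefix matches in the same group as unambiguous are assumptions read off the four examples, not part of the proposal. Matching is done on bytes, which is safe for UTF-8 because the separator bytes ',' and ';' never occur inside a multibyte character.

```c
#include <assert.h>
#include <string.h>

/* Sketch of the proposed f(): ';' separates groups, ',' separates
   alternatives within a group, and groups are numbered from 1.
   Returns:
     0  exact match           (*x = its group)
     1  unambiguous prefix    (*x = the single matching group)
     2  ambiguous prefix      (*x left untouched)
     3  no match              (*x left untouched)
   Assumes cmd and cmds use the same encoding (e.g. both UTF-8). */
int f(const char *cmd, const char *cmds, int *x)
{
    size_t cmdlen;
    int group = 1;        /* number of the group being scanned */
    int prefix_group = 0; /* group of the first prefix match, 0 if none */
    int ambiguous = 0;    /* set once prefix matches span two groups */
    const char *p = cmds;

    assert(cmd != NULL && cmds != NULL && x != NULL);
    cmdlen = strlen(cmd);

    for (;;) {
        size_t len = strcspn(p, ",;"); /* bytes in this alternative */

        if (len == cmdlen && strncmp(p, cmd, len) == 0) {
            *x = group; /* an exact match wins outright */
            return 0;
        }
        if (cmdlen < len && strncmp(p, cmd, cmdlen) == 0) {
            if (prefix_group == 0)
                prefix_group = group;
            else if (prefix_group != group)
                ambiguous = 1;
        }
        p += len;
        if (*p == '\0')
            break;
        if (*p == ';')
            group++; /* moving on to the next group */
        p++;         /* step over the separator */
    }
    if (ambiguous)
        return 2;
    if (prefix_group != 0) {
        *x = prefix_group;
        return 1;
    }
    return 3;
}
```

With cmds = "да;нет", typing just "д" returns 1 with x = 1, even though "д" is two octets in UTF-8; the byte counts never enter into the decision.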
The question of whether to use the English commands as a canonical
alternative is left to individual translator teams.
Edmund
--- End Message ---
Next Message by Thread:
w3m with UTF-8 support available
For those who don't know it yet, w3m is a pager and text-mode web
browser. Its chief advantages over Lynx are the rendering of HTML
tables and (optionally) frames.
The latest widely released version of w3m is 0.1.6. The second
release of a comprehensive i18n patch is now available. This adds
support for a wide range of document encodings, display character
sets, and Unicode-based conversion between those. In particular,
w3m-0.1.6-i18n-2 adds UTF-8 as display encoding.
http://ei5nazha.yz.yamagata-u.ac.jp/~aito/w3m/eng/
http://www2u.biglobe.ne.jp/~hsaka/w3m/
This is definitely work in progress.
The original author and maintainer of w3m is Akinori Ito; the
herculean i18n patch is by Hironori Sakamoto.
--
Christian "naddy" Weisgerber naddy@xxxxxxxxxxxxxxxxxxxx