
Subject: Re: gettext - was: Re: ASCII and Unicode Quotation Marks - msg#00063

List: internationalization.linux

>>>>> Andries Brouwer writes:

[...]
> Hmm - doesn't look like I am making progress here.
Apparently not :-(
> If there is something under sourceware.cygnus.com that I should read,
> please tell exactly where and how I find it.

check revision 2.14 in:
http://sourceware.cygnus.com/cgi-bin/cvsweb.cgi/libc/manual/message.texi?cvsroot=glibc

Andreas
--
Andreas Jaeger
SuSE Labs aj@xxxxxxx
private aj@xxxxxxxxxxxxxxxxxxxxxx
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/




Thread at a glance:

Previous Message by Date:

Re: w3m with UTF-8 support available

Christian Weisgerber <naddy@xxxxxxxxxxxxxxxxxxxx>:

> The latest widely released version of w3m is 0.1.6. The second
> release of a comprehensive i18n patch is now available. This adds
> support for a wide range of document encodings, display character
> sets, and Unicode-based conversion between those. In particular,
> w3m-0.1.6-i18n-2 adds UTF-8 as display encoding.

I hope to look at the source myself, but can you tell us more about UTF-8 as display encoding? How does it compare with Lynx?

Lynx tries to use UTF-8 despite a non-UTF-8-aware curses library. The result is quite reasonable for a page that is mostly us-ascii, but not really satisfactory for a page mostly in Russian, say.

To make Mutt work in UTF-8 I modified slang. The modifications are supposed to implement a subset of the UNIX 98 curses specification, i.e. those functions I needed for mutt. My slang modifications are independent of mainstream slang development - I modified a rather old version - but the patch might be useful for someone else, until the mainstream slang does UTF-8. See www.rano.org/mutt.html

Edmund

Next Message by Date:

Re: gettext - was: Re: ASCII and Unicode Quotation Marks

This seems to be a recipe for getting what Ulrich was referring to:

$ cvs -z 9 -d :pserver:anoncvs@xxxxxxxxxxxxxxxxxx:/cvs/glibc login
{enter "anoncvs" as the password}
$ cvs -z 9 -d :pserver:anoncvs@xxxxxxxxxxxxxxxxxx:/cvs/glibc co libc/manual
$ cd libc/manual
{"make" fails because of missing libm-err-tab.pl, move-if-change,
libm-err.texi, ... but then the following incantation does something:}
$ makeinfo --force libc.texinfo

Searching for a discussion of catgets vs gettext I eventually read, in the section "Translation with gettext":

> The `gettext' approach has some advantages but also some
> disadvantages. Please see the GNU `gettext' manual for a detailed
> discussion of the pros and cons.

Is this it?

On the subject of gettext and UTF-8, I suggested something in gnupg-i18n that people here might have an opinion on (or might tell me to go and consult an anoncvs server about :-) I'll append my e-mail ...

Edmund

--- Begin Message ---
Sorry to reply to myself like this ...

> But I suppose we really ought to be thinking about wide characters and
> charset support, too; a Russian user might be using koi8-r or utf-8.
> The same problem affects lots of programs, not just gnupg ...

I've heard that soon gettext will automatically convert message strings to the charset of the user's current locale. A single non-ASCII character will become several octets when converted into UTF-8 (which will be the most widely used charset soon). So any code that wants to look at the third character of a translated string by just doing ans[2], say, will break horribly.

Perhaps it would be useful to use a function that searches for an exact match or an unambiguous prefix in a set of commands. For example:

int f(char *cmd, char *cmds, int *x);

f("a", "a,cw;bx,by;cz", &x) = 0, x = 1 /* exact match in group 1 */
f("b", "a,cw;bx,by;cz", &x) = 1, x = 2 /* unambiguous prefix in group 2 */
f("c", "a,cw;bx,by;cz", &x) = 2, x = ? /* ambiguous prefix (1 or 3) */
f("d", "a,cw;bx,by;cz", &x) = 3, x = ? /* no match */

The cmds string would be translated, of course, and it shouldn't matter if charset conversion changes the number of bytes. If a system like this were in general use in GNU software, translators would soon learn. The question of whether to use the English commands as a canonical alternative is left to individual translator teams.

Edmund
--- End Message ---
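
[Editor's sketch] The message gives only a prototype and four expected results for f. The following is one possible C implementation that reproduces those four results. It is not code from gnupg or gettext. The assumptions are mine: ';' separates groups and ',' separates alternatives within a group (inferred from the examples), groups are numbered from 1, an empty cmd counts as no match, the first exact match wins, and the MATCH_* names and the const qualifiers are additions. Comparison is done on whole byte strings, so the result does not depend on how many octets a character occupies after charset conversion.

```c
#include <string.h>

/* Return codes, matching the values used in the examples in the message. */
enum { MATCH_EXACT = 0, MATCH_PREFIX = 1, MATCH_AMBIGUOUS = 2, MATCH_NONE = 3 };

/*
 * Look up cmd in cmds, where cmds has the form "a,cw;bx,by;cz":
 * ';' separates groups, ',' separates alternatives inside a group.
 * On MATCH_EXACT or MATCH_PREFIX, *x receives the 1-based group number.
 * On MATCH_AMBIGUOUS or MATCH_NONE, *x is left untouched.
 */
int f(const char *cmd, const char *cmds, int *x)
{
    size_t cmdlen = strlen(cmd);
    const char *p = cmds;
    int group = 1;          /* group of the alternative being examined */
    int prefix_group = 0;   /* group of the first prefix match, 0 = none */
    int ambiguous = 0;      /* set when prefix matches span two groups */

    if (cmdlen == 0)
        return MATCH_NONE;  /* assumption: empty input never matches */

    for (;;) {
        size_t len = strcspn(p, ",;");   /* length of this alternative */

        if (len >= cmdlen && memcmp(p, cmd, cmdlen) == 0) {
            if (len == cmdlen) {         /* exact match wins immediately */
                *x = group;
                return MATCH_EXACT;
            }
            if (prefix_group == 0)
                prefix_group = group;
            else if (prefix_group != group)
                ambiguous = 1;
        }

        p += len;
        if (*p == '\0')
            break;
        if (*p == ';')
            group++;
        p++;                             /* skip the separator */
    }

    if (ambiguous)
        return MATCH_AMBIGUOUS;
    if (prefix_group != 0) {
        *x = prefix_group;
        return MATCH_PREFIX;
    }
    return MATCH_NONE;
}
```

A caller would pass the translated command list as cmds and branch on the return value, reading x only for the two successful codes. One caveat of the byte-wise comparison: it assumes the user's input consists of complete characters in the same charset as cmds.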


Next Message by Thread:

w3m with UTF-8 support available

For those who don't know it yet, w3m is a pager and text mode web browser. Its chief advantages over lynx are the rendering of HTML tables and (optionally) frames.

The latest widely released version of w3m is 0.1.6. The second release of a comprehensive i18n patch is now available. This adds support for a wide range of document encodings, display character sets, and Unicode-based conversion between those. In particular, w3m-0.1.6-i18n-2 adds UTF-8 as display encoding.

http://ei5nazha.yz.yamagata-u.ac.jp/~aito/w3m/eng/
http://www2u.biglobe.ne.jp/~hsaka/w3m/

This is definitely work in progress. The original author and maintainer of w3m is Akinori Ito, the herculean i18n patch is by Hironori Sakamoto.

--
Christian "naddy" Weisgerber naddy@xxxxxxxxxxxxxxxxxxxx