osdir.com
mailing list archive

Subject: Re: UTF-8 curses - msg#00065

List: internationalization.linux

Edmund GRIMLEY EVANS wrote on 1999-10-24 14:45 UTC:
> So it would be useful for me to have a better idea of when and with
> what probability characters with more than 16/24 bits might be useful
> in the context of curses. Thanks for any clues.

I'd summarize the situation as follows:

- It is almost certain that no UCS characters above 0x10ffff will ever
be used, so you should be very comfortable reserving only 21 bits
for a UCS character (which leaves 11 bits of a 32-bit word for other
attributes).

- The characters above 0xffff will mostly be ones found on clay
tablets in the British Museum with amazingly low Carbon-14 concentrations.
Characters in plane 1 and above are mostly codes reserved for special
applications (most notably scholarly word processing). They are good to
have available in a good publishing system, but are much less likely
to be urgently needed in simple VT100-style curses applications.
You won't have hieroglyphs in the X11 fixed font any time soon.

So if there is a way to add support for non-BMP characters easily, then
no harm is done by doing it, but if it involves a lot of effort or use
of resources, I'd put it quite low on the priority list for curses.
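
The 21 + 11 bit split mentioned above could be sketched in C roughly as
follows. This is only an illustration, assuming one 32-bit word per
screen cell; the names cell_t, cell_pack, cell_ucs and cell_attrs are
hypothetical and do not come from curses or slang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 21 bits hold the UCS code point,
 * high 11 bits hold attributes (colour, bold, ...). */
#define UCS_BITS  21u
#define UCS_MASK  ((UINT32_C(1) << UCS_BITS) - 1)  /* 0x1fffff */
#define UCS_MAX   UINT32_C(0x10ffff)               /* highest UCS code point */
#define ATTR_MAX  UINT32_C(0x7ff)                  /* 11 bits of attributes */

typedef uint32_t cell_t;

/* Combine a code point and an attribute value into one cell. */
static cell_t cell_pack(uint32_t ucs, uint32_t attrs)
{
    assert(ucs <= UCS_MAX);    /* must fit in 21 bits */
    assert(attrs <= ATTR_MAX); /* must fit in the remaining 11 bits */
    return (attrs << UCS_BITS) | ucs;
}

/* Extract the code point from a cell. */
static uint32_t cell_ucs(cell_t c)
{
    return c & UCS_MASK;
}

/* Extract the attribute bits from a cell. */
static uint32_t cell_attrs(cell_t c)
{
    return c >> UCS_BITS;
}
```

For example, cell_pack(0x12000, 5) stores a plane 1 code point together
with attribute value 5, and cell_ucs() on the result gives back 0x12000.
Whether 11 attribute bits are enough depends on the library: a
slang-style colour field fits, while a fuller attribute set may not.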

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/




Thread at a glance:

Previous Message by Date:

Re: UTF-8 curses

> I now have mutt and slang more or less working in UTF-8, but I want to
> get the interface right between the two.
>
> How should curses be extended to Unicode?
>
> Mutt uses slang's curses-compatible functions. I changed none of the
> function prototypes: addch and addstr and friends all take UTF-8. (It
> would have been harder to modify mutt if I hadn't allowed a single
> character to be delivered by multiple calls to addch.) You could
> presumably have additional functions addwch, addwstr, etc for wide
> characters, if you wanted.

XSI Curses defines functions for wide characters - ncurses doesn't
implement much of that because (when I was working on that aspect
before) I had no way to display. Now that I'm "done" with ncurses 5.0,
I'll be resuming some work on the wide-character code. For a first
pass, I'm really only thinking to implement what X/Open describes, then
see if there are useful extensions.

> But how should one switch the library into UTF-8 mode? You could have
> an additional function for this, but is it possible or desirable to
> avoid having an extra function? Without an additional function, a
> program compiled for UTF-8-curses could still run, in non-UTF-8-mode,
> with an older version of curses. Or is this easy to achieve with weak
> symbols anyway?

the narrow character version would simply run as a subset of the
library.

> Double-width chars: I think it's clear that these fill two character
> cells, and if you overwrite one of the cells, then the other should be
> replaced by a space in the same colour as the double-width character
> just destroyed. A really nasty case is when you receive one of these
> characters when the cursor is in the last column.

that's also in the X/Open description.

> This case is nasty, because a program might want to avoid wrapping
> onto the next line, and perhaps even causing the screen to scroll, by
> outputting UTF-8 octets while watching which column the cursor is in.
> If you're on the last column, you think it's safe to continue, but
> then you suddenly find you've trashed the next line, and perhaps the
> whole screen because of scrolling.
>
> Both slang and curses allow you to adjust the line-wrapping and
> scrolling behaviour, but I haven't yet investigated in detail ...
>
> Last question: how useful is it to allow characters with more than 16
> bits?
>
> It's easiest to change slang by storing the character plus attributes
> (colour, etc) in a single integer, which is unsigned short at present
> and can easily be extended to unsigned long. Then you have the choice
> of either 24 bits character plus 8 bits colour, or perhaps 16 bits
> colour and 16 bits character. If you think you might one day want 32
> bit characters, it would be wise to provide for that in the API, even
> if you don't want to implement it internally immediately.

you don't really have enough bits in one 32-bit word for character,
colors and attributes (ncurses runs short by one bit - A_PROTECT iirc -
because it supports 16 colors). slang doesn't implement as many
attributes or colors, so that would work. Digital Unix uses a struct
for this rather than a 32-bit number (though the struct may fit into a
64-bit word, now that I'm thinking about it).

I implemented some configure-script tests for ncurses a year or so ago
to put together an experimental wide-character version. (But as I've
said, I had nothing to display on, and glibc2 wasn't stable enough then
to do anything useful with - I had in mind using its locale support).

> So it would be useful for me to have a better idea of when and with
> what probability characters with more than 16/24 bits might be useful
> in the context of curses. Thanks for any clues.
>
> Edmund

--
Thomas E. Dickey
dickey@xxxxxxxxx
http://www.clark.net/pub/dickey

Next Message by Date:

Re: UTF-8 curses

There is a bit of curses support for advanced character sets documented
on

  http://hoth.stsci.edu/man/man3XC/curses.html#sect10

See also "complex characters", "cchar_t", "wcwidth()", "non-spacing
characters", and similar topics in the X/Open spec, which is freely
available online on

  http://www.UNIX-systems.org/online.html

Markus

Next Message by Thread:

Re: UTF-8 curses

Followup to: <E11fRD6-0003Uc-00@xxxxxxxxxxxxxxxxxxx>
By author: Markus Kuhn <Markus.Kuhn@xxxxxxxxxxxx>
In newsgroup: linux.utf8
>
> - The characters that will go above 0xffff will mostly be found on clay
> tablets in the British Museum with amazingly low Carbon-14 concentrations.
> Plane 1+ characters are more reserved codes for special applications
> (most notably scholarly word processing) that are good to have
> available in a good publishing system, but that are much less likely
> to be urgently needed in simple VT100-style curses applications.
> You won't have hieroglyphs in the X11 fixed font any time soon.
>

Plane 2 is intended for CJK characters that are not in common use. This
will include personal names -- not something that people consider
insignificant.

-hpa
--
<hpa@xxxxxxxxxxxxx> at work, <hpa@xxxxxxxxx> in private!