Subject: Re: UTF-8 curses - msg#00065
List: internationalization.linux
Edmund GRIMLEY EVANS wrote on 1999-10-24 14:45 UTC:
> So it would be useful for me to have a better idea of when and with
> what probability characters with more than 16/24 bits might be useful
> in the context of curses. Thanks for any clues.
I'd summarize the situation as follows:
- There will almost certainly never be any UCS characters above 0x10ffff
in use, so you should be very comfortable reserving only 21 bits for a
UCS character (which leaves 11 bits of a 32-bit word for other attributes).
- The characters that will go above 0xffff will mostly be found on clay
tablets in the British Museum with amazingly low Carbon-14 concentrations.
Plane 1+ characters are more reserved codes for special applications
(most notably scholarly word processing) that are good to have
available in a good publishing system, but that are much less likely
to be urgently needed in simple VT100-style curses applications.
You won't have hieroglyphs in the X11 fixed font any time soon.
So if there is a way to add support for non-BMP characters easily, then
no harm is done by doing it, but if it involves a lot of effort or use
of resources, I'd put it quite low on the priority list for curses.
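The 21 + 11 bit split suggested above can be sketched in C roughly as follows. This is only an illustration, not slang's or ncurses' actual layout: the names cell_t, cell_pack, cell_char and cell_attrs are made up for this sketch, and putting the attributes in the high bits is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low 21 bits hold the UCS code point,
   the high 11 bits hold attribute flags (colour, bold, ...). */
#define CELL_CHAR_BITS 21
#define CELL_CHAR_MASK ((UINT32_C(1) << CELL_CHAR_BITS) - 1)        /* 0x1FFFFF */
#define CELL_ATTR_MAX  ((UINT32_C(1) << (32 - CELL_CHAR_BITS)) - 1) /* 0x7FF */

typedef uint32_t cell_t;

/* Combine a code point and attribute bits into one 32-bit cell. */
static cell_t cell_pack(uint32_t ch, uint32_t attrs)
{
    assert(ch <= UINT32_C(0x10FFFF));  /* highest code point expected */
    assert(attrs <= CELL_ATTR_MAX);    /* only 11 attribute bits exist */
    return (attrs << CELL_CHAR_BITS) | ch;
}

/* Extract the code point from a cell. */
static uint32_t cell_char(cell_t c)
{
    return c & CELL_CHAR_MASK;
}

/* Extract the attribute bits from a cell. */
static uint32_t cell_attrs(cell_t c)
{
    return c >> CELL_CHAR_BITS;
}
```

Since 0x10ffff is below 2^21, every code point in that range round-trips through cell_pack() and cell_char() unchanged.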
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/
Previous Message by Date:
Re: UTF-8 curses
> I now have mutt and slang more or less working in UTF-8, but I want to
> get the interface right between the two.
>
> How should curses be extended to Unicode?
>
> Mutt uses slang's curses-compatible functions. I changed none of the
> function prototypes: addch and addstr and friends all take UTF-8. (It
> would have been harder to modify mutt if I hadn't allowed a single
> character to be delivered by multiple calls to addch.) You could
> presumably have additional functions addwch, addwstr, etc for wide
> characters, if you wanted.
XSI Curses defines functions for wide characters. ncurses doesn't implement
much of that because (when I was working on that aspect before) I had no
way to display them. Now that I'm "done" with ncurses 5.0, I'll be resuming
some work on the wide-character code. For a first pass, I'm really only
thinking of implementing what X/Open describes, and then seeing whether
there are useful extensions.
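For the approach quoted above, where addch takes UTF-8 and a single character may arrive over several calls, the library has to keep decoder state between calls. A minimal sketch of such an incremental decoder follows. The names utf8_state and utf8_feed are hypothetical and not taken from slang or ncurses, and the sketch only handles sequences of up to four bytes without rejecting overlong forms or surrogates.

```c
#include <stdint.h>

/* Decoder state kept between calls (initialise both fields to 0). */
typedef struct {
    uint32_t cp;        /* code point accumulated so far */
    int      remaining; /* continuation bytes still expected */
} utf8_state;

/* Feed one byte. Returns 1 when *out holds a complete code point,
   0 when more bytes are needed, and -1 on a malformed sequence
   (the state is reset so decoding can resume with the next byte).
   Simplification: overlong forms and surrogates are not rejected. */
static int utf8_feed(utf8_state *st, unsigned char byte, uint32_t *out)
{
    if (st->remaining == 0) {
        if (byte < 0x80) {                 /* single-byte (ASCII) */
            *out = byte;
            return 1;
        }
        if ((byte & 0xE0) == 0xC0) {       /* lead byte of 2-byte sequence */
            st->cp = byte & 0x1F;
            st->remaining = 1;
            return 0;
        }
        if ((byte & 0xF0) == 0xE0) {       /* lead byte of 3-byte sequence */
            st->cp = byte & 0x0F;
            st->remaining = 2;
            return 0;
        }
        if ((byte & 0xF8) == 0xF0) {       /* lead byte of 4-byte sequence */
            st->cp = byte & 0x07;
            st->remaining = 3;
            return 0;
        }
        return -1;                         /* stray continuation or bad lead */
    }
    if ((byte & 0xC0) != 0x80) {           /* expected a continuation byte */
        st->remaining = 0;
        return -1;
    }
    st->cp = (st->cp << 6) | (byte & 0x3F);
    if (--st->remaining > 0)
        return 0;
    *out = st->cp;
    return 1;
}
```

An addch-style function could call utf8_feed() on each incoming octet and only update the screen when it returns 1.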
> But how should one switch the library into UTF-8 mode? You could have
> an additional function for this, but is it possible or desirable to
> avoid having an extra function? Without an additional function, a
> program compiled for UTF-8-curses could still run, in non-UTF-8-mode,
> with an older version of curses. Or is this easy to achieve with weak
> symbols anyway?
the narrow character version would simply run as a subset of the library.
> Double-width chars: I think it's clear that these fill two character
> cells, and if you overwrite one of the cells, then the other should be
> replaced by a space in the same colour as the double-width character
> just destroyed. A really nasty case is when you receive one of these
> characters when the cursor is in the last column.
that's also in the X/Open description.
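The last-column case for double-width characters comes down to a check like the one sketched here. The function name fits_on_line is hypothetical, and the width argument is assumed to come from something like wcwidth() or the library's own width table.

```c
#include <assert.h>

/* Returns 1 if a character occupying `width` cells can be placed at
   zero-based column `col` of a line `ncols` cells wide without
   spilling onto the next line, and 0 otherwise. */
static int fits_on_line(int col, int ncols, int width)
{
    assert(ncols > 0 && col >= 0 && col < ncols && width >= 0);
    return col + width <= ncols;
}
```

On an 80-column line, a single-width character still fits at column 79 but a double-width one does not, so the caller has to decide whether to wrap, pad or refuse before emitting it.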
> This case is nasty, because a program might want to avoid wrapping
> onto the next line, and perhaps even causing the screen to scroll, by
> outputting UTF-8 octets while watching which column the cursor is in.
> If you're on the last column, you think it's safe to continue, but
> then you suddenly find you've trashed the next line, and perhaps the
> whole screen because of scrolling.
>
> Both slang and curses allow you to adjust the line-wrapping and
> scrolling behaviour, but I haven't yet investigated in detail ...
>
> Last question: how useful is it to allow characters with more than 16
> bits?
>
> It's easiest to change slang by storing the character plus attributes
> (colour, etc) in a single integer, which is unsigned short at present
> and can easily be extended to unsigned long. Then you have the choice
> of either 24 bits character plus 8 bits colour, or perhaps 16 bits
> colour and 16 bits character. If you think you might one day want 32
> bit characters, it would be wise to provide for that in the API, even
> if you don't want to implement it internally immediately.
you don't really have enough bits in one 32-bit word for character, colors
and attributes (ncurses runs short by one bit - A_PROTECT iirc - because it
supports 16 colors). slang doesn't implement as many attributes or colors,
so that would work. Digital Unix uses a struct for this rather than a
32-bit number (though the struct may fit into a 64-bit word, now that I'm
thinking about it).
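A struct-based cell of the kind mentioned above might look roughly like this. It is a hypothetical layout for illustration only, not the Digital Unix structure or the X/Open cchar_t. The field names and sizes are assumptions.

```c
#include <stdint.h>

/* Hypothetical screen cell with separate fields instead of packed bits. */
typedef struct {
    uint32_t ch;    /* UCS code point, full 32 bits available */
    uint16_t attrs; /* attribute flags (bold, underline, ...) */
    uint8_t  fg;    /* foreground colour index */
    uint8_t  bg;    /* background colour index */
} screen_cell;

/* Compile-time check: the array size is negative, and compilation
   fails, if the struct does not fit into one 64-bit word. */
typedef char screen_cell_fits_64_bits[sizeof(screen_cell) <= 8 ? 1 : -1];

/* Build a cell from its parts. */
static screen_cell cell_make(uint32_t ch, uint16_t attrs,
                             uint8_t fg, uint8_t bg)
{
    screen_cell c;
    c.ch = ch;
    c.attrs = attrs;
    c.fg = fg;
    c.bg = bg;
    return c;
}
```

With 4 + 2 + 1 + 1 bytes the struct is expected to occupy 8 bytes on common ABIs, so nothing has to be traded off between character, colours and attributes as in a single 32-bit word.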
I implemented some configure-script tests for ncurses a year or so ago to
put together an experimental wide-character version. (But as I've said, I
had nothing to display on, and glibc2 wasn't stable enough then to do anything
useful with - I had in mind using its locale support).
> So it would be useful for me to have a better idea of when and with
> what probability characters with more than 16/24 bits might be useful
> in the context of curses. Thanks for any clues.
>
> Edmund
> -
> Linux-UTF8: i18n of Linux on all levels
> Archive: http://mail.nl.linux.org/lists/
--
Thomas E. Dickey
dickey@xxxxxxxxx
http://www.clark.net/pub/dickey
Next Message by Date:
Re: UTF-8 curses
There is a bit of curses support for advanced character sets documented
on
http://hoth.stsci.edu/man/man3XC/curses.html#sect10
See also "complex characters", "cchar_t", "wcwidth()", "non-spacing
characters", and similar topics in the X/Open spec, which is freely
available online on
http://www.UNIX-systems.org/online.html
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
Next Message by Thread:
Re: UTF-8 curses
Followup to: <E11fRD6-0003Uc-00@xxxxxxxxxxxxxxxxxxx>
By author: Markus Kuhn <Markus.Kuhn@xxxxxxxxxxxx>
In newsgroup: linux.utf8
>
> - The characters that will go above 0xffff will mostly be found on clay
> tablets in the British Museum with amazingly low Carbon-14 concentrations.
> Plane 1+ characters are more reserved codes for special applications
> (most notably scholarly word processing) that are good to have
> available in a good publishing system, but that are much less likely
> to be urgently needed in simple VT100-style curses applications.
> You won't have hieroglyphs in the X11 fixed font any time soon.
>
Plane 2 is intended for CJK characters that are not in common use.
This will include personal names -- not something that people consider
insignificant.
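For reference, the plane of a code point is simply its value divided by 0x10000, so telling plane 2 characters apart from BMP ones is cheap. A small sketch, with the hypothetical names ucs_plane and ucs_is_non_bmp:

```c
#include <stdint.h>

/* Plane number of a UCS code point: each plane holds 0x10000 code points. */
static unsigned ucs_plane(uint32_t cp)
{
    return (unsigned)(cp >> 16);
}

/* Returns 1 if the code point lies outside the Basic Multilingual
   Plane (plane 0), and 0 otherwise. */
static int ucs_is_non_bmp(uint32_t cp)
{
    return cp > UINT32_C(0xFFFF);
}
```

For example, U+20000, the first code point of plane 2, gives ucs_plane() == 2, while any character up to U+FFFF gives 0.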
-hpa
--
<hpa@xxxxxxxxxxxxx> at work, <hpa@xxxxxxxxx> in private!