
C string literals with 16-bit Unicode: msg#00413

text.unicode.devel

Subject: C string literals with 16-bit Unicode

Hi all, I am wondering how developers get 16-bit string *literals* into C source code. Do you use a mechanism other than the following?

Below, I use UChar as an example typedef name for the type of 16-bit Unicode strings (usually the same as unsigned short).

Escapes for non-ASCII characters would be OK; UTF-8 or UTF-16 source code would be nicer. Whatever the mechanism, it has to work on a non-ASCII platform, too.

I am aware that there is an effort under way to add 16-bit Unicode string literals to the C standard; I am looking for what can be done today.

I know of

a) array of numeric constants
const UChar string[]={ 0x61, 0x62, 0x20ac };

b) array of numeric constants expressed as named constants
enum { _a=0x61, _b, _c, ..., _Euro=0x20ac, ... };
const UChar string[]={ _a, _b, _Euro };

c) on some lucky platforms with 16-bit-Unicode wchar_t, simply
const UChar *string=L"ab\x20ac";
or even
const UChar *string=L"ab€";

-> but this is not portable
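
For illustration, here is a minimal sketch of how c) could be guarded at compile time, falling back to a) elsewhere. It assumes that WCHAR_MAX from <wchar.h> is usable in #if (C99 requires this), and that a 16-bit wchar_t actually holds Unicode code units, which the C standard does not promise. The names U16_HAVE_WCHAR_LITERALS and greeting are made up for this example.

```c
#include <wchar.h>   /* wchar_t, WCHAR_MAX */

#if WCHAR_MAX <= 0xFFFF
/* wchar_t is at most 16 bits wide: use wide literals directly.
   Assumption: the platform's wide execution character set is Unicode. */
typedef wchar_t UChar;
#define U16_HAVE_WCHAR_LITERALS 1
static const UChar greeting[] = L"ab\x20ac";
#else
/* wchar_t is wider than 16 bits: fall back to numeric constants,
   with an explicit terminating zero. */
typedef unsigned short UChar;
#define U16_HAVE_WCHAR_LITERALS 0
static const UChar greeting[] = { 0x61, 0x62, 0x20ac, 0 };
#endif
```

Either branch yields the same four code units (a, b, the euro sign, and a terminating zero), so code that uses greeting does not need to know which branch was taken. The fallback branch still has to be written by hand or generated, so this only saves work on the lucky platforms.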

d) using a preprocessor which takes source code like
const UChar *string=U16LITERAL("ab\u20ac");
or
const UChar *string=U16LITERAL("ab€");
and generates output C source code like a) or c) as appropriate

-> Are there such preprocessors available?
I guess Perl could do this...

e) using a tool as in d) but only per-string for the developer,
where one can type "ab€" and the tool generates output
text like in a) to copy-paste into the .c file,
possibly with a comment containing the original string
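
To make d) and e) concrete, here is a rough sketch of the core of such a tool: a function that takes a UTF-8 string and produces the list of 16-bit constants to paste into an initializer as in a). The function names decode_utf8 and u16_initializer are hypothetical, the UTF-8 decoder is simplified (it does not reject overlong forms or encoded surrogates), and it relies on snprintf from C99.

```c
#include <stddef.h>
#include <stdio.h>

/* Decode one UTF-8 sequence at s into *cp.
   Returns the number of bytes consumed, or 0 on malformed input.
   Simplified: overlong forms and encoded surrogates are not rejected. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t n, i;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;
    for (i = 1; i < n; i++) {
        /* A terminating NUL is not a continuation byte, so a
           truncated sequence stops here without reading past the end. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    return n;
}

/* Write the 16-bit code units of utf8 into out as "0x0061, 0x0062, ...".
   Code points above U+FFFF become surrogate pairs.
   Returns 0 on success, -1 on malformed input or if out is too small. */
int u16_initializer(const char *utf8, char *out, size_t outsize)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t used = 0;

    if (outsize == 0) return -1;
    out[0] = '\0';
    while (*s != 0) {
        unsigned long cp;
        unsigned units[2];
        int count, i;
        size_t n = decode_utf8(s, &cp);

        if (n == 0 || cp > 0x10FFFF) return -1;
        s += n;
        if (cp >= 0x10000) {
            cp -= 0x10000;
            units[0] = (unsigned)(0xD800 + (cp >> 10));
            units[1] = (unsigned)(0xDC00 + (cp & 0x3FF));
            count = 2;
        } else {
            units[0] = (unsigned)cp;
            count = 1;
        }
        for (i = 0; i < count; i++) {
            int w = snprintf(out + used, outsize - used, "%s0x%04x",
                             used ? ", " : "", units[i]);
            if (w < 0 || (size_t)w >= outsize - used) return -1;
            used += (size_t)w;
        }
    }
    return 0;
}
```

Called with the UTF-8 bytes of "ab€", this fills the buffer with 0x0061, 0x0062, 0x20ac. A real tool would wrap that in "const UChar string[]={ ... };", add a comment with the original string, and for d) scan the .c file for U16LITERAL(...) occurrences, none of which is shown here.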


I am *not* looking for ways to get strings via more high-level mechanisms and
runtime functions like

z1) not using string literals but resource bundles/message catalogs etc.

z2) using an unescape function
const UChar *string=unescape("ab\\u20ac");

etc.

Tips are greatly appreciated.

markus




