Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)
msg#00347, text.unicode.devel
John Hudson <tiro at tiro dot com> wrote:

>> No Dutchman - whether he is involved in type or not - can be amazed
>> by the existence of IJ.
>
> No one is amazed that it exists as a grapheme, but my Dutch colleagues
> are frequently surprised to discover that it is a *character* in
> Unicode, and they wonder why. Perhaps this is one of those characters
> that needs its story told: I've heard that it was encoded for
> backwards compatibility with an existing standard, but no one I've
> asked seems to know which standard, or whether this standard is still
> in use by anyone.

The standard is ISO/IEC 6937. First developed in the early 1980s, this was a supplementary set of 96 code points intended for use in conjunction with ISO 646 (ASCII) to cover as many European languages as possible, within the ISO 2022 framework. It featured a set of non-spacing diacritical marks, the forerunners of Unicode's combining marks, although they appeared before the base letter instead of after it as in Unicode, and were not considered characters in their own right. About 330 characters could be encoded when all the combining marks were taken into account.

ISO 6937 had some significant drawbacks that prevented its widespread deployment, at least in North America. The combining marks could only be used in certain prescribed combinations (a with acute was legal but g with acute was not), and only one combining mark per base letter was allowed, which made ISO 6937 useless for languages like Vietnamese that require multiple diacritics. Furthermore, because it lived in the ISO 2022 world, ISO 6937 had to be "announced" via an escape sequence. And of course, there was the usual resistance to encoding a single character like á with a two-byte sequence.

ISO 6937 never achieved great popularity, although I have heard it saw some use in the Netherlands.
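To make the "diacritic before the base letter" ordering concrete, here is a minimal Python sketch of how such a byte stream might be turned into Unicode, where the combining mark follows the base letter. The function name decode_6937_sketch is hypothetical, and the four diacritic byte values in the table are recalled from the ISO 6937 code chart rather than taken from this message, so they should be checked against the standard before being relied on. The sketch enforces the one-mark-per-letter rule but does not check the prescribed list of legal combinations.

```python
import unicodedata

# ASSUMPTION: byte values for a few ISO 6937 non-spacing diacritics, recalled
# from the code chart (not stated in the message). Verify before real use.
NONSPACING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
}


def decode_6937_sketch(data: bytes) -> str:
    """Decode ASCII plus the diacritics above, moving each mark after its base."""
    out = []
    pending = None  # combining mark waiting for its base letter
    for b in data:
        if b in NONSPACING:
            if pending is not None:
                # ISO 6937 allows only one diacritic per base letter.
                raise ValueError("more than one diacritic before a base letter")
            pending = NONSPACING[b]
        elif b < 0x80:
            out.append(chr(b))
            if pending is not None:
                out.append(pending)  # Unicode order: base first, then mark
                pending = None
        else:
            raise ValueError(f"byte 0x{b:02X} is outside this sketch's table")
    if pending is not None:
        raise ValueError("diacritic with no base letter after it")
    # Compose base + mark into a precomposed character where one exists.
    return unicodedata.normalize("NFC", "".join(out))
```

For example, decode_6937_sketch(b"\xc2a") yields the single character á, which is the two-byte sequence mentioned above.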
The capital IJ digraph is encoded at position 6/6 in ISO 6937, which means it would normally be expressed with byte 0xE6 (assuming 6937 was defined as the G1 or "high-bit" character set). The small ij digraph is encoded at 7/6 (0xF6).

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
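The arithmetic behind those byte values can be sketched in a few lines of Python. This assumes, as the message does, that the set is invoked as G1, so the column/row position becomes a byte with the high bit set. The helper name g1_byte and the table name IJ_TO_UNICODE are hypothetical; the Unicode targets U+0132 and U+0133 are the code points of the IJ and ij characters under discussion.

```python
def g1_byte(column: int, row: int) -> int:
    """Byte value for a 7-bit column/row position when used as G1 (high bit set)."""
    if not (2 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("position outside the graphic-character columns")
    # column is the high nibble, row is the low nibble, 0x80 is the G1 high bit
    return 0x80 | (column << 4) | row


# The two ISO 6937 positions named in the message, mapped to Unicode.
IJ_TO_UNICODE = {
    g1_byte(6, 6): "\u0132",  # LATIN CAPITAL LIGATURE IJ
    g1_byte(7, 6): "\u0133",  # LATIN SMALL LIGATURE IJ
}
```

Here g1_byte(6, 6) evaluates to 0xE6 and g1_byte(7, 6) to 0xF6, matching the values given above.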