Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn): msg#00350, text.unicode.devel
Thomas Milo wrote on 04/27/2003 04:49:26 AM:

> Would it be possible to make the IJ/ij available at last as a single
> character IJ/ij for Dutch users?

If I understand the facts correctly, is this not just a digraph, comparable to "ch" in various languages, the only difference being that Unicode doesn't have a "ch" character but did include "ij" -- for backward compatibility purposes? In other words, ideally "ij" wouldn't have been included, but now that we've got it, Dutch "ij" has two alternate representations, < i, j > or < U+0133 >.

Tom, I think what you should be asking Chris Pratley to do is to make the spelling checker for Office recognise either spelling; the best way to do that is probably to apply a compatibility normalisation to Dutch text.

As for input methods, Michael Kaplan has already pointed out that they can't really change what has already shipped (and that that is not an Office issue). There are ways to create your own input method, though: you can use Tavultesoft Keyman now to create your own input method, or soon (I presume) Microsoft will be making a tool available.

This brings up a general issue worth mentioning: we are familiar with the concept of canonical equivalence for Latin precomposed / decomposed representations, and the use of Unicode normalisation forms C and D to deal with these equivalences. In contrast, characters with compatibility decompositions are quite an assorted lot, and there's no simple, general rule to say when compatibility decompositions should or shouldn't be used. But there is one class of Latin characters with compatibility decompositions that probably should generally be handled as though they were canonically equivalent to their decomposed counterparts: digraphs. For whatever reason, digraphs as a rule were given *compatibility* rather than *canonical* decomposition mappings.

But unless I'm missing something, it seems to me that for most practical purposes, representations using the digraph characters ij, lj, dž etc. should be treated by applications as equivalent to their decomposed counterparts.

- Peter

---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
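
To make the suggestion about digraphs concrete, here is a minimal Python sketch of what "treat the digraph characters as equivalent to their decomposed counterparts" could look like. It is only an illustration under stated assumptions, not how Office or any shipping spelling checker works: the names DIGRAPHS, fold_digraphs and same_spelling are made up for this example, and the list of code points (U+0132..U+0133, U+01C4..U+01CC, U+01F1..U+01F3) is my reading of which Latin digraph characters carry compatibility decompositions. The compatibility mapping is applied only to those characters, so that other compatibility characters (ligatures, superscripts and so on) are left alone.

```python
import unicodedata

# Assumption: the Latin digraph characters with compatibility decompositions.
# IJ/ij (U+0132..U+0133), DZ-caron/LJ/NJ forms (U+01C4..U+01CC), DZ forms (U+01F1..U+01F3).
DIGRAPHS = frozenset(
    [0x0132, 0x0133] + list(range(0x01C4, 0x01CD)) + list(range(0x01F1, 0x01F4))
)


def fold_digraphs(text):
    """Replace digraph characters by their decomposed letters, then apply NFC.

    Only characters listed in DIGRAPHS get the compatibility mapping; everything
    else goes through ordinary canonical normalisation (form C).
    """
    pieces = []
    for ch in text:
        if ord(ch) in DIGRAPHS:
            # NFKC on a single digraph character yields its component letters.
            pieces.append(unicodedata.normalize("NFKC", ch))
        else:
            pieces.append(ch)
    return unicodedata.normalize("NFC", "".join(pieces))


def same_spelling(a, b):
    """Hypothetical comparison a spelling checker could use for either spelling."""
    return fold_digraphs(a) == fold_digraphs(b)
```

With this, same_spelling("\u0133sje", "ijsje") treats the two representations of Dutch "ij" as the same word, whereas plain NFC alone leaves U+0133 untouched and would report them as different.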