
Re: New draft (Was: I-D ACTION:draft-klensin-unicode-escapes-00.txt: msg#00043

ietf.apps-discuss

Subject: Re: New draft (Was: I-D ACTION:draft-klensin-unicode-escapes-00.txt

Pardon me for being late to this party, I was on vacation in
Australia. I think this is a positive contribution.

First, a detail point: In section 5.4, it's probably relevant that
per the Java Language Specification
(http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#95413p)
it's clear that a Java character literal or variable represents, not a
Unicode character, but a UTF-16 code unit. I guess the conclusion
is that it may be OK in certain circumstances to use \uNNNN, but it's
not OK to explain that by calling out to Java.
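
To make the Java point concrete, here is a minimal sketch (the class and
method names are mine, purely for illustration) showing that a character
outside the Basic Multilingual Plane takes two \uNNNN escapes, and
therefore two Java chars, to express one Unicode character:

```java
public class CharIsCodeUnit {
    // Number of Java chars, i.e. UTF-16 code units, in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+00E9 fits in a single UTF-16 code unit.
        String bmp = "\u00E9";
        // U+1D11E (MUSICAL SYMBOL G CLEF) needs a surrogate pair.
        String supplementary = "\uD834\uDD1E";

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp));                     // 1 1
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)); // 2 1
        System.out.println(Integer.toHexString(supplementary.codePointAt(0)));          // 1d11e
    }
}
```

So a Java-style \uNNNN escape names a code unit, and only coincides with
a Unicode character for code points up to U+FFFF.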

Second: I think that the discussion shows that the syntax problems
around representing Unicode characters in ASCII and other
Unicode-oblivious texts are tricky; witness the issues with delimiters
and ABNF/case. This is further evidence, were any needed, that IETF
Working Groups SHOULD NOT specify Internet protocols which may be used
to transfer text but are not capable of representing the Unicode
character set. They can avoid that by specifying the use of either
hard-wired UTF-8 or XML, both of which have cracked this nut.

So here's a proposed recasting of the second paragraph of section 1.1:

When one moves to Unicode [Unicode] [ISO10646], where characters
occupy two or more octets and may be coded in several different
forms, the question of escapes becomes even more complicated. In
particular, we have seen fairly extensive use of both hexadecimal
representations of the UTF-8 encoding [RFC3629] of a character and
variations on the U+NNNN[N[N]] notation commonly used in conjunction
with the Unicode Standard.

New protocols that are required to carry textual content SHOULD be
designed in such a way that the full repertoire of Unicode characters
may be represented in that text; UTF-8 and XML are both good options.

This document proposes that existing protocols being internationalized
SHOULD use some contextually-appropriate variation of the U+NNNN[N[N]]
notation unless other considerations outweigh those described here.
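
For comparison, the two escape styles mentioned in the proposed text can
be sketched as follows. This is only an illustration under my own
assumptions: the class and method names are hypothetical, and the
space-separated octet layout is one arbitrary choice, not something the
draft specifies.

```java
import java.nio.charset.StandardCharsets;

public class EscapeForms {
    // Reject anything that is not a Unicode scalar value
    // (out of range, or a surrogate code point).
    static void requireScalarValue(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)
                || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value");
        }
    }

    // U+NNNN[N[N]] form: at least four hex digits, at most six.
    static String uPlus(int codePoint) {
        requireScalarValue(codePoint);
        return String.format("U+%04X", codePoint);
    }

    // Hexadecimal representation of the UTF-8 octets, space-separated.
    static String utf8Hex(int codePoint) {
        requireScalarValue(codePoint);
        byte[] octets = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : octets) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(uPlus(0xE9) + " / " + utf8Hex(0xE9));       // U+00E9 / C3 A9
        System.out.println(uPlus(0x1D11E) + " / " + utf8Hex(0x1D11E)); // U+1D11E / F0 9D 84 9E
    }
}
```

The U+ form names the character directly, while the octet form only
makes sense once the reader knows the encoding is UTF-8.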



