Re: Next step (msg#00027, ietf.apps-discuss)
John C Klensin wrote:

> actually more addressed to me personally than on the list.

Hi, I hope the issue with the list "eating" my mails is now fixed, and I'll try it again (minus typos, but keeping the ABNF issue because "8 vs. 6 digits" might still be unclear):

~~~ I-D ~~~

   [U+NNNN]

   This document proposes that a specific variation on the latter
   SHOULD be used in protocols unless other considerations apply and
   explains that choice.

I disagree with that proposal, more below.

- BMP-form = "\u" Hex-quad
- Full-form = "\U" 2*2 Hex-quad
+ BMP-form = %x5C.75 Hex-quad ; starting with lower case "\u"
+ Full-form = %x5C.55 2*2 Hex-quad ; starting with upper case "\U"

You fixed something there already, resulting in either four or six digits, but for a case-sensitive u vs. U you can't use "u" or "U" in ABNF. Sometimes this ABNF feature is annoying. Looking at your fix, is that still either four or *_eight_* instead of six digits?

   (e.g., in RFCs) although the U+NNNN form MAY be used when Unicode
   character encoding is clearly expected. It SHOULD be used for the
   purpose of talking _about_ Unicode points in the prose of Internet
   drafts and RFCs, but it SHOULD NOT be used to encode only the
   non-ASCII characters in Unicode strings. It MAY be used to encode
   complete obviously delimited Unicode strings.

- This specification recommends that, in the absence of
- compelling reasons to do otherwise, the Unicode code point forms be
- used rather than the UTF-8 ones. There are several reasons for this,
- including:
+ This specification recommends that, in the absence of
+ compelling reasons to do otherwise, the Unicode code point forms
+ SHOULD be used rather than the UTF-8 ones. There are several
+ reasons for this, including:

Adding a SHOULD to 3.1, otherwise folks won't believe it.

   o Perl uses the form \x(NNN...). The advantage of this form is
     that there are explicit delimiters

Indeed, hitting the important C044 point in [CharMod].
   o Java uses the form \uNNNN, but can represent characters outside
     Plane 0 (i.e., above U+FFFF) only by the use of surrogate pairs.

That is one of the reasons why anything with \u or \U is a non-starter: there are too many incompatible conventions in use.

   Codings that depend on surrogates SHOULD NOT be used.

Strong ACK.

   o HTML and XML use the form &#xNNNN;. Like the Perl form, this
     form has a clear terminator, reducing ambiguity. However, it is
     generally considered ugly and awkward outside of its native
     HTML, XML, and similar contexts.

IMO it is THE encoding. It's also trivial to convert files using this technique into XML. For the RFC 4646 language subtag registry I use a simple gawk script.

   There is one significant disadvantage of the recommended form. The

No, there are more: folks will assume that it's a convention they know, or a variant of U+NNNN[N[N]] with an arbitrary number of leading 0s. Nobody will use \U012345 when they can hope to get away with \U12345.

   should not introduce any security issues that are not present as a

My objections are also security considerations: folks will screw up with this encoding, and that could cause havoc.

   6.1. Normative References

There should be a normative reference to the [CharMod] bible, and especially to its conformance criteria C042 up to C048, starting at http://www.w3.org/TR/charmod/#C042

In theory your proposal is compatible with C044, but in practice I fear that it won't work as you expect. I could live with e.g. "authors SHOULD either pick hex. NCRs as in XML or" (your proposal), but in fact I think that the XML notation is much better.

Frank
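
To make the hex NCR point concrete, here is a minimal Python sketch of the &#xNNNN; conversion. It is not the gawk script mentioned above (that script is not shown in this message), and the function names `to_hex_ncr` and `from_hex_ncr` are hypothetical, chosen only for illustration. It assumes the input is already a decoded Unicode string and does not escape literal "&" characters.

```python
import re


def to_hex_ncr(text: str) -> str:
    """Replace every non-ASCII character with an XML-style hex NCR (&#xNNNN;).

    ASCII characters pass through unchanged. A literal '&' is NOT escaped,
    so input that already contains text like '&#x41;' will not round-trip.
    """
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};"
        for ch in text
    )


# One or more hex digits between '&#x' and the terminating ';'.
_HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def _decode(match: re.Match) -> str:
    code_point = int(match.group(1), 16)
    # Leave references beyond the Unicode range (above U+10FFFF) untouched.
    return chr(code_point) if code_point <= 0x10FFFF else match.group(0)


def from_hex_ncr(text: str) -> str:
    """Decode hex NCRs back into characters; the ';' delimits each reference."""
    return _HEX_NCR.sub(_decode, text)


if __name__ == "__main__":
    print(to_hex_ncr("Zürich"))              # Z&#xFC;rich
    print(to_hex_ncr("\U00012345" + "6"))    # &#x12345;6
```

In the second call the character U+12345 is followed by a plain digit 6, and the ";" terminator keeps the two apart. With an undelimited form and a flexible number of digits, a sequence such as \U123456 would leave the reader guessing where the escape ends, which is the ambiguity criticized above.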