Re: Last Call comments on IRI - 3.1 Mapping of IRIs to URIs: msg#00036

org.w3c.tag

Subject: Re: Last Call comments on IRI - 3.1 Mapping of IRIs to URIs


There is also the following Note:

Note: The difference between Variants B and C in Step 1 (Variant B
using normalization with NFC while Variant C not using any
normalization) is to account for the fact that in many non-Unicode
character encodings, some text cannot be represented directly.
For example, "Vietnam" is natively written "Việt Nam"
(containing a LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW)
in NFC, but a direct transcoding from the windows-1258 character
encoding leads to "Việt Nam" (containing a LATIN SMALL
LETTER E WITH CIRCUMFLEX followed by a COMBINING DOT BELOW),
whereas direct transcoding of other 8-bit encodings of Vietnamese
may lead to other representations.
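
To make the Note above concrete, here is a minimal Python sketch of the
two spellings of "Việt Nam" and of what happens when each is
percent-encoded as UTF-8, with and without NFC. The helper name
to_uri_component is hypothetical, and it escapes every reserved
character, which is a simplification of Step 2 in the draft, not the
actual algorithm.

```python
import unicodedata
from urllib.parse import quote

# Precomposed form: U+1EC7 LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW
nfc_form = "Vi\u1EC7t Nam"
# Form described in the Note for windows-1258 transcoding:
# U+00EA (e with circumflex) followed by U+0323 (combining dot below)
transcoded_form = "Vi\u00EA\u0323t Nam"


def to_uri_component(text: str, normalize: bool) -> str:
    """Hypothetical helper: optionally apply NFC, then percent-encode UTF-8.

    Simplification: escapes all non-unreserved characters, unlike the
    draft's Step 2, which only touches characters not allowed in URIs.
    """
    if normalize:
        text = unicodedata.normalize("NFC", text)
    return quote(text.encode("utf-8"), safe="")


# The two spellings are different code point sequences ...
different_before = nfc_form != transcoded_form
# ... but NFC maps the transcoded form onto the precomposed one.
same_after_nfc = unicodedata.normalize("NFC", transcoded_form) == nfc_form

# Without normalization the two spellings yield different URIs.
raw_nfc = to_uri_component(nfc_form, normalize=False)
raw_transcoded = to_uri_component(transcoded_form, normalize=False)
# With NFC (as in Variant B) they yield the same URI.
normalized_transcoded = to_uri_component(transcoded_form, normalize=True)
```

Under these assumptions, skipping normalization for Unicode-based input
keeps the two spellings distinct, so a decomposed string can still be
sent as-is in a query part, while applying NFC to legacy-encoded input
gives a predictable result.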

Would moving this closer to the A/B/C variants, and maybe adding
some text, be a solution to your last call comment?

Regards, Martin.


At 14:50 04/08/18 +0900, Martin Duerst wrote:

Hello Chris,

Many thanks for your comment. I have made it issue why-not-normalize-42
(see http://www.w3.org/International/iri-edit#why-not-normalize-42).

A few ideas on how to deal with it below.

At 22:22 04/08/11 +0200, Chris Lilley wrote:

Hello,

> If the IRI is in an Unicode-based character encoding (for example
> UTF-8 or UTF-16): Do not normalize. Apply Step 2 directly to the
> encoded Unicode character sequence.

I believe that I understand why this step says 'do not normalize'
(otherwise, certain Unicode strings could never be used in query parts,
for example).

However, as the two preceding steps say 'normalize' and this step says
'do not normalize' the reader could be confused - or perhaps consider it
an 'obvious error'.

Do not tease the reader like this. Please explain *why* at this stage no
normalization is performed.

You definitely have a point. But as you have noticed, the explanations
are already given elsewhere in the document. I think there are several
things that can be done:

- capitalize 'NOT', to make clear that this is not an 'obvious error'.
- add a pointer to 5.3 Normalization
  (http://www.w3.org/International/iri-edit/draft-duerst-iri.html#normalization)
- do both of the above

Which one do you prefer? Do you think this is enough, or do you have
some other idea (actual wording preferred)?


Regards, Martin.




