On 26 May 2004 04:41:00 -0700, matthew@xxxxxxxxxxxxxxxxxxxx (Matthew
Wilson) wrote:
>Are there any issues using Mozilla DOM Parsing or Serialization and
>character encodings?
>
>For example, using Cyrillic characters via numeric entity references
>(I hope I have that term correct) in an ISO-8859-1 document:
>
> var text = "&#1042;&#1099;&#1073;&#1088;&#1072;&#1090;&#1100;";
> var xml = "<?xml version='1.0' encoding='ISO-8859-1'?>\n<test>" +
>   text + "</test>";
> window.alert (xml);
>
>This outputs the string exactly as you would expect, with the numeric
>characters intact.
>
>If I then make an XMLDocument out of the text, and print the value of
>the text node:
>
> var originalDoc = new DOMParser().parseFromString(xml, "text/xml");
> window.alert (originalDoc.documentElement.childNodes[0].data);
>
>then I see the actual Cyrillic characters. I'll paste them in here,
>but I don't know how successful that will be:
>Выбрать
Alright, that should have appeared as a string of Cyrillic characters.
>I think that's still correct.
>
>Then if I serialize it back to a string and print the result, the
>numeric character entity references are gone, I still have the
>Cyrillic characters, and the XML declaration still has an encoding of
>ISO-8859-1:
>
> var serializedDoc =
>   new XMLSerializer().serializeToString(originalDoc);
> window.alert (serializedDoc);
>
><?xml version="1.0" encoding="ISO-8859-1"?>
><test>Выбрать</test>
>
>That looks wrong to me. Any opinions?

OK, answering my own question: serializeToString only promises to
return the serialization "in the form of a Unicode string", so it
doesn't mind if the result contains characters that the declared
character encoding cannot represent.

So I should use serializeToStream with the declared encoding if I
want to make sure that any such characters are suitably escaped.
Is that the intention?
If so, what's the easiest way of getting a scriptable output stream
which I can use for serializeToStream? I just want to store the output
in memory.
At the moment I have the following ugly hack:
  var str = Components.classes["@mozilla.org/supports-string;1"]
    .createInstance(Components.interfaces.nsISupportsString);
  // Stand-in for an output stream: append each chunk to str.data.
  var outputStream = {
    write: function(buf, count) {
      str.data = str.data + buf;
      return count;
    }
  };
  serializer.serializeToStream(transformedDocument, outputStream,
    "UTF-8");
but I would rather have a real nsIOutputStream implementation.
--
Matthew