
Re: Problems with DOM Parsing/Serialization and character encodings?: msg#00027

Subject: Re: Problems with DOM Parsing/Serialization and character encodings?
On 26 May 2004 04:41:00 -0700, matthew@xxxxxxxxxxxxxxxxxxxx (Matthew Wilson) wrote:

>Are there any issues using Mozilla DOM Parsing or Serialization and
>character encodings?
>
>For example, using Cyrillic characters via numeric entity references
>(I hope I have that term correct) in an ISO-8859-1 document:
>
>   var text = "&#1042;&#1099;&#1073;&#1088;&#1072;&#1090;&#1100;";
>   var xml = "<?xml version='1.0' encoding='ISO-8859-1'?>\n<test>" +
>       text + "</test>";
>   window.alert (xml);
>
>This outputs the string exactly as you would expect, with the numeric
>characters intact.
>
>If I then make an XMLDocument out of the text, and print the value of
>the text node:
>
>   var originalDoc = new DOMParser().parseFromString(xml, "text/xml");
>   window.alert (originalDoc.documentElement.childNodes[0].data);
>
>then I see the actual Cyrillic characters. I'll paste them in here,
>but I don't know how successful that will be:
>&#1042;&#1099;&#1073;&#1088;&#1072;&#1090;&#1100;

Alright, that should have appeared as a string of Cyrillic characters.

>I think that's still correct.
>
>Then if I serialize it back to a string and print the result, the
>numeric character entity references are gone, I still have the
>Cyrillic characters, and the XML declaration still has an encoding of
>ISO-8859-1:
>
>   var serializedDoc = new XMLSerializer().serializeToString(originalDoc);
>   window.alert (serializedDoc);
>
><?xml version="1.0" encoding="ISO-8859-1"?>
><test>Выбрать</test>
>
>That looks wrong to me. Any opinions?
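To make the round trip above concrete: when the document is parsed, each numeric character reference is expanded into the character it names, so the text node holds seven Cyrillic characters rather than the reference text. A minimal sketch of that expansion step, outside of any XML parser; the helper name `expandNumericRefs` is hypothetical, and it handles only decimal `&#N;` references (not hex `&#xN;` or named entities):

```javascript
// Hypothetical helper: expand decimal numeric character references
// (e.g. "&#1042;") into the characters they denote, roughly what the
// XML parser does when it builds the text node.
function expandNumericRefs(s) {
  return s.replace(/&#(\d+);/g, function (match, digits) {
    // String.fromCodePoint also handles code points above U+FFFF.
    return String.fromCodePoint(parseInt(digits, 10));
  });
}

var text = "&#1042;&#1099;&#1073;&#1088;&#1072;&#1090;&#1100;";
var expanded = expandNumericRefs(text);
console.log(expanded);        // Выбрать
console.log(expanded.length); // 7
```

Every one of those seven characters has a code point above U+00FF, so none of them can be represented in ISO-8859-1; writing them out raw under that declaration is what produces the suspicious-looking output.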

OK, answering my own question: serializeToString only promises to
return the serialization "in the form of a Unicode string"; that is,
it does not care whether the result contains characters that the
declared character encoding cannot represent.
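If keeping the ISO-8859-1 declaration matters, one possible workaround is to post-process the Unicode string yourself and turn every character above U+00FF into a numeric character reference. A minimal sketch, not anything XMLSerializer provides; the helper name `escapeNonLatin1` is hypothetical. It is only correct for characters in text content and attribute values: XML does not allow character references inside element or attribute names, and they are not expanded inside comments or CDATA sections.

```javascript
// Hypothetical helper: replace every character that ISO-8859-1 cannot
// represent (code point above U+00FF) with a decimal numeric character
// reference. Only valid for text content and attribute values; XML
// names, comments and CDATA sections cannot use character references.
function escapeNonLatin1(s) {
  var out = "";
  // for...of iterates by code point, so surrogate pairs stay together.
  for (var ch of s) {
    var cp = ch.codePointAt(0);
    out += cp > 0xFF ? "&#" + cp + ";" : ch;
  }
  return out;
}

var serialized = '<?xml version="1.0" encoding="ISO-8859-1"?>\n' +
                 "<test>Выбрать</test>";
console.log(escapeNonLatin1(serialized));
// <?xml version="1.0" encoding="ISO-8859-1"?>
// <test>&#1042;&#1099;&#1073;&#1088;&#1072;&#1090;&#1100;</test>
```') throw new Error("document escaping failed");
</test>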

So if I want any such characters to be suitably escaped, I should use
serializeToStream with the desired encoding instead.

Is that the intention?

If so, what's the easiest way of getting a scriptable output stream
which I can use for serializeToStream? I just want to store the output
in memory.

At the moment I have the following ugly hack:
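    // nsISupportsString used as a growable string holder for the output.
    var str = Components.classes["@mozilla.org/supports-string;1"]
                        .createInstance(Components.interfaces.nsISupportsString);
    // Plain JS object standing in for an nsIOutputStream: each chunk the
    // serializer writes is appended to str.data.
    var outputStream = {
        write: function(buf, count) {
            str.data = str.data + buf;
            return count;   // report that all bytes were consumed
        }
    };
    // serializer is an XMLSerializer instance, e.g. new XMLSerializer().
    serializer.serializeToStream(transformedDocument, outputStream, "UTF-8");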

but I would rather have a real nsIOutputStream implementation.
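Short of a real nsIOutputStream, the hack can at least be tidied up. A minimal sketch, assuming (as the hack above already relies on) that XPConnect accepts a plain JavaScript object whose methods match nsIOutputStream, and that `write` receives the encoded output as a string with one character (code 0 to 255) per byte. `createMemoryOutputStream`, `getBytes` and `getText` are hypothetical names; `flush` and `close` are included because nsIOutputStream declares them. Chunks are collected in an array and joined once instead of re-concatenating on every call, and `getText` decodes the UTF-8 bytes with the old `decodeURIComponent(escape(...))` idiom.

```javascript
// Hypothetical in-memory stand-in for nsIOutputStream. Assumes, as the
// hack above does, that XPConnect accepts a plain JS object with these
// methods, and that write() receives the encoded bytes as a string with
// one character (code 0 to 255) per byte.
function createMemoryOutputStream() {
  var chunks = [];
  return {
    write: function (buf, count) {
      // Keep only the first `count` bytes of this chunk.
      chunks.push(buf.substring(0, count));
      return count;
    },
    flush: function () {},  // nothing is buffered beyond `chunks`
    close: function () {},
    // Raw byte string, exactly as written by the serializer.
    getBytes: function () {
      return chunks.join("");
    },
    // Decode the collected bytes, assuming they are UTF-8. escape() is
    // deprecated but widely available; decodeURIComponent throws a
    // URIError if the bytes are not valid UTF-8.
    getText: function () {
      return decodeURIComponent(escape(chunks.join("")));
    }
  };
}

// Intended use inside Mozilla (not runnable outside it):
//   var out = createMemoryOutputStream();
//   new XMLSerializer().serializeToStream(transformedDocument, out, "UTF-8");
//   var xmlText = out.getText();
```";
var bytes = unescape(encodeURIComponent(s));
if (bytes.length !== 27) throw new Error("unexpected UTF-8 byte count");
var out = createMemoryOutputStream();
if (out.write(bytes.slice(0, 10), 10) !== 10) throw new Error("write should return count");
out.write(bytes.slice(10) + "junk", bytes.length - 10);
out.flush();
out.close();
if (out.getBytes() !== bytes) throw new Error("bytes not collected correctly");
if (out.getText() !== s) throw new Error("UTF-8 decoding failed");
</test>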

-- 
Matthew

