
Split CDATA Sections and the division Symbol (x00f7): msg#00014

Subject: Split CDATA Sections and the division Symbol (x00f7)
I'm trying to figure out whether this is a bug. I created a DOM containing an element with a CDATA section, and I set the section's value to a String that includes a division symbol (0xF7). (I actually read the characters in from a file as bytes and convert them to a String, specifying the Windows-1252 encoding.) When I serialize this DOM to a String, a byte array, or anything else, the CDATA section is split around the division symbol and the symbol is emitted as a numeric character reference (&#xF7;). I am serializing as UTF-8.
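A minimal sketch of the scenario described above, for anyone who wants to try it. It is not the original code: it assumes the JDK's built-in JAXP parser and the DOM Level 3 Load/Save `LSSerializer` rather than a particular Xerces-J serializer class, and the class name `CdataDivisionRepro`, the element name `root`, and the sample text are made up for illustration. It builds a DOM with a CDATA section holding a division sign decoded from Windows-1252 bytes, serializes it as UTF-8, and parses the result back so the text can be compared.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class CdataDivisionRepro {

    // Decode the bytes as Windows-1252 and wrap the text in a CDATA section
    // under a hypothetical <root> element.
    static Document buildDocument(byte[] windows1252Bytes) {
        try {
            String text = new String(windows1252Bytes, Charset.forName("windows-1252"));
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            root.appendChild(doc.createCDATASection(text));
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialize the whole document to UTF-8 bytes using DOM Level 3 Load/Save.
    static byte[] serializeUtf8(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);
        output.setEncoding("UTF-8");
        serializer.write(doc, output);
        return bytes.toByteArray();
    }

    // Parse the serialized bytes and return the root element's text content.
    // Adjacent text and CDATA nodes are concatenated, so a split CDATA section
    // and an unsplit one yield the same string here.
    static String parseText(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = {'6', ' ', (byte) 0xF7, ' ', '2'}; // 0xF7 is the division sign in Windows-1252
        byte[] xml = serializeUtf8(buildDocument(raw));
        // Inspect this output to see whether the CDATA section was split.
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println("Round-tripped text: " + parseText(xml));
    }
}
```

Whether the printed XML shows one CDATA section or a split one depends on the serializer implementation and version in use, so the sketch only prints the output for inspection rather than assuming either result.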
 
I see in the documentation that this is the correct behavior when the serializer encounters a Unicode character that isn't recognized; I'm not sure whether that means not recognized in its Unicode (internal) form or that it has no equivalent in the output encoding. But U+00F7 is the correct Unicode code point for the division symbol, and it does have a UTF-8 encoding. Other "special" characters seem to serialize to UTF-8 without this split.
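The claim about the code point and its UTF-8 form can be checked independently of any XML serializer. The sketch below uses only standard `java.nio.charset` classes; the class name `DivisionSignEncoding` and its helper methods are hypothetical names for this illustration. It decodes the single byte 0xF7 as Windows-1252, confirms the result is U+00F7, and shows the bytes UTF-8 uses for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DivisionSignEncoding {

    // Decode raw bytes using the Windows-1252 code page.
    static String decodeWindows1252(byte[] raw) {
        return new String(raw, Charset.forName("windows-1252"));
    }

    // Return the UTF-8 encoding of a string as space-separated uppercase hex bytes.
    static String utf8Hex(String s) {
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    // Ask the UTF-8 encoder directly whether it can represent the character.
    static boolean utf8CanEncode(char c) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        String division = decodeWindows1252(new byte[] {(byte) 0xF7});
        System.out.println("Code point: U+" + String.format("%04X", (int) division.charAt(0)));
        System.out.println("UTF-8 bytes: " + utf8Hex(division));
        System.out.println("Encodable in UTF-8: " + utf8CanEncode(division.charAt(0)));
    }
}
```

So the character itself is representable in UTF-8 as the two bytes C3 B7; if a serializer still escapes it, the cause is more likely the serializer's view of the output encoding than the character.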
 
I can send code. I've tried this on the latest Xerces-J. Does anyone have any thoughts about it?
 
Thanks,
 
Steve Carton
