I'm trying to figure out whether this is a bug. I created a DOM containing an element with a CDATA
section, and I set its value to a String that includes a division sign (0xF7). (I actually do this
by reading the characters from a file and converting the bytes to a String using the Windows-1252
encoding.) When I serialize this DOM to a String, a byte array or anything else, the CDATA section
is split around the division sign, and the division sign is emitted as a numeric character
reference for U+00F7 (÷). I am serializing with UTF-8 as the output encoding.
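A minimal sketch of the setup described above, for anyone who wants to try it. It assumes the JDK's built-in DOM Level 3 Load and Save serializer (`LSSerializer`) rather than the Xerces-J serializer from the report, so it may not reproduce the split; `CdataRepro`, `serialize` and `roundTrip` are illustrative names. The round-trip check compares the parsed-back text with the input. A split CDATA section with a character reference in the middle would still pass, since both forms carry the same characters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class CdataRepro {
    // Builds <data> with a single CDATA child holding `text` and serializes it
    // with the JDK's LSSerializer (an assumption: not the Xerces-J XMLSerializer), UTF-8 output.
    static byte[] serialize(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("data");
        root.appendChild(doc.createCDATASection(text));
        doc.appendChild(root);

        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput out = ls.createLSOutput();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        out.setByteStream(bytes);
        out.setEncoding("UTF-8");
        serializer.write(doc, out);
        return bytes.toByteArray();
    }

    // Parses the serialized bytes back and returns the root element's text,
    // concatenating any CDATA and text nodes a split may have produced.
    static String roundTrip(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Simulates bytes read from a file: 0xF7 is the division sign in Windows-1252.
        byte[] raw = {'1', '0', ' ', (byte) 0xF7, ' ', '2'};
        String text = new String(raw, Charset.forName("windows-1252"));

        byte[] xml = serialize(text);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println("round trip equal: " + text.equals(roundTrip(xml)));
    }
}
```

Printing the serialized output shows whether this particular serializer splits the CDATA section; swapping in the Xerces-J serializer would be the real test of the report.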
I see in the documentation that this is the intended behavior when the serializer encounters a
Unicode character it doesn't recognize. I'm not sure whether that means the character isn't
recognized in its internal Unicode form or that it has no UTF-8 equivalent. But U+00F7 is the
correct Unicode code point for the division sign, and it does have a UTF-8 encoding (0xC3 0xB7).
Other "special" characters seem to serialize to UTF-8 without this split.
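A quick check, using only the JDK's standard charset classes, that U+00F7 is encodable in UTF-8 and what its bytes are. `DivisionSignUtf8` and `utf8Bytes` are illustrative names, not part of any library.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class DivisionSignUtf8 {
    // Returns the UTF-8 encoding of a single BMP character.
    static byte[] utf8Bytes(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char divide = '\u00F7';
        CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();
        System.out.println("UTF-8 can encode U+00F7: " + utf8.canEncode(divide)); // true
        for (byte b : utf8Bytes(divide)) {
            System.out.printf("%02X ", b & 0xFF); // prints C3 B7
        }
        System.out.println();
    }
}
```

Since the encoder reports the character as encodable, "no UTF-8 equivalent" would not explain the split on its own.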
I can send code if it helps. I've tried this with the latest Xerces-J. Does anyone have any
thoughts about it?
Thanks,
Steve Carton