Thanks for the prompt reply. Since I'm using Java and my database is MySQL,
do you know what I should use to output this character (Unicode: 0x1e)
correctly within the CDATA section?
If there's no easy solution, does that mean I have to filter out these funky
characters before outputting them in the CDATA section?
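
In case filtering is the way to go, here is a rough sketch of what I have in
mind. It assumes the database text is already a Java String, and it simply
drops any code point outside the character range that XML 1.0 allows (tab,
LF, CR, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF). The names
XmlCharFilter and stripInvalidXmlChars are placeholders of mine, not part of
Xerces or any other library.

```java
final class XmlCharFilter {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with all non-XML-1.0 characters removed. */
    static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            // Read a full code point so surrogate pairs are handled as one unit;
            // an unpaired surrogate comes back as 0xD800-0xDFFF and is dropped.
            int cp = in.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

I would call stripInvalidXmlChars(value) on each column value just before
writing it into the CDATA section. Dropping the character loses data, of
course, so replacing it with a space or some marker may be a better choice
depending on what the field is used for.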
Max O Bowsher wrote:
>
> pmkwan wrote:
>> Can someone please explain why the parser is throwing this error:
>>
>> xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1e) was
>> found in the CDATA section.
>>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
>>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
>>     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
>>     at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
>>
>>
>> I am using <?xml version="1.0" encoding="UTF-8"?> in my XML file and I
>> set my OutputStreamWriter to use UTF-8 as well. The data I captured came
>> from our database, and its character set is probably not UTF-8. Does
>> that matter?
>
> Yes, it does matter.
>
>> I thought the parser is not supposed to parse anything within the CDATA
>> section in the XML file. So why would this exception even happen?
>
> Bytes are parsed into characters. Characters are then parsed for XML
> markup. CDATA only inhibits the second of those two processes.
>
> That is, CDATA sections must still contain data that is valid in the
> document's character encoding, and furthermore, the decoded characters
> must fall within the subset of characters permitted in XML.
>
> There is no syntax that allows you to embed raw bytes within an XML
> document.
>
> Max.
--
View this message in context:
http://www.nabble.com/An-invalid-XML-character-%28Unicode%3A-0x1e%29-was-found-in-the-CDATA-section-tf4233631.html#a12045428
Sent from the Xerces - J - Users mailing list archive at Nabble.com.