Re: An invalid XML character (Unicode: 0x1e) was found in the CDATA section: msg#00008

Subject: Re: An invalid XML character (Unicode: 0x1e) was found in the CDATA section
Thanks for the prompt reply.  Since I'm using Java, and my database is MySQL,
do you know what I should use to output this character (Unicode: 0x1e)
correctly within the CDATA section?

If there's no easy solution, does that mean I have to filter out these funky
characters before outputting them in the CDATA section?
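
For reference, a minimal sketch of what such a filter could look like in Java.
It assumes XML 1.0, whose Char production allows only #x9, #xA, #xD,
#x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. The class and method names
(XmlCharFilter, isValidXml10Char, stripInvalidXmlChars) are made up for
illustration and are not part of Xerces or the JDK.

```java
final class XmlCharFilter {

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with every disallowed code point dropped.
    // Unpaired surrogates fall in 0xD800-0xDFFF and are dropped as well.
    static String stripInvalidXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (isValidXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical value read from the database, containing 0x1E.
        String fromDatabase = "first\u001Esecond";
        System.out.println("<![CDATA[" + stripInvalidXmlChars(fromDatabase) + "]]>");
    }
}
```

Dropping the characters loses information silently. If the 0x1e separators
carry meaning, replacing them with a visible delimiter or encoding the whole
value (Base64, for instance) before writing it out may be a better fit.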



Max O Bowsher wrote:
> 
> pmkwan wrote:
>> Can someone please explain why the parser is throwing this error:
>> 
>> xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1e) was
>> found in the CDATA section.
>>      at
>> com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown
>> Source)
>>      at
>> com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown
>> Source)
>>      at
>> com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown
>> Source)
>>      at
>> com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown
>> Source)
>>      at
>> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown
>> Source)
>>      at
>> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown
>> Source)
>> 
>> 
>> I am using <?xml version="1.0" encoding="UTF-8"?> in my xml file and I
>> set
>> my outputStreamWriter to use UTF-8 as well.  The data I captured was from
>> our database and the character set is probably not UTF-8.  Does that
>> matter?
> 
> Yes, it does matter.
> 
>> I thought the parser was not supposed to parse anything within the CDATA
>> section in the XML file.  So why would this exception even happen?
> 
> Bytes are parsed into characters. Characters are then parsed for XML
> markup. CDATA only inhibits the second of those two processes.
> 
> i.e., CDATA sections still must contain valid data according to the
> character set of the document, and furthermore, the characters must fall
> within the subset of characters permitted in XML.
> 
> There is no syntax that allows you to embed raw bytes within an XML
> document.
> 
> Max.
> 
> 
>  
> 
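
The point quoted above (CDATA switches off markup recognition, not character
validation) can be checked with a small test. This is only a sketch using the
standard JAXP DocumentBuilder with default settings; the class name
CdataCharCheck and the helper parses are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

final class CdataCharCheck {

    // Returns true if the document parses, false if the parser rejects it
    // with a SAXParseException. Any other failure is rethrown unchecked.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            builder.parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        // Markup-like text is fine inside CDATA: it is not parsed as markup.
        String markupInCdata = header + "<r><![CDATA[a <b> & c]]></r>";
        // The 0x1E control character is outside the XML 1.0 character set,
        // so the document is rejected even though it sits inside CDATA.
        String controlInCdata = header + "<r><![CDATA[a\u001Eb]]></r>";

        System.out.println("markup in CDATA parses:  " + parses(markupInCdata));
        System.out.println("0x1E in CDATA parses:    " + parses(controlInCdata));
    }
}
```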

-- 
View this message in context: 
http://www.nabble.com/An-invalid-XML-character-%28Unicode%3A-0x1e%29-was-found-in-the-CDATA-section-tf4233631.html#a12045428
Sent from the Xerces - J - Users mailing list archive at Nabble.com.

