
Re: UTF-8 encoding errors are not always detected: msg#00067

Subject: Re: UTF-8 encoding errors are not always detected
Encoding detection happens when the document is opened; after that, a conversion error may cause a well-formedness error, but it cannot be identified as a charset problem.

Most likely the parser isn't detecting the non-UTF-8 characters because Java isn't. I have seen it mentioned that you can ask Java's encoding converters to throw if they encounter invalid byte sequences. Does anyone know if this is true? And if so, why doesn't Xerces do it?
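For what it's worth, the java.nio.charset API (JDK 1.4 and later) does offer this: a CharsetDecoder can be configured with CodingErrorAction.REPORT, in which case decode() throws a CharacterCodingException on malformed input instead of silently substituting U+FFFD. Below is a minimal sketch, not a statement about what Xerces does internally. The class name StrictUtf8 and the method isValidUtf8 are made up for illustration, and StandardCharsets assumes Java 7 or later (on older JDKs, Charset.forName("UTF-8") should serve the same purpose).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Hypothetical helper: true if the bytes decode cleanly as UTF-8.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // REPORT makes decode() throw instead of replacing bad input.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String comment = "<!-- f\u00FCr ONC -->";
        // The same text encoded as ISO-8859-1 (u-umlaut is the single byte FC)
        // and as UTF-8 (u-umlaut is the two bytes C3 BC).
        byte[] latin1 = comment.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = comment.getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

A decoder configured this way can also be passed to the InputStreamReader(InputStream, CharsetDecoder) constructor, so a stream can be checked before it is handed to the parser. A reader created from a charset name alone replaces malformed input rather than reporting it.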

Bob

DeSmet_Ringo@xxxxxxx wrote:
Maybe because the bad character is in the comment. I suspect the parser
skips everything until the closing comment tag. What happens when the bad
character is in an attribute value for example?

Ringo

-----Original Message-----
From: Berchner Matthias ICM Berlin
[mailto:matthias.berchner@xxxxxxxxxxx]
Sent: Friday, 20 February 2004 15:15
To: 'xerces-j-user@xxxxxxxxxxxxxx'
Subject: UTF-8 encoding errors are not always detected


Hi,

I'm using Xerces 1.4.2; unfortunately, UTF-8 encoding errors are not always
detected:

Example:
--------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Project>
        <!-- für ONC -->
</Project>
--------------------------------------------

<!-- für ONC --> corresponds to hex 3C 21 2D 2D 20 66 FC 72 20 4F 4E 43 20 2D 2D 3E

Non-UTF-8 character: ü <-> FC        
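
To locate the offending byte in a dump like the one above, the lower-level CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean) call can be used: on malformed input it returns an error result and leaves the input buffer positioned at the start of the bad sequence. This is only a sketch under stated assumptions; the class FindBadByte and the method firstMalformedOffset are illustrative names, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FindBadByte {
    // Hypothetical helper: zero-based offset of the first byte sequence that
    // is not valid UTF-8, or -1 if the whole array decodes cleanly.
    public static int firstMalformedOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, so this is enough room.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        // endOfInput = true: a truncated sequence at the end also counts as an error.
        CoderResult result = decoder.decode(in, out, true);
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // The comment line from the example, byte for byte.
        byte[] comment = {
            0x3C, 0x21, 0x2D, 0x2D, 0x20, 0x66, (byte) 0xFC, 0x72,
            0x20, 0x4F, 0x4E, 0x43, 0x20, 0x2D, 0x2D, 0x3E
        };
        System.out.println(firstMalformedOffset(comment)); // 6, the FC byte
    }
}
```

FC can never be part of valid UTF-8 as defined by RFC 3629, and here it is followed by 72, which is not a continuation byte in any case.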


Kind Regards,
Matthias


