RE: Problems with ISO-8859-1 and UTF-8 encodings: msg#00004

Subject: RE: Problems with ISO-8859-1 and UTF-8 encodings

Hi Inma,

 

In the last line of your first block you have:

return baos.toString();

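Note that calling “toString()” on the ByteArrayOutputStream does not give you UTF-8. It decodes the bytes using the platform default charset and returns an ordinary Java String (UTF-16 internally).  I’m guessing that in your next block of code, xmlutf8 is the result of the first block.  If so, when you call getBytes() on it, the string is re-encoded with the platform default charset, so the bytes you get are no longer valid UTF-8.  That would fit the ‘Ã?’ you see for ‘Á’ and ‘Í’: if your default charset is something like windows-1252, the second UTF-8 byte of those two characters (0x81 and 0x8D) has no mapping there and ends up as ‘?’.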
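
Here is a minimal sketch of one way around this, assuming you can pass the documents around as byte arrays instead of Strings. The class name EncodingRoundTrip and the helper transcode are made up for illustration; they are not from your code. The idea is to return baos.toByteArray() and feed those bytes straight into the next transform, so the platform default charset never gets involved:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class EncodingRoundTrip {

    // Hypothetical helper: re-encodes an XML document held as raw bytes.
    // The parser takes the source encoding from the XML declaration, so no
    // String conversion (and no platform default charset) is involved.
    static byte[] transcode(byte[] xml, String targetEncoding) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, targetEncoding);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(new ByteArrayInputStream(xml)),
                              new StreamResult(baos));
        return baos.toByteArray(); // raw bytes, not baos.toString()
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<DOCUMENTO><PERFILES>\u00C1</PERFILES><PERFILES>\u00CD</PERFILES></DOCUMENTO>";
        // Encode explicitly so the bytes match the XML declaration.
        byte[] iso = doc.getBytes(StandardCharsets.ISO_8859_1);

        byte[] utf8 = transcode(iso, "UTF-8");
        byte[] isoAgain = transcode(utf8, "ISO-8859-1");

        // Decode only for display, always naming the charset of the bytes.
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
        System.out.println(new String(isoAgain, StandardCharsets.ISO_8859_1));
    }
}
```

If you really do need a String in between, name the charset on both sides, for example baos.toString("UTF-8") and later getBytes("UTF-8"), instead of relying on the default.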

 

HTH,

 

From: Inma Marín López [mailto:inma@xxxxxxxxx]
Sent: Thursday, August 02, 2007 12:53 AM
To: j-users@xxxxxxxxxxxxxxxxx
Subject: Problems with ISO-8859-1 and UTF-8 encodings

 

Hi all,

 

 I have some problems with ISO-8859-1 and UTF-8 encodings in XML documents. Specifically, I have this ISO-8859-1 encoded XML document:

 

<?xml version="1.0" encoding="ISO-8859-1"?>

<DOCUMENTO>

<PERFILES>Á</PERFILES>

<PERFILES>É</PERFILES>

<PERFILES>Í</PERFILES>

<PERFILES>Ó</PERFILES>

<PERFILES>Ú</PERFILES>

</DOCUMENTO>

 

Then I encode it as UTF-8 with the following piece of code:

 

            Transformer transformer = TransformerFactory.newInstance().newTransformer();

            StreamSource ds = new StreamSource(new ByteArrayInputStream(xmliso88191.getBytes()));

            transformer.setOutputProperty(OutputKeys.ENCODING,"utf-8");

            ByteArrayOutputStream baos = new ByteArrayOutputStream();

            transformer.transform(ds,new StreamResult(baos));

            return baos.toString();

 

to obtain this XML document:

 

<?xml version="1.0" encoding="utf-8"?>

<DOCUMENTO>

<PERFILES>Ã?</PERFILES>

<PERFILES>É</PERFILES>

<PERFILES>Ã?</PERFILES>

<PERFILES>Ó</PERFILES>

<PERFILES>Ú</PERFILES>

</DOCUMENTO>

 

Next, I try to convert this UTF-8 encoded document back to ISO-8859-1:

 

            Transformer transformer = TransformerFactory.newInstance().newTransformer();

            StreamSource ds = new StreamSource(new ByteArrayInputStream(xmlutf8.getBytes()));

            transformer.setOutputProperty(OutputKeys.ENCODING,"iso-8859-1");

            ByteArrayOutputStream baos = new ByteArrayOutputStream();

            transformer.transform(ds,new StreamResult(baos));

            return baos.toString();

 

But this does not work. Instead, I get the following exception:

 

[Fatal Error] :8:11: Invalid byte 2 of 2-byte UTF-8 sequence.

javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: Invalid byte 2 of 2-byte UTF-8 sequence.

        at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:449)

        at codificacion.PruebasCodificacion.encodeISO88891(PruebasCodificacion.java:302)

        at codificacion.PruebasCodificacion.prueba(PruebasCodificacion.java:73)

        at codificacion.PruebasCodificacion.main(PruebasCodificacion.java:356)

Caused by: org.xml.sax.SAXParseException: Invalid byte 2 of 2-byte UTF-8 sequence.

        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

        at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:432)

 

 

Is this process correct? If it is, the exception seems to be caused by the ‘Ã?’ characters (the UTF-8 encoding of ‘Á’ and ‘Í’), so I would like to know how I can encode the ‘Á’ and ‘Í’ characters as UTF-8 and then convert them back to ISO-8859-1.

 

Could anybody be so kind as to help me, please?

 

Thank you very much in advance.

Inma.

 

