
RE: Character encodings: msg#00235


Subject: RE: Character encodings

Hi,

I don't know if this is the same problem, but it might help. I've also
noticed a problem with UTF-8 characters and narrowed it down to how the
document is stored in eXist. If I used the client connected to a remote
database to store a document, the encoding was preserved, but if I used
my application to store the same document, the character encoding became
UTF-16 instead of UTF-8.

So I started digging around in the source code and found that the
client stored the document as a file, while my application stored the
document as a string. Digging further, I discovered that when storing
the document from a file, the raw bytes were sent over XML-RPC without
first being read into a Java String object, thus preserving the UTF-8
encoding, while in my application the bytes had already been converted
into a Java String. Storing the data as a file was not an option because
I needed to do some string processing on the data before storing it. I
found that I had to force the String to be treated as UTF-8 by doing
this:

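//-- content is a String previously read from a file
XMLResource resource =
    (XMLResource) col.createResource(fileName, "XMLResource");
// getBytes() with no argument encodes using the platform default charset
byte[] b = content.getBytes();
// reinterpret those bytes as UTF-8 before handing the String to eXist
resource.setContent(new String(b, "UTF-8"));
col.storeResource(resource);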

Now when the string is stored, the UTF-8 encoding is preserved and
properly transferred over XML-RPC.
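
For anyone who wants to see the effect in isolation, here is a minimal
sketch that needs no eXist at all. It assumes the file's UTF-8 bytes were
originally decoded with a single-byte charset (ISO-8859-1 is used here as
a stand-in for the platform default, which may differ on your machine).
The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Simulates reading a UTF-8 file with the wrong, single-byte charset.
    // ISO-8859-1 is an assumption standing in for the platform default.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Same idea as the workaround above, but with the charset spelled out:
    // recover the raw bytes, then decode them as UTF-8.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Alternative: decode as UTF-8 when the bytes are first read, so the
    // String is correct from the start and needs no repair.
    public static String decodeUtf8(byte[] fileBytes) {
        return new String(fileBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";          // "café", 4 characters
        String garbled = misdecode(original);

        System.out.println(garbled.length());                  // 5
        System.out.println(repair(garbled).equals(original));  // true

        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(fileBytes).equals(original)); // true
    }
}
```

The round trip only recovers the text when the charset used for the
first decoding maps every byte to a character and back, which is why
naming the charset explicitly is probably safer than relying on the
platform default.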

--John



-----Original Message-----
From: exist-open-admin@xxxxxxxxxxxxxxxxxxxxx
[mailto:exist-open-admin@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Wolfgang
Meier
Sent: Tuesday, September 28, 2004 10:33 AM
To: exist-open@xxxxxxxxxxxxxxxxxxxxx
Subject: [Exist-open] Character encodings

Hi,

The XMLRPC library indeed seems to have a problem with character
encodings. Giulio observed that collection names containing accents are
messed up in the jEdit plugin. I have thus added another test to
org.exist.xmlrpc.test.XmlRpcTest to check accents in collection paths.
Like all other tests, it runs through on my machine. However, the test
fails on Giulio's installation.

It thus seems that the XMLRPC library really transcodes characters to
the system default encoding at some point. I usually set my system
encoding to UTF-8 on all machines, so I couldn't see the problem.

We will have to figure out where and why the transcoding occurs.

Wolfgang


-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Exist-open mailing list
Exist-open@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/exist-open

