RE: Character encodings: msg#00235 text.xml.exist
Hi,

I don't know if this is the same problem, but it might help. I have also noticed a problem with UTF-8 characters and narrowed it down to how the document is stored in eXist.

When I used the client, connected to a remote database, to store a document, the encoding was preserved. When my application stored the same document, the character encoding became UTF-16 instead of UTF-8. Digging into the source code, I found that the client stores the document as a file, while my application stores it as a string. When a document is stored from a file, the raw bytes are sent over XML-RPC without first being read into a Java String object, which preserves the UTF-8 encoding. In my application, the bytes had already been converted into a Java String.

Storing the data as a file was not an option because I needed to do some string processing on it before storing it. I found that I had to force the Java String to be interpreted as UTF-8 by doing this:

    //-- content is a String previously read from a file
    XMLResource resource = (XMLResource) col.createResource(fileName, "XMLResource");
    // getBytes() with no argument encodes using the platform default charset
    byte[] b = content.getBytes();
    // decode those same bytes again, this time explicitly as UTF-8
    resource.setContent(new String(b, "UTF-8"));
    col.storeResource(resource);

Now when the string is stored, the UTF-8 encoding is preserved and properly transferred over XML-RPC.

--John

-----Original Message-----
From: exist-open-admin@xxxxxxxxxxxxxxxxxxxxx [mailto:exist-open-admin@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Wolfgang Meier
Sent: Tuesday, September 28, 2004 10:33 AM
To: exist-open@xxxxxxxxxxxxxxxxxxxxx
Subject: [Exist-open] Character encodings

Hi,

the XMLRPC library indeed seems to have a problem with character encodings. Giulio observed that collection names containing accents are messed up in the jEdit plugin. I have thus added another test to org.exist.xmlrpc.test.XmlRpcTest to check accents in collection paths. Like all other tests, it passes on my machine. However, the test fails on Giulio's installation. It thus seems that the XMLRPC library really transcodes characters to the system default encoding at some point. I usually set my system encoding to UTF-8 on all machines, so I couldn't see the problem.

We will have to figure out where and why the transcoding occurs.

Wolfgang

-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170 Project Admins to receive an Apple iPod Mini FREE for your judgement on who ports your project to Linux PPC the best. Sponsored by IBM. Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
Exist-open mailing list
Exist-open@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/exist-open
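For readers following the thread, the round trip in John's workaround (encode with one charset, decode again as UTF-8) can be sketched in plain Java without any eXist classes. This is only an illustration under stated assumptions: the class name `EncodingRoundTrip` and its two helper methods are hypothetical, and ISO-8859-1 stands in for whatever the platform default charset happened to be on the affected machine, which the thread does not say.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of eXist or the XML-RPC library.
public class EncodingRoundTrip {

    // Simulates reading UTF-8 bytes from a file with the wrong charset,
    // which is what happens when the platform default is not UTF-8.
    public static String misread(byte[] utf8Bytes, Charset wrongCharset) {
        return new String(utf8Bytes, wrongCharset);
    }

    // Same idea as content.getBytes() followed by new String(b, "UTF-8"),
    // but with the intermediate charset named explicitly instead of
    // relying on the platform default.
    public static String repair(String misreadText, Charset wrongCharset) {
        byte[] originalBytes = misreadText.getBytes(wrongCharset);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";                              // "cafe" with an acute accent
        byte[] onDisk = text.getBytes(StandardCharsets.UTF_8);  // 5 bytes: the accent takes two

        // Assumption: the default charset on the affected machine was ISO-8859-1.
        String garbled = misread(onDisk, StandardCharsets.ISO_8859_1);
        String fixed = repair(garbled, StandardCharsets.ISO_8859_1);

        System.out.println(garbled.length());      // 5, one char per byte
        System.out.println(fixed.equals(text));    // true
        System.out.println(Charset.defaultCharset()); // varies by machine
    }
}
```

The round trip is lossless here because ISO-8859-1 maps every byte value to exactly one character; with other default charsets some bytes may not survive. It also fits Wolfgang's observation: on a machine whose default charset is already UTF-8, the misread step never happens, so the problem stays invisible. Where the reading code can be changed, decoding with an explicit charset at read time avoids the need for a repair step.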