osdir.com


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Putting Unicode characters in JSON


On 03/22/2018 12:46 PM, Tobiah wrote:
> I have some mailing information in a Mysql database that has
> characters from various other countries.? The table says that
> it's using latin-1 encoding.? I want to send this data out
> as JSON.
> 
> So I'm just taking each datum and doing 'name'.decode('latin-1')
> and adding the resulting Unicode value right into my JSON structure
> before doing .dumps() on it.? This seems to work, and I can consume
> the JSON with another program and when I print values, they look nice
> with the special characters and all.
> 
> I was reading though, that JSON files must be encoded with UTF-8.? So
> should I be doing string.decode('latin-1').encode('utf-8')?? Or does
> the json module do that for me when I give it a unicode object?


Thanks for all the discussion.  A little more about our setup:
We have used a LAMP stack system for almost 20 years to deploy
hundreds of websites.  The database tables are latin-1 only because
at the time we didn't know how or care to change it.

More and more, 'special' characters caused a problem.  They would
not come out correctly in a .csv file or wouldn't print correctly.
Lately, I noticed that a JSON file I was sending out was delivering
unreadable characters.  That's when I started to look into Unicode
a bit more.  From the discussion, and my own guesses, it looks
as though all have to do is string.decode('latin-1'), and stuff
that Unicode object right into my structure that gets handed to
json.dumps().

If I changed my database tables to all be UTF-8 would this
work cleanly without any decoding?  Whatever people are doing
to get these characters in, whether it's foreign keyboards,
or fancy escape sequences in the web forms, would their intended
characters still go into the UTF-8 database as the proper characters?
Or now do I have to do a conversion on the way in to the database?

We also get import data that often comes in .xlsx format.  What
encoding do I get when I dump a .csv from that?  Do I have to
ask the sender?  I already know that they don't know.


Toby