
Putting Unicode characters in JSON

On 2018-03-24 11:21:09 +1100, Chris Angelico wrote:
> On Sat, Mar 24, 2018 at 11:11 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
> > On Fri, 23 Mar 2018 07:46:16 -0700, Tobiah wrote:
> >> If I changed my database tables to all be UTF-8 would this work cleanly
> >> without any decoding?
> >
> > Not reliably or safely. It will appear to work so long as you have only
> > pure ASCII strings from the database, and then crash when you don't:
> >
> > py> text_from_database = u"hello wörld".encode('latin1')
> > py> print text_from_database
> > hello w?rld
> If the database has been configured to use UTF-8 (as mentioned, that's
> "utf8mb4" in MySQL), you won't get that byte sequence back. You'll get
> back valid UTF-8.

Actually (with Python 3 and mysql.connector), you'll get back str values,
not byte values encoded in utf-8 or latin-1. You don't have to decode
them because the driver already did it.
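
To illustrate the point, here is a minimal sketch. It assumes Python 3 and
uses the standard-library sqlite3 module as a stand-in for mysql.connector,
so that it runs without a database server; the table and column names are
made up for the example. The value comes back as str and can go straight
into json.dumps:

```python
import json
import sqlite3

# Stand-in for the MySQL connection: an in-memory SQLite database.
# (Assumption: the table "greetings" and column "body" are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE greetings (body TEXT)")
conn.execute("INSERT INTO greetings (body) VALUES (?)", ("hello wörld",))

(body,) = conn.execute("SELECT body FROM greetings").fetchone()
conn.close()

# The driver hands back str, not bytes, so no .decode() call is needed.
assert isinstance(body, str)

# json.dumps escapes non-ASCII characters by default;
# ensure_ascii=False keeps them as literal characters instead.
escaped = json.dumps({"body": body})
literal = json.dumps({"body": body}, ensure_ascii=False)
print(escaped)
print(literal)
```

Both outputs are valid JSON and parse back to the same string; which one
you want depends on what the consumer of the JSON expects.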

So as a Python programmer, you don't care what character set the
database uses internally, as this is almost completely hidden from you.
(The one aspect that isn't hidden is of course the set of characters
that you can store in a character field: obviously, you can't store
Chinese characters in a latin1 field.)
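
The same restriction can be checked on the Python side. In this rough
sketch, Python's "latin-1" codec stands in for the column's character set
(MySQL's latin1 differs from it in some details), and fits_in_charset is a
hypothetical helper, not part of any driver:

```python
def fits_in_charset(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# "ö" exists in Latin-1, Chinese characters do not; UTF-8 covers both.
print(fits_in_charset("hello wörld", "latin-1"))
print(fits_in_charset("你好", "latin-1"))
print(fits_in_charset("你好", "utf-8"))
```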

If you are using Python 2, manual encoding and decoding may be necessary.
(AFAICS the OP still hasn't stated which Python version he uses.)
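
For reference, manual decoding looks roughly like this. The sketch is
written in Python 3 syntax, where bytes plays the role that str plays in
Python 2, and it assumes the raw value is latin1-encoded as in Steven's
example above:

```python
# Bytes as a driver might return them when it does not decode for you.
raw = "hello wörld".encode("latin-1")

# Guessing the wrong encoding fails as soon as there is a non-ASCII byte.
try:
    raw.decode("utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

# Decoding with the encoding the data was actually stored in works.
text = raw.decode("latin-1")
print(wrong_guess_failed, text)
```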


   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp at hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>