Handle foreign character web input
On Sat, Jun 29, 2019 at 7:01 AM Tobiah <toby at tobiah.org> wrote:
> On 6/28/19 1:33 PM, Chris Angelico wrote:
> > On Sat, Jun 29, 2019 at 6:31 AM Tobiah <toby at tobiah.org> wrote:
> >> A guy comes in and enters his last name as Rönngren.
> >> So what did the browser really give me; is it encoded
> >> in some way, like latin-1? Does it depend on whether
> >> the name was cut and pasted from a Word doc, etc.?
> >> Should I handle these internally as unicode? Right
> >> now my database tables are latin-1 and things seem
> >> to usually work, but not always.
> > Definitely handle them as Unicode. You'll receive them in some
> > encoding, probably UTF-8, and it depends on the browser. Ideally, your
> > back-end library (eg Flask) will deal with that for you.
> It varies by browser?
> So these records are coming in from all over the world. How
> do people handle possibly assorted encodings that may come in?
> I'm using Web2py. Does the request come in with an encoding
> built in? Is that how people get the proper unicode object?
Yes. Normally, the browser will say "hey, here's a request body, and
this is the encoding and formatting". That's the Content-Type header,
which can carry a charset parameter.
Try it - see what you get.
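
If you want to see roughly what a framework does with that information,
here is a minimal sketch using only the standard library. It is not
Web2py's actual code; the function names body_charset and decode_form
are made up for illustration, and falling back to UTF-8 when the
request names no charset is an assumption on my part.

```python
from email.message import Message
from urllib.parse import parse_qs


def body_charset(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value.

    Falls back to `default` when the header carries no charset
    parameter (the UTF-8 default is an assumption, not a rule).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_form(raw_body, content_type):
    """Decode a urlencoded form body into a dict of Unicode strings.

    The body itself is plain ASCII with percent-escapes; the charset
    only matters when turning those escaped bytes into text.
    """
    charset = body_charset(content_type)
    text = raw_body.decode("ascii")
    # errors="strict" raises instead of silently replacing bad bytes
    return parse_qs(text, encoding=charset, errors="strict")


# The same name, submitted in two different encodings
utf8_form = decode_form(
    b"last_name=R%C3%B6nngren",
    "application/x-www-form-urlencoded; charset=UTF-8",
)
latin1_form = decode_form(
    b"last_name=R%F6nngren",
    "application/x-www-form-urlencoded; charset=ISO-8859-1",
)
print(utf8_form, latin1_form)
```

Either way you end up with the same Unicode string, which is what you
want to be passing around internally.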