
Re: UTF-8 encoding for plain text: msg#00160

Subject: Re: UTF-8 encoding for plain text
Stefan Behnel wrote:

   if isinstance(value, unicode):
      value = value.encode('utf-8')
   # pass value into libxml2

Would you really want to pass the above string into libxml2, then see that it
didn't work and finally try to make up a good error message to throw back an
exception? It's much easier now, where Python generates that exception for you.

Hmm... so you're saying that the string -> unicode -> string round trip serves to detect non-ASCII characters?

Maybe you're right that we should not accept non-ASCII regular strings, as that is just too easy to get wrong. Your example above would work in my proposal, but only if the encoding of your Python source is set to UTF-8.

A check for any bytes >= 128 would be more efficient than doing a unicode round trip.
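
To make the two approaches concrete, here is a rough sketch of both checks. It is written against Python 3 `bytes` so that it runs as-is, whereas the thread is about Python 2 `str`; the function names are made up for illustration and are not part of lxml. It only shows that the two checks agree on what they reject; it does not measure speed, and a byte scan done in C is presumably what would be needed for the efficiency gain.

```python
def has_non_ascii_scan(data: bytes) -> bool:
    """Direct check: is any byte value >= 128? (hypothetical helper)"""
    return any(b >= 128 for b in data)


def has_non_ascii_roundtrip(data: bytes) -> bool:
    """Round-trip check: let the ASCII codec raise on non-ASCII bytes."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


plain = b"plain text"
encoded = "caf\u00e9".encode("utf-8")  # contains bytes >= 128

for sample in (plain, encoded):
    # Both checks should give the same answer for every input.
    assert has_non_ascii_scan(sample) == has_non_ascii_roundtrip(sample)
```

With the round trip, the codec raises the exception for you; with the scan, the caller has to build the error message itself.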

Regards,
Geert

