Unicode (was: Old Man Yells At Cloud)


On 17 September 2017 at 12:38, Leam Hall <leamhall at gmail.com> wrote:
> On 09/17/2017 07:25 AM, Steve D'Aprano wrote:
>>
>> On Sun, 17 Sep 2017 08:03 pm, Leam Hall wrote:
>>
>>> I'm still trying to figure out how to convert a string to unicode in
>>> Python 2.
>>
>>
>>
>> A Python 2 string is a string of bytes, so you need to know what encoding
>> they are in. Let's assume you got them from a source using UTF-8. Then you
>> would do:
>>
>> mystring.decode('utf-8')
>>
>> and it will return a Unicode string of "code points" (think: more or less
>> characters).
>
>
>
> Still trying to keep this Py2 and Py3 compatible.
>
> The Py2 error is:
>         UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6'
>         in position 8: ordinal not in range(128)
>
> even when the string is manually converted:
>         name    = unicode(self.name)
>
> Same sort of issue with:
>         name    = self.name.decode('utf-8')
>
>
> Py3 doesn't like either version.

If your string contains the byte \xf6, it is probably not UTF-8. Maybe
it's Latin-1? The key point is that there's no way for Python (or any
program) to know the encoding of a byte string, so you have to tell
it.

Paul
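
A minimal sketch of the point above, meant to run unchanged on Python 2 and
Python 3. The sample bytes and the helper name to_text are made up for
illustration, and the list of candidate encodings is an assumption: you
still have to know, or guess, where the data came from.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: "Björk" stored as Latin-1, one byte per character.
raw = b'Bj\xf6rk'


def to_text(data, encodings=('utf-8', 'latin-1')):
    """Return text for data, trying each candidate encoding in order.

    to_text is an illustrative helper, not part of any library.
    """
    if not isinstance(data, bytes):
        # Already text (unicode on Python 2, str on Python 3): leave it alone.
        return data
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
    raise ValueError('none of the candidate encodings fit')


name = to_text(raw)  # UTF-8 is rejected, so this falls back to Latin-1
```

Note that Latin-1 assigns a character to every possible byte, so the fallback
never fails. It can still give the wrong characters if the data was really in
some other encoding, which is why the encoding has to come from you and not
from the bytes.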