


Convert a list with wrong encoding to utf8


vergos.nikolas at gmail.com writes:

> On Thursday, 14 February 2019 at 8:56:31 UTC+2, MRAB wrote:
>
>> It doesn't have a 'b' prefix, so either it's Python 2 or it's a Unicode 
>> string that was decoded wrongly from the bytes.
>
> Yes, it doesn't have the 'b' prefix, so those hexadecimal escapes represent the characters of a string, not bytes.
>
> I just tried:
>
> names = tuple( [s.encode('latin1').decode('utf8') for s in names] )
>
> but I get
> UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')
>
> 'Άκης Τσιάμης' is a valid name, but even so it gives an error.
>
> Is it possible that in Python 3 the Unicode string was wrongly decoded from the bytes?
>
> What can I do to get the names?

python3

>>> x = '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
>>> b = bytes(ord(c) for c in x)
>>> b.decode('utf-8')
'Άκης Τσιάμης'
>>> 
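
The UnicodeEncodeError quoted above suggests that the list is mixed: some names are already correct Unicode, while others are UTF-8 bytes that were read as Latin-1. A minimal sketch for that case follows, assuming every damaged entry contains only code points below 256. The helper name fix_mojibake is hypothetical and not from this thread. It uses s.encode('latin-1'), which yields the same bytes as bytes(ord(c) for c in s) for such strings, and leaves any string that cannot be round-tripped unchanged.

```python
def fix_mojibake(s):
    """Re-decode s as UTF-8 if it is UTF-8 data that was misread as Latin-1.

    Hypothetical helper, not part of the original thread. Strings that
    cannot be round-tripped (for example, names that are already correct
    Greek text) are returned unchanged.
    """
    try:
        # Latin-1 maps code points 0..255 one-to-one onto byte values,
        # so this recovers the original bytes of a misdecoded string.
        raw = s.encode('latin-1')
        return raw.decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either s contains characters above U+00FF (already proper text)
        # or its bytes are not valid UTF-8: keep it as it is.
        return s


names = (
    # UTF-8 bytes misread as Latin-1 (the string from the session above)
    '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82',
    # already correct, would raise UnicodeEncodeError on .encode('latin-1')
    'Άκης Τσιάμης',
)

fixed = tuple(fix_mojibake(s) for s in names)
# fixed == ('Άκης Τσιάμης', 'Άκης Τσιάμης')
```

One caveat with this approach: a genuine Latin-1 string whose bytes happen to form valid UTF-8 would also be converted, so it is only a heuristic. Fixing the decoding at the point where the bytes are first read is the more reliable cure.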
-- 
Piet van Oostrum <piet-l at vanoostrum.org>
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]