
TypeError: decoding str is not supported

On Sat, 28 Sep 2019 at 10:53, Peter Otten <__peter__ at web.de> wrote:
> Hongyi Zhao wrote:
> > Hi,
> >
> > I have some code that comes from Python 2, like the following:
> >
> > str('a', encoding='utf-8')
> This fails in Python 2:
> >>> str("a", encoding="utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: str() takes at most 1 argument (2 given)
> ...unless you have redefined str, e. g. with
> >>> str = unicode
> >>> str("a", encoding="utf-8")
> u'a'
> > But in Python 3, this fails as follows:
> >
> >>>> str('a', encoding='utf-8')
> > Traceback (most recent call last):
> >   File "<input>", line 1, in <module>
> > TypeError: decoding str is not supported
> >
> >
> > How to fix it?
> Don't try to decode an already decoded string; use it directly:
> "a"

To explain a little further, one of the biggest differences between
Python 2 and Python 3 is that you *have* to be clear in Python 3 on
which data is encoded byte sequences (which need a decode to turn them
into text strings, but cannot be encoded, because they already are)
and which are text strings (which don't need to be, and can't be,
decoded, but which can be encoded if you want to get a byte sequence).
If you're not clear whether some data is a byte string or a text
string, you will get in a muddle, and Python 2 won't help you (but it
will sometimes produce mojibake without generating an error) whereas
Python 3 will tend to throw errors flagging the issue (but it may
sometimes be stricter than you are used to).
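A minimal Python 3 sketch of the two directions described above: `str.encode` turns text into bytes, `bytes.decode` (or `str(data, encoding=...)`) turns bytes into text, and asking `str()` to decode something that is already a `str` raises the very `TypeError` from the original question. The variable names are purely illustrative.

```python
# Text (str) and encoded data (bytes) are distinct types in Python 3.
text = "café"
data = text.encode("utf-8")           # str -> bytes
assert isinstance(data, bytes)

# Decoding goes the other way: bytes -> str.
assert data.decode("utf-8") == text
# str(some_bytes, encoding=...) is the same operation as some_bytes.decode(...).
assert str(data, encoding="utf-8") == text

# Each type only has the method that makes sense for it.
assert not hasattr(data, "encode")    # bytes are already encoded
assert not hasattr(text, "decode")    # str is already decoded

# Passing an encoding to str() with a str argument asks it to decode
# already-decoded text, which is exactly the error in the question.
error_message = ""
try:
    str("a", encoding="utf-8")
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

So the Python 3 equivalent of the original line is simply the literal `"a"`, or `str(b"a", encoding="utf-8")` if the value really is bytes.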

Thinking that saying `str = unicode` is a reasonable thing to do is a
pretty strong indication that you're not clear on whether your data is
text or bytes - either that or you're hoping to make a "quick fix".
But as you've found, quick fixes tend to result in a cascade of
further issues that *also* need quick fixes. The right solution here
(and by far the cleanest one) is to review your code as a whole, and
have a clear separation between bytes data and text data. The usual
approach people use for this is to decode bytes into text as soon as
it's read into your program, and only ever use genuine text data
within your program - so you should only ever be using encode/decode
in the I/O portion of your application, where it's pretty clear when
you have encoded bytes coming in or going out.
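A rough sketch of that "decode on the way in, encode on the way out" layout, assuming UTF-8 throughout. The helper names `read_text`, `process` and `write_text` are hypothetical, and `io.BytesIO` stands in for a real binary file or socket so the example is self-contained.

```python
import io
from typing import BinaryIO


def read_text(stream: BinaryIO, encoding: str = "utf-8") -> str:
    """Input boundary: encoded bytes come in, text goes out."""
    return stream.read().decode(encoding)


def process(text: str) -> str:
    """Core logic: only ever sees str, never bytes."""
    return text.upper()


def write_text(stream: BinaryIO, text: str, encoding: str = "utf-8") -> None:
    """Output boundary: text comes in, encoded bytes go out."""
    stream.write(text.encode(encoding))


# Simulated binary input and output streams.
incoming = io.BytesIO("héllo wörld\n".encode("utf-8"))
outgoing = io.BytesIO()

write_text(outgoing, process(read_text(incoming)))
print(outgoing.getvalue())
```

For ordinary files you rarely need to write these helpers yourself: opening a file in text mode with `open(path, encoding="utf-8")` performs the decode on read and the encode on write, so the rest of the program only handles `str`.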

Hope this helps,