
Python 3.2 has some deadly infection

Gregory Ewing <greg.ewing at canterbury.ac.nz>:

> As a result, most unix programs, most of the time, deal
> with text on stdin and stdout.

Well, ok. But even accepting that premise, that "text" might not be what
Python3 considers "text".

For example, if your program reads in XML, JSON or Python, the parser
object might prefer to take it in as bytes and not have it predecoded by

> So, it makes sense for them to be text by default.

I'm not sure. That could lead to nasty surprises.

I've experienced analogous consternation when the "sort" utility hasn't
worked identically for identical input: it is heavily influenced by the
(spit, spit) locale. That's why 99.9% of your scripts should prefix
"sort" and "grep" with LC_ALL=C -- even when the input really is UTF-8.

Should I now take it further and prefix all Python programs with
LC_ALL=C? Probably not, since UTF-8 input might then cause sys.stdin to
barf.
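A rough sketch of the failure I have in mind, assuming the text layer ends up with a strict ASCII decoder (which, as far as I know, is what Python 3 of this vintage picks under the C locale; later releases may behave differently). An io.BytesIO stands in for the real pipe, and the sample input is invented.

```python
import io

# UTF-8 encoded input containing one non-ASCII character.
utf8_input = "na\u00efve\n".encode("utf-8")

# Stand-in for sys.stdin as it might be configured under LC_ALL=C:
# an ASCII decoder with strict error handling (an assumption, see above).
ascii_stdin = io.TextIOWrapper(
    io.BytesIO(utf8_input), encoding="ascii", errors="strict"
)

try:
    ascii_stdin.readline()
    failed = False
except UnicodeDecodeError as exc:
    # The first byte outside the ASCII range aborts the read.
    failed = True
    print("decode failed:", exc.reason)
```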

> And wherever there's text, there needs to be an encoding.

No problem there. The only question is whether sys.stdin and sys.stdout
should carry out the decoding/encoding, or whether it should be left to
the program.
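For what it's worth, leaving it to the program could look something like the sketch below. The helper name open_text is made up for illustration; io.TextIOWrapper and sys.stdin.buffer are the real standard library pieces. An io.BytesIO replaces the pipe so the snippet runs on its own.

```python
import io

def open_text(binary_stream, encoding="utf-8", errors="strict"):
    """Wrap a binary stream in a text layer with an explicit encoding.

    Hypothetical helper: the program, not the locale, picks the codec.
    """
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors)

# In a real script this would be: text_in = open_text(sys.stdin.buffer)
fake_stdin = io.BytesIO("caf\u00e9\n".encode("utf-8"))
text_in = open_text(fake_stdin)

line = text_in.readline()
# ascii() keeps the output printable regardless of the stdout encoding.
print(ascii(line))
```

The same program can equally well skip the wrapper and hand the raw bytes to a parser that knows how to find the encoding on its own.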