
Python 3.2 has some deadly infection

On Mon, 02 Jun 2014 12:10:48 +0100, Robin Becker wrote:

> there seems to be an implicit assumption in python land that encoded
> strings are the norm. On virtually every computer I encounter that
> assumption is wrong. The vast majority of bytes in most computers is not
> something that can be easily printed out for humans to read. I suppose
> some clever pythonista can figure out an encoding to read my .o / .so
> etc  files, but they are practically meaningless to a unicode program
> today. Same goes for most image formats and media files. Browsers
> routinely encounter mis/un-encoded pages.

If you include image, video and sound files, you are probably correct 
that most content of files is binary.

Outside of those three kinds of files, I would expect that *by far* the 
single largest kind of file is text. Some text is wrapped in a binary 
layer, e.g. .doc, .odt, etc. but an awful lot of it is good old human 
readable text, including web pages (html) and XML.

Every programming language I know of defaults to opening files in text 
mode rather than binary mode. There may be exceptions, but reading and 
writing text is ubiquitous while writing .o and .so files is not.

> In python I would have preferred for bytes to remain the default io
> mechanism, at least that would allow me to decide if I need any
> decoding.

That implies that you're opening files in binary mode by default. It also 
implies that even something as trivial as writing the string "Hello 
World" to a file (stdout is a file) is impossible until you've learned 
about encodings and know which encoding you need. I really don't think 
that's a good plan, for any language, but especially a language like 
Python which is intended for beginners as well as experts.
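To make the difference concrete, here is a minimal Python 3 sketch. It is an illustration only, not code from this thread. It uses in-memory buffers from the io module as stand-ins for real files, and the helper names write_text and write_bytes are hypothetical. The text layer picks the encoding for you, while the bytes-only route forces the caller to choose one before anything can be written.

```python
import io


def write_text(message, encoding="utf-8"):
    """Write a str through a text layer, roughly what open(path, 'w') provides.

    Returns the raw bytes that ended up in the underlying binary buffer.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    text.write(message)
    text.flush()  # push the encoded bytes down into the binary buffer
    return raw.getvalue()


def write_bytes(message, encoding="utf-8"):
    """Write to a binary buffer directly; the caller must encode by hand."""
    raw = io.BytesIO()
    raw.write(message.encode(encoding))
    return raw.getvalue()


text_result = write_text("Hello World\n")
bytes_result = write_bytes("Hello World\n")

# A binary stream refuses str outright, so the encoding step cannot be skipped.
try:
    io.BytesIO().write("Hello World\n")
    binary_accepts_str = True
except TypeError:
    binary_accepts_str = False
```

Both routes produce the same bytes here; the difference is only in who has to know about the encoding.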

The Python 2 approach, where stdout is binary but tries really hard to
pretend to be a superset of ASCII, is simply broken. It works well for
trivial examples, while breaking in surprising and hard-to-diagnose ways
in others. It violates the Zen principle that errors should never pass
silently unless explicitly silenced: instead of raising, it fails
silently and produces mojibake:

[steve@ando ~]$ python2.7 -c "import sys; sys.stdout.write(u'???\n')"

Changing to print doesn't help:

[steve@ando ~]$ python2.7 -c "print u'???'"

Python 3 works correctly, whether you use print or sys.stdout:

[steve@ando ~]$ python3.3 -c "import sys; sys.stdout.write(u'???\n')"

(although I haven't tested it on Windows).
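As a rough illustration of the two failure modes discussed above (this is not a reproduction of the terminal sessions, and the sample string is an arbitrary assumption), the Python 3 sketch below first produces mojibake by decoding UTF-8 bytes with the wrong codec, then shows a strict text layer raising an error instead of silently writing garbage.

```python
import io

# Hypothetical sample text containing one non-ASCII character (e with acute).
original = "caf\u00e9"

# Mojibake: the bytes are valid UTF-8, but the reader assumes Latin-1.
utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")

# A text layer limited to ASCII cannot represent the character. With the
# default strict error handling it raises rather than passing silently.
raw = io.BytesIO()
ascii_out = io.TextIOWrapper(raw, encoding="ascii", errors="strict")
try:
    ascii_out.write(original)
    ascii_out.flush()
    raised = False
except UnicodeEncodeError:
    raised = True
```

Passing errors="replace" or errors="ignore" to the wrapper would be the "explicitly silenced" case: the substitution still happens, but only because the caller asked for it.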

Steven D'Aprano