[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Python 3.2 has some deadly infection

On Wed, Jun 4, 2014 at 2:34 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> Outside of those three kinds of files, I would expect that *by far* the
> single largest kind of file is text. Some text is wrapped in a binary
> layer, e.g. .doc, .odt, etc. but an awful lot of it is good old human
> readable text, including web pages (html) and XML.

In terms of file I/O in Python, text wrapped in a binary layer has to
be treated as binary, not text. There's no difference between a JPEG
file that has some textual EXIF information and an ODT file that's a
whole lot of zipped up text; both of them have to be read as binary,
then unpacked according to the container's specs, and then the text
portion decoded according to an encoding like UTF-8.

But you're quite right that a large proportion of files out there
really are text.