Python 3.2 has some deadly infection


On Friday, 6 June 2014 17:25:47 UTC+2, Chris Angelico wrote:
> On Fri, Jun 6, 2014 at 11:24 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> > On 06/05/2014 11:30 AM, Marko Rauhamaa wrote:
> >>
> >> How text is represented is very different from whether text is a
> >> fundamental data type. A fundamental text file is such that ordinary
> >> operating system facilities can't see inside the black box (that is,
> >> they are *not* encoded as far as the applications go).
> >
> > Of course they are.  It may be an ASCII-encoding of some flavor or other, or
> > something really (to me) strange -- but an encoding is most assuredly in
> > effect.
>
> Allow me to explain what I think Marko's getting at here.
>
> In most file systems, a file exists on the disk as a set of sectors of
> data, plus some metadata including the file's actual size. When you
> ask the OS to read you that file, it goes to the disk, reads those
> sectors, truncates the data to the real size, and gives you those
> bytes.
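Chris's description can be observed from Python itself: a file opened in binary mode hands back exactly the stored bytes, while text mode decodes those bytes with an encoding the caller names. A minimal sketch, assuming a throwaway temporary file and UTF-8 as the (arbitrarily chosen) encoding:

```python
import os
import tempfile

# Create a temporary file and write 'Gödel' to it, choosing UTF-8 explicitly.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write("G\u00f6del")

# Binary mode: the OS gives back the stored bytes and nothing more.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: Python decodes the same bytes with the encoding we name.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

os.remove(path)
print(raw)                  # b'G\xc3\xb6del'
print(len(raw), len(text))  # 6 5
```

The file itself only ever held six bytes; the five-character string exists because the reader supplied an encoding.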
>
> It's possible to mount a file as a directory, in which case the
> physical representation is very different, but the file still appears
> the same. In that case, the OS goes reading some part of the file,
> maybe decompresses it, and gives it to you. Same difference. These
> files still contain bytes.
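As a user-space analogy only (this is not a mounted filesystem), Python's standard `zipfile` module shows the same effect: a member stored compressed inside an archive still reads back as the original bytes. A sketch, assuming an in-memory archive, zlib support for `ZIP_DEFLATED`, and a hypothetical member name `a.txt`:

```python
import io
import zipfile

# Repetitive data so the compressed form is clearly smaller.
payload = "Z\u00fcrich\n".encode("utf-8") * 1000

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", payload)  # hypothetical member name

# ZipFile does not close a file object it was handed, so buf is reusable.
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("a.txt")
    data = zf.read("a.txt")

print(info.compress_size < info.file_size)  # True: stored form is smaller
print(data == payload)                      # True: the same bytes come back
```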
>
> A "fundamental text file" would be one where, instead of reading and
> writing bytes, you read and write Unicode text. Since the hard disk
> still works with sectors and bytes, it'll still be stored as such, but
> that's an implementation detail; and you could format your disk UTF-8
> or UTF-16 or FSR or anything you like, and the only difference you'd
> see is performance.
>
> This could certainly be done, in theory. I don't know how well it'd
> fit with any of the popular OSes of today, but it could be done. And
> these files would not have an encoding; their on-platter
> representations would, but that's purely implementation - the text
> that you wrote out and the text that you read in are the same text,
> and there's been no encoding visible.
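The "fundamental text file" idea can be mimicked in a few lines. The `TextStore` class below is purely hypothetical (nothing like it exists in Python or in the OS): callers only ever write and read `str`, while the storage encoding is a private detail that changes the stored size but never the text that comes back.

```python
class TextStore:
    """Hypothetical text-only store: the encoding is an internal detail."""

    def __init__(self, encoding: str = "utf-8") -> None:
        self._encoding = encoding  # private choice, invisible to callers
        self._raw = b""            # stands in for the "on-platter" bytes

    def write(self, text: str) -> None:
        self._raw = text.encode(self._encoding)

    def read(self) -> str:
        return self._raw.decode(self._encoding)

    @property
    def stored_size(self) -> int:
        # Only the storage cost depends on the chosen encoding.
        return len(self._raw)


text = "G\u00f6del"  # 'Gödel'
stores = {enc: TextStore(enc) for enc in ("utf-8", "utf-16-le")}
for store in stores.values():
    store.write(text)

print(all(s.read() == text for s in stores.values()))   # True
print({enc: s.stored_size for enc, s in stores.items()})  # {'utf-8': 6, 'utf-16-le': 10}
```

Swapping the encoding is invisible through `read()`; it shows up only in `stored_size`, which is the sense in which the on-disk format would be "purely implementation".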
----------

Of the three (UTF-8, UTF-16, FSR), you can already eliminate one.
It's not good news.

sys.getsizeof('Gödel'.encode('utf-8'))
23
sys.getsizeof('Gödel'.encode('utf-16-le'))
27
sys.getsizeof('Gödel')
42
os.listdir(r'D:\jm\??????\Zürich\?????\?dipe')
['a.txt', 'kk.bat', 'kk.cmd', 'kk.py', '__pycache__']
sys.getsizeof(r'D:\jm\??????\Zürich\?????\?dipe'.encode('utf-8'))
61
sys.getsizeof(r'D:\jm\??????\Zürich\?????\?dipe'.encode('utf-16-le'))
79
sys.getsizeof(r'D:\jm\??????\Zürich\?????\?dipe')
100
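`sys.getsizeof` reports the whole object, fixed header included, and that header differs between builds and between `bytes` and `str`. (The figures above appear consistent with a 32-bit CPython, where an empty `bytes` object is 17 bytes: 23 - 17 = 6 bytes of UTF-8 for 'Gödel'.) A sketch that separates payload from overhead, assuming CPython 3.3 or later, where PEP 393 stores each `str` with 1, 2 or 4 bytes per character depending on the widest character it contains:

```python
import sys

def payload_sizes(s: str) -> dict:
    """Bytes taken by the encoded characters alone, with no object header."""
    return {enc: len(s.encode(enc)) for enc in ("utf-8", "utf-16-le")}

def bytes_per_char(ch: str) -> int:
    """CPython-specific (PEP 393): per-character storage width of a str.

    Two freshly built strings that differ by exactly one character of the
    same kind differ in sys.getsizeof by that width; the header cancels out.
    """
    return sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)

print(payload_sizes("G\u00f6del"))  # {'utf-8': 6, 'utf-16-le': 10}
print(bytes_per_char("d"))          # 1: ASCII
print(bytes_per_char("\u00f6"))     # 1: Latin-1 range (ö)
print(bytes_per_char("\u0152"))     # 2: outside Latin-1 (Œ)
```

Comparing `getsizeof` of a `str` against `getsizeof` of a `bytes` therefore mixes two different headers with the payload; `len()` of the encoded bytes and the per-character width measure the data itself.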

jmf