Why exception from os.path.exists()?

On Thu, 07 Jun 2018 17:45:06 +1000, Chris Angelico wrote:

> On Thu, Jun 7, 2018 at 1:55 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Tue, 05 Jun 2018 23:27:16 +1000, Chris Angelico wrote:
>>> And an ASCIIZ string cannot contain a byte value of zero. The parallel
>>> is exact.
>> Why should we, as Python programmers, care one whit about ASCIIZ
>> strings? They're not relevant. You might as well say that file names
>> cannot contain the character "?" because ASCIIZ strings don't support
>> it.
>> No they don't, and yet nevertheless file names can and do contain
>> characters outside of the ASCIIZ range.
> Under Linux, a file name contains bytes, most commonly representing
> UTF-8 sequences.

The fact that user-space applications like the shell and GUI file 
managers sometimes treat file names as UTF-8 Unicode is not really 
relevant to what the file system allows. The most common Linux file 
systems are fundamentally bytes, not Unicode characters, and while I'm 
willing to agree to call the byte 0x41 "A", there simply is no such byte 
that means "?" or U+10902 PHOENICIAN LETTER GAML.

File names under typical Linux file systems are not necessarily valid 
UTF-8 Unicode. That's why Python still provides a bytes-interface as well 
as a text interface.
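
To make that concrete, here is a small sketch. The file name is made up for illustration, and I use the UTF-8 codec explicitly rather than os.fsdecode() so that the result doesn't depend on your locale. On POSIX systems os.fsdecode() applies the same "surrogateescape" error handler with the file system encoding.

```python
# A byte string that is a legal Linux file name but is not valid UTF-8.
# (Hypothetical name: 0xE9 is Latin-1 "e-acute"; on its own it is not UTF-8.)
raw_name = b"caf\xe9.txt"

# Strict UTF-8 decoding rejects it.
try:
    raw_name.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# The surrogateescape handler smuggles the bad byte through as a lone
# surrogate, so the text form can be turned back into the exact same bytes.
text_name = raw_name.decode("utf-8", "surrogateescape")
round_tripped = text_name.encode("utf-8", "surrogateescape")

print(decodes_as_utf8)            # False
print(ascii(text_name))           # 'caf\udce9.txt'
print(round_tripped == raw_name)  # True
```

The text interface only works for such names because of that escape hatch; the bytes are what the file system actually stores.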

> So... an ASCIIZ string *can* contain that character, or
> at least a representation of it. Yet it cannot contain "\0".

You keep saying that as if it made one whit of difference to what 
os.path.exists should do. I completely agree that ASCIIZ strings cannot 
contain NUL bytes. What does that have to do with os.path.exists()?

NTFS file systems use UTF-16 encoded strings. For typical mostly-ASCII 
pathnames, the bytes on disk are *full* of NUL bytes. If the 
implementation detail that ASCIIZ strings cannot contain NUL is important 
to you, it should be equally important that UTF-16 strings typically have 
many NULs.
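
A quick sketch of what I mean, using a made-up Windows-style path and Python's "utf-16-le" codec as a stand-in for the encoding of the stored name:

```python
# Hypothetical, purely ASCII path used only for illustration.
path = "C:\\Users\\steve"

# Encode as little-endian UTF-16 without a byte order mark.
encoded = path.encode("utf-16-le")

# Every ASCII character becomes two bytes, and the high byte is NUL.
nul_count = encoded.count(b"\x00")

print(len(path), len(encoded), nul_count)  # 14 28 14
```

Half of the encoded bytes are NULs, and nobody concludes from that that the name contains NUL characters.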

Both are implementation details, and both are utterly 
irrelevant to the behaviour of os.path.exists.
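
For anyone who just wants a yes-or-no answer in the meantime, here is a minimal sketch of a wrapper. The name exists_quietly is my own invention, not anything in the standard library, and it assumes you are happy to treat a path that cannot be passed to the OS (such as one with an embedded NUL) as simply not existing.

```python
import os

def exists_quietly(path):
    """Hypothetical helper: report False instead of raising.

    os.stat() raises ValueError for a path with an embedded NUL and
    OSError (e.g. FileNotFoundError) for a path that isn't there.
    Both are treated here as "does not exist".
    """
    try:
        os.stat(path)
    except (OSError, ValueError):
        return False
    return True

print(exists_quietly("."))          # True
print(exists_quietly("foo\0bar"))   # False
```

Whether swallowing the ValueError is the right design is exactly what is being argued about in this thread, so take this as an illustration rather than a recommendation.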

Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson