Why exception from os.path.exists()?


On Tue, 05 Jun 2018 20:15:01 +1000, Chris Angelico wrote:

> On Tue, Jun 5, 2018 at 5:37 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Mon, 04 Jun 2018 22:13:47 +0200, Peter J. Holzer wrote:
>>
>>> On 2018-06-04 13:23:59 +0000, Steven D'Aprano wrote:
>> [...]
>>
>>>> I don't know whether or not the Linux OS is capable of accessing
>>>> files with embedded NULs in the file name. But Mac OS is capable of
>>>> doing so, so it should be possible. Wikipedia says:
>>>>
>>>> "HFS Plus mandates support for an escape sequence to allow arbitrary
>>>> Unicode. Users of older software might see the escape sequences
>>>> instead of the desired characters."
>>>
>>> I don't know about MacOS. In Linux there is no way to pass a filename
>>> with an embedded '\0' (or a '/' which is not path separator) between
>>> the kernel and user space. So if a filesystem contained such a
>>> filename, the kernel would have to map it (via an escape sequence or
>>> some other mechanism) to a different file name. Which of course means
>>> that - from the perspective of any user space process - the filename
>>> doesn't contain a '\0' or '/'.
>>
>> That's an invalid analogy. According to that analogy, Python strings
>> don't contain ASCII NULs, because you have to use an escape mechanism
>> to insert them:
>>
>>     string = "Is this \0 not a NULL?"
>>
>>
>> But we know that Python strings are not NUL-terminated and can contain
>> NUL. It's just another character.
>>
>>
> No; by that analogy, a Python string cannot contain a non-Unicode
> character. Here's a challenge: create a Python string that contains a
> character that isn't part of the Universal Character Set.

Huh? How does that follow from the analogy being made? Your challenge is 
impossible in pure Python: it is equivalent to "create a Python bytes 
object that contains a byte greater than 255". The challenge is rigged 
to fail.
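
To make the "rigged" part concrete, here is a rough sketch of what pure 
Python does when asked for either impossible object. The helper name 
try_build is made up for this illustration, and the behaviour shown is 
what I would expect from CPython 3; treat it as a demonstration, not a 
specification.

```python
def try_build(factory, *args):
    """Hypothetical helper: return factory(*args), or the ValueError it raises."""
    try:
        return factory(*args)
    except ValueError as exc:
        return exc


# The highest code point Unicode defines is U+10FFFF; this one is fine.
largest = chr(0x10FFFF)

# One past the end of the Unicode range: chr() refuses.
too_big_char = try_build(chr, 0x110000)

# A byte value of 256 does not fit in a byte: bytes() refuses.
too_big_byte = try_build(bytes, [256])

print(repr(largest), too_big_char, too_big_byte, sep="\n")
```

In both cases the constructor refuses before any object exists, so there 
is no string or bytes value to argue about.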

That's not the case when it comes to \0 in file names: we know that Mac 
OS can do it, and we know that HFS and Apple FS support NUL in file 
names. We have an existence proof that it is possible.

(Although in your case, it is conceivable that using C you might be able 
to solve the challenge: create a string using the UCS-4 implementation 
(32-bit code units), then modify some code unit to be a value outside of 
the 21-bit range supported by Unicode. But that would require low-level 
hacking; it isn't supported by the language or the interpreter, except 
maybe via ctypes.)

Apple FS, HFS and HFS Plus support \0 as a valid Unicode character. The 
Mac OS kernel has an escape mechanism to allow user code to include \0 
characters in pathnames, just as Python has an escape mechanism to allow 
user code to include \0 in strings.
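
A small sketch of the Python half of that comparison, assuming CPython 3. 
The file name "foo\0bar" and the helper stat_error are invented for the 
example. I use os.stat() here because what os.path.exists() does with 
such a path depends on the Python version.

```python
import os

# The escape sequence \0 puts a real NUL character into the string.
name = "foo\0bar"
assert len(name) == 7        # the NUL counts as one ordinary character
assert name[3] == "\0"
assert name.split("\0") == ["foo", "bar"]


def stat_error(path):
    """Hypothetical helper: return the exception os.stat() raises, or None."""
    try:
        os.stat(path)
    except (ValueError, OSError) as exc:
        return exc
    return None


# The string itself is fine; it is the hand-off to the OS-level call
# that rejects it, before the file system is ever consulted.
err = stat_error(name)
print(type(err).__name__, err)
```

The string is perfectly legal; the ValueError comes from the layer that 
converts it into a path for the operating system.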

There's no such escape mechanism for characters outside of Unicode.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson