[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Why exception from os.path.exists()?

On 2018-06-05 07:37:34 +0000, Steven D'Aprano wrote:
> On Mon, 04 Jun 2018 22:13:47 +0200, Peter J. Holzer wrote:
> > On 2018-06-04 13:23:59 +0000, Steven D'Aprano wrote:
> >> I don't know whether or not the Linux OS is capable of accessing
> >> files with embedded NULs in the file name. But Mac OS is capable of
> >> doing so, so it should be possible. Wikipedia says:
> >> 
> >> "HFS Plus mandates support for an escape sequence to allow
> >> arbitrary Unicode. Users of older software might see the escape
> >> sequences instead of the desired characters."
> > 
> > I don't know about MacOS. In Linux there is no way to pass a
> > filename with an embedded '\0' (or a '/' which is not path
> > separator) between the kernel and user space. So if a filesystem
> > contained such a filename, the kernel would have to map it (via an
> > escape sequence or some other mechanism) to a different file name.
> > Which of course means that - from the perspective of any user space
> > process - the filename doesn't contain a '\0' or '/'.
> That's an invalid analogy. According to that analogy, Python strings
> don't contain ASCII NULs, because you have to use an escape mechanism
> to insert them:
>     string = "Is this \0 not a NULL?"
> But we know that Python strings are not NUL-terminated and can contain
> NUL. It's just another character.

I think that's a bad analogy.

The escape mechanism for string literals is mostly for convenience of
the programmer. It's there to make the program's source code more
readable (and yes, also easier to write). But at run time the  \0
character is just that: A character with the value 0.

If a disk with a file system which allows embedded NUL characters is
mounted on Linux (let's for the sake of the argument assume it is HFS+,
although I have to admit that I don't know anything about the internals
of that filesystem), then the low level filesystem code has to map that
character to something else. Even the generic filesystem code of the
kernel will never see that NUL character, let alone the user space. As
far as the OS is concerned, that file doesn't contain a NUL character.
The whole system (except for some low-level FS-dependent code) will
always only see the mapped name.

If some application (which might be an interpreter, or it might be a
graphics program, for example) decides that it knows better what the
"real" filename is and reverses that mapping, it can do so - but it
would be very confusing because it would use a different file name than
the rest of the system. The user would see one file name with ls, but
would have to use a different filename in the application. The
application would show one filename in its "save" dialog, but the OS's
file manager would show another. Not a good idea, especially as the
benefits of such a scheme would be extremely narrow (you could share an
HFS+ formatted USB disk between MacOS and Linux with filenames with
embedded NULs and that application would let you use the same filenames
as you would use on MacOS).

Now, if MacOS uses something like that, this is a different matter.
Presumably (since HFS+ is a native file system) the kernel deals with
NUL characters in a straightforward manner. It might even have a
(non-POSIX) API to expose such filenames. Even if it hasn't, presumably
the mapping back and forth is done in a very low level library used by
all (or most) of the applications, so that they all show consistently
the same filename.

But Linux isn't MacOS. On Linux there are no filenames with embedded
NULs, even if you mount an HFS+ disk and even if some application
decides to internally remap filenames in a way that they can contain NUL


   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp at hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20180605/3ccc1396/attachment.sig>