
Why exception from os.path.exists()?

On 4 June 2018 at 13:01, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:

>> Turns out that this is a limitation on Windows as well. The \0 is not
>> allowed for Windows, macOS and Posix.
> We -- all of us, including myself -- have been terribly careless all
> through this discussion. The fact is, this should not be an OS limitation
> at all. It is a *file system* limitation.
> If I can mount a HFS or HFS-Plus disk on Linux, it can include file names
> with embedded NULs or slashes. (Only the : character is illegal in HFS
> file names.) It shouldn't matter what the OS is, if I have drivers for
> HFS and can mount a HFS disk, I ought to be able to sensibly ask for file
> names including NUL.

Agreed. Being completely precise in this situation is both pretty
complicated and essential.

The question of which characters are legal in a filename is, as you
say, a filesystem-related issue. People traditionally forget this
point, but in these days of cross-platform filesystem mounting,
networked filesystems[1], etc., it's more and more relevant, and
thankfully people are becoming more aware of it.

But there's also the question of what the kernel API is capable of
expressing in a query. The Unix API (and the Windows one, in most
cases, although as Eryk Sun pointed out there are exceptions in the
Windows kernel API) uses NUL-terminated strings. That means querying
the filesystem about filenames with embedded \0 characters isn't
possible *at the OS level*. (As another example, the Unix kernel
treats filenames as byte strings, so there are translation issues
when querying an NTFS filesystem that uses Unicode (UTF-16) natively,
and vice versa when Windows queries a Unix-native filesystem.)
So "it's complicated" is about the best we can say :-)


[1] And of course if you mount (say) an NTFS filesystem over NFS, you
have *two* filesystems involved, each adding its own layer of
restrictions and capabilities.