Why exception from os.path.exists()?


On 6/7/18 9:17 PM, Steven D'Aprano wrote:
> On Thu, 07 Jun 2018 15:38:39 -0400, Dennis Lee Bieber wrote:
>
>> On Fri, 1 Jun 2018 23:16:32 +0000 (UTC), Steven D'Aprano
>> <steve+comp.lang.python at pearwood.info> declaimed the following:
>>
>>> It should either return False, or raise TypeError. Of the two, since
>>> 3.14159 cannot represent a file on any known OS, TypeError would be more
>>> appropriate.
>>>
>> 	I wouldn't be so sure of that...
> I would.
>
> There is no existing file system which uses floats instead of byte- or 
> character-strings for file names. If you believe different, please name 
> the file system.
>
>
>> Xerox CP/V allowed for embedding
>> non-printable characters into file names
> Just like most modern file systems.
>
> Even FAT-16 supports a range of non-ASCII bytes with the high-bit set 
> (although not the control codes with the high-bit cleared). Unix file 
> systems typically support any byte except \0 and /. Most modern file 
> systems outside of Unix support any Unicode character (or almost any) 
> including ASCII control characters.
>
> https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits
>
>
>
This does bring up an interesting point. Unix file names are really
sequences of bytes, not strings, yet the Python API wants to treat them
as strings, so we are going to be stuck with filename problems. If we
assume the names are UTF-8 encoded, then there can exist filenames that
fail to decode because they are invalid UTF-8 (for example, if the name
was generated on a system that used Latin-1 as an 8-bit character set
for file names). On the other hand, if we treat each byte of the name as
an 8-bit character and the system was actually using UTF-8, then we
mangle any character outside the basic ASCII set. Basically we hit the
old problem of confusing bytes and strings.

Ultimately we have a fundamental limitation in trying to abstract away
the format of filenames in the API. Either we need a back door that lets
us define what encoding to use for filenames (and lets us detect that it
doesn't work for a given file and change it on the fly to try again), or
we need an alternate API that lets us pass raw bytes as file names, with
the program responsible for knowing how to handle the raw filename for
that particular file system.
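
For what it's worth, here is a rough sketch of both failure modes, plus
the two escape hatches that Python 3 already seems to offer: the
"surrogateescape" error handler (PEP 383) and the ability to pass bytes
paths to the os functions. The sample name "café" and the file
"report.txt" are made up for illustration, and the directory part
assumes a POSIX-style system; treat it as a sketch, not a recommendation.

```python
import os
import tempfile

# Hypothetical file name, stored the way a Latin-1 system would write it:
# "é" is the single byte 0xE9, which is not valid UTF-8 on its own.
raw_name = "café".encode("latin-1")                  # b'caf\xe9'

# Problem 1: assuming UTF-8 makes strict decoding fail on this name.
try:
    raw_name.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

# Problem 2: reading UTF-8 bytes as 8-bit characters mangles non-ASCII text.
mangled = "café".encode("utf-8").decode("latin-1")   # 'cafÃ©'

# Back door 1: "surrogateescape" carries each undecodable byte through as a
# lone surrogate, so encoding the str again restores the original bytes.
escaped = raw_name.decode("utf-8", errors="surrogateescape")   # 'caf\udce9'
round_trip = escaped.encode("utf-8", errors="surrogateescape")

# Having recovered the raw bytes, the program can retry another encoding.
retried = round_trip.decode("latin-1")               # 'café'

# Back door 2: pass bytes paths in and get bytes names back, with no
# decoding step at all.
with tempfile.TemporaryDirectory() as tmp:
    bytes_dir = os.fsencode(tmp)
    bytes_path = os.path.join(bytes_dir, b"report.txt")
    with open(bytes_path, "wb"):
        pass                                         # create an empty file
    listed = os.listdir(bytes_dir)                   # names come back as bytes
    found = os.path.exists(bytes_path)

# ascii() keeps the lone surrogate from tripping up the terminal encoding.
print(strict_utf8_ok, ascii(mangled), ascii(escaped), round_trip == raw_name)
print(ascii(retried), listed, found)
```

Note that a str containing lone surrogates cannot be printed or written
out as strict UTF-8, so the program still has to decide what the name
"really" is before showing it to a user.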

-- 
Richard Damon