
Why exception from os.path.exists()?

On Fri, Jun 8, 2018 at 3:10 AM, MRAB <python at mrabarnett.plus.com> wrote:
> On 2018-06-07 08:45, Chris Angelico wrote:
>> Under Linux, a file name contains bytes, most commonly representing
>> UTF-8 sequences. So... an ASCIIZ string *can* contain that character,
>> or at least a representation of it. Yet it cannot contain "\0".
> I've seen a variation of UTF-8 that encodes U+0000 as 2 bytes so that a zero
> byte can be used as a terminator.
> It's therefore not impossible to have a version of Linux that allowed a
> (Unicode) "\0" in a filename.
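The variant described above is commonly known as "Modified UTF-8" (Java's `DataOutput.writeUTF` uses it). The sketch below reproduces only its NUL rule. It assumes the two-byte form is the overlong sequence C0 80, and it ignores how Modified UTF-8 handles characters above U+FFFF. Both helper names are hypothetical. The example shows that the trick keeps zero bytes out of the output, and that Python's strict UTF-8 codec then refuses to decode the result.

```python
def encode_nul_as_overlong(text: str) -> bytes:
    """Hypothetical helper: UTF-8 encode, but write U+0000 as C0 80.

    Only the NUL rule of "Modified UTF-8" is reproduced here; its
    surrogate-pair treatment of characters above U+FFFF is ignored.
    """
    # In real UTF-8 a 0x00 byte only ever comes from U+0000, so a plain
    # byte replacement is enough to apply the rule.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def is_strict_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes with Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


encoded = encode_nul_as_overlong("a\0b")
print(encoded)                  # b'a\xc0\x80b'
print(b"\x00" in encoded)       # False: usable as a NUL-terminated C string
print(is_strict_utf8(encoded))  # False: C0 80 is an overlong form
```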

Considering that Linux treats filenames as raw bytes, that's not
surprising. The mangled encoding you refer to is a horrendous cheat,
though. It violates several of the design principles of UTF-8 (for a
start, it depends on an overlong encoding, which conforming decoders
must reject), so I do not recommend it EVER. The correct way for Python
to handle and represent such a file name would be to carry the
undecodable bytes through unchanged as lone surrogates in the U+DCxx
range (the "surrogateescape" scheme of PEP 383), not to turn them
into "\0".
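The sketch below shows the U+DCxx approach using Python's built-in "surrogateescape" error handler. The helper names and the sample file name are hypothetical. The sample assumes a name that some tool wrote with the C0 80 trick. Each byte that will not decode (0x80 to 0xFF) becomes the lone surrogate U+DC80 to U+DCFF. No real "\0" is produced, and encoding with the same handler gives back the original bytes exactly.

```python
def decode_filename(raw: bytes) -> str:
    """Hypothetical helper: bytes -> str without losing undecodable bytes."""
    # Each byte that strict UTF-8 rejects becomes a lone surrogate U+DCxx.
    return raw.decode("utf-8", "surrogateescape")


def encode_filename(name: str) -> bytes:
    """Hypothetical helper: reverse of decode_filename."""
    # The same handler turns those lone surrogates back into the raw bytes.
    return name.encode("utf-8", "surrogateescape")


raw = b"report\xc0\x80.txt"          # hypothetical name using the C0 80 trick
name = decode_filename(raw)
print(ascii(name))                   # 'report\udcc0\udc80.txt'
print("\0" in name)                  # False: no real NUL appears
print(encode_filename(name) == raw)  # True: lossless round trip
```

On POSIX systems, `os.fsdecode()` and `os.fsencode()` already use this handler (see `sys.getfilesystemencodeerrors()`). When the filesystem encoding is UTF-8, they should behave like these two helpers.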