Valid encodings for a Python source file


On 6/7/18 4:40 PM, Daniel Glus wrote:
> I'm trying to figure out the entire list of possible encodings for a Python
> source file - that is, encodings that can go in a PEP 263
> <https://www.python.org/dev/peps/pep-0263/> encoding specification, like #
> -*- encoding: foo -*-.
>
> Is this list the same as the list given in the documentation for the codecs
> library, under "Standard Encodings"
> <https://docs.python.org/3.6/library/codecs.html#standard-encodings>? If
> not, where can I find the actual list?
>
> (I know that list is the same as the set of unique values in CPython's
> /Lib/encodings/aliases.py
> <https://github.com/python/cpython/blob/master/Lib/encodings/aliases.py>,
> or equivalently, the set of filenames in /Lib/encodings/
> <https://github.com/python/cpython/blob/master/Lib/encodings/>, but again
> I'm not sure.)
> -Daniel

Reading the proposal, I see one thing that seems worth a comment.
PEP 263 specifically calls out the UTF-8 "BOM" sequence (which the
Unicode standard doesn't actually recommend using, since UTF-8 has no
byte-order problem), but it doesn't allow for the UTF-16 BOM (0xFF, 0xFE
or 0xFE, 0xFF) or the UCS-4 BOM (0x00, 0x00, 0xFE, 0xFF or 0xFF, 0xFE,
0x00, 0x00). Those formats are unlikely to be used for source files, but
files that do use them are very likely to carry the mark. Detecting the
mark matters, because these are NOT ASCII-compatible encodings, so the
rest of the header (the coding comment itself) won't match the byte
pattern that would be expected.
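
To make the point concrete, here is a rough sketch of the kind of BOM
check I have in mind. It is only an illustration, not what CPython's
tokenizer does: the function name sniff_bom and the codec names it
returns are my own choices, and it relies only on the BOM constants in
the standard codecs module. The UTF-32 marks are tested first because
the little-endian UTF-32 BOM starts with the same two bytes as the
little-endian UTF-16 BOM.

```python
import codecs

# Order matters: BOM_UTF32_LE (FF FE 00 00) begins with BOM_UTF16_LE
# (FF FE), so the four-byte marks must be checked before the two-byte ones.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return a codec name implied by a leading BOM, or None if there is none.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# A coding comment encoded as UTF-16 no longer contains the ASCII bytes
# a header scanner would look for, which is why the BOM check has to
# happen before any attempt to read the comment.
header = "# -*- coding: utf-16 -*-\n"
encoded = codecs.BOM_UTF16_LE + header.encode("utf-16-le")
print(sniff_bom(encoded))
print(b"coding" in encoded)
```

One known ambiguity with this approach: a UTF-16 little-endian file
whose first character after the BOM is U+0000 would be reported as
UTF-32. That should not occur in real source text.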

-- 
Richard Damon