Unicode and Python - how often do you index strings?


On Fri, 06 Jun 2014 10:47:44 +0200, Johannes Bauer wrote:

> Hm, I was under the impression that Python already took care of removing
> the \r at a line ending. Checking that right now:
[snip example]

This is called "Universal Newlines". Technically it is a build-time 
option which applies when you build the Python interpreter from source, 
so, yes, some Pythons may not implement it at all. But I think that it 
has been on by default for a long time, and the option to turn it off may 
have been removed in Python 3.3 or 3.4. In practical terms, you should 
normally expect it to be on.


Here's the PEP that introduced it: 
http://legacy.python.org/dev/peps/pep-0278/


The idea is that when universal newlines support is enabled, Python by 
default converts any of \n, \r or \r\n into \n when reading from a file in 
text mode, and converts \n back to the platform's native line ending when 
writing the file.

In binary mode, newlines are *never* changed.

In Python 3, you can have line endings returned unchanged in text mode by 
passing newline='' to the open() function.
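
Here is a minimal sketch of the three reading behaviours described above, 
assuming Python 3. The helper name read_three_ways and the sample bytes 
are mine, purely for illustration:

```python
import os
import tempfile


def read_three_ways(raw):
    """Write raw bytes to a temp file, then read them back three ways.

    Hypothetical helper for illustration only (Python 3).
    """
    fd, path = tempfile.mkstemp()
    try:
        # Write in binary mode so the bytes land on disk untouched.
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Default text mode: \r and \r\n are translated to \n on read.
        with open(path, "r", encoding="ascii") as f:
            translated = f.read()
        # newline='' keeps text mode but disables the translation.
        with open(path, "r", encoding="ascii", newline="") as f:
            untranslated = f.read()
        # Binary mode never changes newlines.
        with open(path, "rb") as f:
            binary = f.read()
    finally:
        os.remove(path)
    return translated, untranslated, binary


translated, untranslated, binary = read_three_ways(b"unix\nmac\rdos\r\n")
print(repr(translated))    # 'unix\nmac\ndos\n'
print(repr(untranslated))  # 'unix\nmac\rdos\r\n'
print(repr(binary))        # b'unix\nmac\rdos\r\n'
```

Note that newline='' still gives you str, not bytes; only the line-ending 
translation is switched off.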

https://docs.python.org/2/library/functions.html#open
https://docs.python.org/3/library/functions.html#open




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/