
Old Man Yells At Cloud

On Sun, Sep 17, 2017 at 5:54 PM, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> To even *know* that there are branches of maths where int/int isn't defined, you
> need to have learned aspects of mathematics that aren't even taught in most
> undergrad maths degrees. (I have a degree in maths, and if we ever covered
> areas where int/int was undefined, it was only briefly, and I've long since
> forgotten it.)

How about this:

>>> (1<<10000)/2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: integer division result too large for a float

int/int is now undefined. In Py2, it perfectly correctly returns
another integer (technically a long), but in Py3, it can't return a
float, so it errors out. This is nothing to do with the mathematical
notion of a "real", which is a superset of the mathematical notion of
an "integer"; it's all to do with the Python notion of a "float",
which is NOT a superset of the Python notion of an "integer".
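
For what it's worth, here is a rough sketch of the difference (the variable names are just ones I picked for the example). Floor division stays inside int, so the size of the value is irrelevant; true division has to squeeze the quotient into a float, and this one doesn't fit.

```python
big = 1 << 10000

# Floor division never leaves int, so the magnitude doesn't matter.
half = big // 2
print(half == 1 << 9999)

# True division must produce a float, and this quotient is far too large.
try:
    big / 2
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)
```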

In Python 2, an ASCII string could be implicitly promoted to a Unicode string:

>>> user_input = u"Real Soon Now?"
>>> print("> " + user_input + " <")
> Real Soon Now? <

In Python 2 and 3, a small integer can be implicitly promoted to float:

>>> user_input = 3.14159
>>> print(user_input + 1)

Both conversions can cause data-dependent failures when used with
arbitrary input, but are unlikely to cause problems when you're
promoting literals. Both conversions require proximity to the other
type. As long as you're explicit about the data type used for user
input, you can short-hand your literals and get away with it:

>>> # more likely, input came as text
>>> user_input = float("1.234")
>>> print(user_input + 1)
>>> # and hey, it works with other types too!
>>> import decimal, fractions
>>> user_input = decimal.Decimal("1.234")
>>> print(user_input + 1)
>>> user_input = fractions.Fraction("1.234")
>>> print(user_input + 1)

The trouble only comes when you take two pieces of user input in
different types, and try to combine them:

>>> user_1 = float("1.234")
>>> user_2 = int("9"*999) # imagine someone typed it
>>> user_1 + user_2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too large to convert to float

Solution? Always use the right data types for user input. Easy enough.
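
One possible way to do that, as a sketch only: read every numeric input into a single exact type. Here parse_number is a hypothetical helper, and using fractions.Fraction is my assumption about what "the right data type" might be for this case; Decimal or plain int would be just as valid in other programs.

```python
from fractions import Fraction

def parse_number(text):
    """Hypothetical helper: read integer or decimal text as an exact Fraction."""
    return Fraction(text)

user_1 = parse_number("1.234")
user_2 = parse_number("9" * 999)  # imagine someone typed it

# Both values are Fractions, so the sum is exact and no float is involved.
total = user_1 + user_2
print(total.denominator)
```

The cost is that everything downstream has to cope with Fraction, which is the price of being consistent.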

Python 3 introduces a completely different way to get failure, though.
You can be 100% consistent with your data types, but then get
data-dependent failures if, and only if, you divide. (Technically, not
completely new in Py3; you can get this in Py2 with exponentiation -
"2**-1" will yield a float. Far less likely to be hit, but could
potentially cause the same problems.) I don't know of any exploits
that involve this, but I can imagine that you could attack a Python
script by forcing it to go floating-point, then either crashing it
with a huge integer, or exploiting round-off, depending on whether the
program is assuming floats or assuming ints.
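
The round-off half of that is easy to miss because nothing raises. A minimal sketch, with a value I chose purely because it sits just above what a float can hold exactly:

```python
n = 10**17 + 1

exact = n // 1    # floor division: still an int, still exact
rounded = n / 1   # true division: goes through float (53 bits of precision)

print(exact == n)
print(int(rounded) == n)
print(int(rounded))
```

No exception, no warning; the trailing 1 is simply gone after the true division.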

Python 3 *removed* one of these data-dependent distinctions, by making
bytes+text into an error:

>>> b"asdf" + u"qwer"  # Python 2
u'asdfqwer'

>>> b"asdf" + u"qwer"  # Python 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat str to bytes
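
In Py3 the conversion has to be spelled out, which moves any failure to the decode step. A small sketch, assuming the bytes are meant to be ASCII (the encoding is my assumption, not something the example above specifies):

```python
raw = b"asdf"

# Decode explicitly, then concatenate text with text.
text = raw.decode("ascii") + u"qwer"
print(text)

# Non-ASCII bytes fail at the decode, not somewhere later in the program.
try:
    b"caf\xc3\xa9".decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)
```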

But it added a different one, by allowing a common and normal
operation to change a data type. Is it better to make things
convenient for small integers (the ones that are perfectly
representable as floats), at the risk of problems with
larger ones? Considering how large a "small integer" can be, most
programmers won't think to test for overflow - just as many
programmers won't test non-ASCII data. Thanks to Python 3, the
"non-ASCII data" one isn't a problem, because you'll get the same
exception with ASCII data as with any other; but the "small integer"
one now is.
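
If you did want to test for it, something like the following would do. This is a sketch, and is_exact_as_float is a name I made up for the example, not anything from the standard library.

```python
def is_exact_as_float(n):
    """Hypothetical check: does the int n survive a round trip through float?"""
    try:
        return int(float(n)) == n
    except OverflowError:
        # Too large for a float at all.
        return False

print(is_exact_as_float(2**53))       # last point before gaps appear
print(is_exact_as_float(2**53 + 1))   # odd value just past it
print(is_exact_as_float(1 << 10000))  # overflows outright
```

Note that some huge values (exact powers of two such as 2**100) still pass, which is part of why this is so easy to overlook in testing.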

Data-dependent type errors don't seem like a smart thing to me.