
Old Man Yells At Cloud

On Sun, 17 Sep 2017 08:43 pm, Chris Angelico wrote:

> On Sun, Sep 17, 2017 at 5:54 PM, Steve D'Aprano
> <steve+python at pearwood.info> wrote:
>> To even *know* that there are branches of maths where int/int isn't defined,
>> you need to have learned aspects of mathematics that aren't even taught in
>> most undergrad maths degrees. (I have a degree in maths, and if we ever
>> covered areas where int/int was undefined, it was only briefly, and I've long
>> since forgotten it.)
> How about this:
>>>> (1<<10000)/2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OverflowError: integer division result too large for a float
> int/int is now undefined. 

No, it's perfectly defined: you get an overflow error if the result is too big
for a float, an underflow to zero if the result is too small, or a divide by
zero error if you divide by zero...

What do you make of this?

py> float(1<<10000)/2.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too large to convert to float

Would you like to argue that this shows that coercing ints to floats
is "undefined"?

Overflow and underflow errors are limitations of the float data type. We could
fix that in a couple of ways:

- silently underflow to zero (as Python already does!) or infinity, as needed;

- use a bigger ~~boat~~ float;

- or even an arbitrary precision float;

- or return a rational number (fraction or similar);

- or introduce a float context that allows you to specify the behaviour
  that you want, as the decimal module does.
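
Two of those options can be sketched with nothing but the standard library,
assuming Python 3: fractions.Fraction for the rational-number route, and a
decimal local context for the "you choose the behaviour" route. The helper
names and the deliberately tiny emax value are made up for illustration; they
are not a proposal for what int/int itself should do.

```python
from decimal import Decimal, Overflow, localcontext
from fractions import Fraction

BIG = 1 << 10000  # float(BIG) raises OverflowError


def divide_as_fraction(n: int, m: int) -> Fraction:
    """Hypothetical helper: return n/m as an exact rational instead of a float."""
    return Fraction(n, m)


def divide_in_context(n: int, m: int, *, trap_overflow: bool, emax: int = 100) -> Decimal:
    """Hypothetical helper: return n/m as a Decimal, letting a local context
    decide whether overflow raises or quietly becomes Infinity.

    emax is tiny on purpose so the demo overflows; it is not a recommended value.
    """
    with localcontext() as ctx:  # changes are undone when the block exits
        ctx.Emax = emax
        ctx.traps[Overflow] = trap_overflow
        return Decimal(n) / Decimal(m)


# Underflow already happens silently with plain int/int:
print(1 / BIG)                                         # 0.0
# The rational route never overflows and stays exact:
print(divide_as_fraction(BIG, 2) == 1 << 9999)         # True
# The context route: the caller picks the overflow behaviour.
print(divide_in_context(BIG, 2, trap_overflow=False))  # Infinity
try:
    divide_in_context(BIG, 2, trap_overflow=True)
except Overflow:
    print("Overflow trapped")
```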

There may be other solutions I haven't thought of. But these will do.

The distinction between Python floats and the real numbers ℝ is a red herring.
It isn't relevant.

> In Py2, it perfectly correctly returns 
> another integer (technically a long), but in Py3, it can't return a
> float, so it errors out.

Apart from your "correctly", which I disagree with, that's a reasonable
description. The problem is that your example returns the correct result by
accident. Forget such ludicrously large values, and try something more common:

py> 1/2    # Python 2
0

Most people expect true division, not integer division, and silently returning
the wrong result (0 instead of 0.5) is a source of bugs.

This isn't some theoretical problem that might, maybe, perhaps, be an issue for
some people sometimes. It was a regular source of actual bugs leading to code
silently returning garbage.
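
To make that failure mode concrete, here is a hypothetical py2_divide helper
that roughly emulates classic Python 2 / on ints and floats (it ignores long
and "from __future__ import division"). The same mean() function gives a right
or a wrong answer depending only on whether the data happens to contain a
float, and nothing raises either way.

```python
import operator


def py2_divide(a, b):
    """Hypothetical, rough emulation of Python 2's `/` for int and float operands."""
    if isinstance(a, int) and isinstance(b, int):
        return a // b  # classic int/int: floor division
    return a / b       # anything involving a float: true division


def mean(values, divide=py2_divide):
    return divide(sum(values), len(values))


print(mean([1, 2]))                            # 1    (silently wrong)
print(mean([1.0, 2]))                          # 1.5  (right, because of one float)
print(mean([1, 2], divide=operator.truediv))   # 1.5  (Python 3's `/`)
```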

> This is nothing to do with the mathematical 
> notion of a "real", 

I don't believe I ever mentioned Reals. I was pretty careful not to.

> which is a superset of the mathematical notion of 
> an "integer"; it's all to do with the Python notion of a "float",
> which is NOT a superset of the Python notion of an "integer".

So? Operations don't *have* to return values from their operands' type.

len('abc') doesn't return a string.

alist.index(1) doesn't have to return a list.

And 1/2 doesn't have to return an int. Why is this such a big deal?
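
A quick Python 3 check of the point: the result type of / depends on the
operator, not on whether the operands happen to divide evenly.

```python
print(type(len("abc")))  # <class 'int'>    str argument, int result
print(type(1 / 2))       # <class 'float'>
print(4 / 2)             # 2.0, a float even though the division is exact
print(4 // 2)            # 2, if an int result is what you actually want
```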

> In Python 2, an ASCII string could be implicitly promoted to a Unicode string:
>>>> user_input = u"Real Soon Now?"
>>>> print("> " + user_input + " <")
>> Real Soon Now? <

And that was a bug magnet, just as using / for integer division sometimes and
true division other times was a bug magnet. So Python 3 got rid of both bad
designs.

> In Python 2 and 3, a small integer can be implicitly promoted to float:
>>>> user_input = 3.14159
>>>> print(user_input + 1)
> 4.14159

Yes, as it should. Why force the user to call float() on one argument when the
interpreter can do it? What advantage is there?

Can you demonstrate any failure of dividing two ints n/m which wouldn't equally
fail if you called float(n)/float(m)? I don't believe that there is any such
failure mode. Forcing the user to manually coerce to floats doesn't add any
safety.
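
One way to probe that question, as a sketch rather than a proof: compare n/m
against float(n)/float(m) directly. The two helper names are made up. For
small operands they agree; for some huge operands whose quotient fits in a
float, true division still succeeds where manual coercion overflows, because
CPython divides the integers without first converting each one to a float.

```python
def true_div(n: int, m: int) -> float:
    return n / m                # Python 3 true division of two ints


def manual_div(n: int, m: int) -> float:
    return float(n) / float(m)  # coerce each operand by hand first


# Small operands are exactly representable as floats, so both agree.
for n, m in [(1, 2), (7, 3), (-5, 4), (10, 7)]:
    assert true_div(n, m) == manual_div(n, m)

# Huge operands with a modest quotient: only true division succeeds.
big, half = 1 << 10000, 1 << 9999
print(true_div(big, half))      # 2.0
try:
    manual_div(big, half)
except OverflowError as exc:
    print("manual_div failed:", exc)
```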

> Both conversions can cause data-dependent failures when used with
> arbitrary input, 

There's a difference:

- with automatic promotion of bytes to Unicode, you get errors that 
  pass silently and garbage results;

- but with true division, if int/int cannot be performed using
  floats, you get an explicit error.

Silently returning the wrong result was a very common consequence of the int/int
behaviour in Python 2. Is there any evidence of common, real-world bugs caused
by true division?

Beginners who make assumptions that Python is C (or any other language) and
use / when they should use // don't count: that's no different from somebody
using ^ for exponentiation.
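
For readers coming from C, the operators in question behave like this in
Python 3 (written as plain assertions, so the snippet doubles as a check):

```python
assert 7 / 2 == 3.5    # true division
assert 7 // 2 == 3     # floor division
assert -7 // 2 == -4   # floors toward minus infinity; C's -7 / 2 truncates to -3
assert 2 ** 3 == 8     # exponentiation
assert 2 ^ 3 == 1      # bitwise XOR, not a power
```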

> The trouble only comes when you take two pieces of user input in
> different types, and try to combine them:
>>>> user_1 = float("1.234")
>>>> user_2 = int("9"*999) # imagine someone typed it
>>>> user_1 + user_2
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OverflowError: int too large to convert to float

I'm sorry, I fail to see why you think this is "trouble". It's just normal
Python behaviour in the face of errors: raise an exception. If you pass a bad
value, you get an exception of some kind.

Are these "trouble" too?

py> ''[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

py> int('xyz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'xyz'

Getting an explicit exception on error is the right thing to do. Silently
returning garbage is not.
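
Chris's two-inputs example fits the same pattern as the IndexError and
ValueError cases. A minimal sketch, with hypothetical helper names, of
treating OverflowError like any other bad-input exception:

```python
def add_user_numbers(a_text: str, b_text: str) -> float:
    """Hypothetical helper: combine a float and an int typed in by a user."""
    a = float(a_text)  # may raise ValueError
    b = int(b_text)    # may raise ValueError
    return a + b       # may raise OverflowError if b is too big for a float


def safe_add(a_text: str, b_text: str):
    try:
        return add_user_numbers(a_text, b_text)
    except (ValueError, OverflowError) as exc:
        # Bad input of either kind is reported, not turned into garbage.
        return f"rejected: {exc}"


print(safe_add("1.5", "2"))          # 3.5
print(safe_add("1.234", "xyz"))      # rejected: invalid literal for int() ...
print(safe_add("1.234", "9" * 999))  # rejected: int too large to convert to float
```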

If you want to argue that int/int should return infinity, or a NaN, on overflow,
that's possibly defensible. But arguing that somehow the division operator is
uniquely or specifically "trouble" because it raises an exception when given
bad data, well, that's just weird.
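
For comparison, the "return infinity" alternative is easy to sketch as a
wrapper. div_or_inf is hypothetical, not an existing function; it only
changes the overflow case, so dividing by zero still raises.

```python
import math


def div_or_inf(n: int, m: int) -> float:
    """Hypothetical int/int that returns +/-inf instead of raising OverflowError.

    ZeroDivisionError is deliberately left alone.
    """
    try:
        return n / m
    except OverflowError:
        # Take the sign from the ints themselves: math.copysign(math.inf, n)
        # would call float(n) and overflow all over again.
        negative = (n < 0) != (m < 0)
        return -math.inf if negative else math.inf


print(div_or_inf(1, 2))            # 0.5
print(div_or_inf(1 << 10000, 2))   # inf
print(div_or_inf(1 << 10000, -3))  # -inf
```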

> Python 3 introduces a completely different way to get failure, though.
> You can be 100% consistent with your data types, but then get
> data-dependent failures if, and only if, you divide.

It's true that most operations on integers will succeed. But not all.

Try (1<<10000)**(1<<10000) if you really think that integer ops are guaranteed
to succeed. (I'm scared to try it myself, because I've had bad experiences in
the past with unreasonably large ints.)

But then, what of it? All that means is that division can fail. But even integer
division can fail:

py> 1//0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero

> I don't know of any exploits
> that involve this, but I can imagine that you could attack a Python
> script by forcing it to go floating-point, then either crashing it
> with a huge integer, or exploiting round-off, depending on whether the
> program is assuming floats or assuming ints.

You're not seriously arguing that true division is a security vulnerability?

In any case, the error here is an exception, not a silent failure.

    "I find it amusing when novice programmers believe their
     main job is preventing programs from crashing. ... More
     experienced programmers realize that correct code is
     great, code that crashes could use improvement, but
     incorrect code that doesn’t crash is a horrible nightmare."
     -- Chris Smith

Using / for integer division, if and only if both arguments are integers, was
exactly that horrible nightmare.

“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.