[Python-Dev] Change in Python 3's "round" behavior


On Thu, Sep 27, 2018 at 05:55:07PM +1200, Greg Ewing wrote:
> jab at math.brown.edu wrote:
> >I understand from
> >https://github.com/cosmologicon/pywat/pull/40#discussion_r219962259
> >that "to always round up... can theoretically skew the data"
> 
> *Very* theoretically. If the number is even a whisker bigger than
> 2.5 it's going to get rounded up regardless:
> 
> >>> round(2.500000000000001)
> 3
> 
> That difference is on the order of the error you expect from
> representing decimal fractions in binary, so I would be surprised
> if anyone can actually measure this bias in a real application.

I think you may have misunderstood the nature of the bias. It's not 
about individual roundings and it definitely has nothing to do with 
binary representation.

Any single rounding operation will introduce an error. You had a number, 
say 2.3, and it gets rounded down to 2.0, introducing an error of -0.3. 
But if you have lots of rounds, some will round up, and some will round 
down, and we want the rounding errors to cancel.

The errors *almost* cancel using the naive rounding algorithm as most of 
the digits pair up:

.1 rounds down, error = -0.1
.9 rounds up, error = +0.1

.2 rounds down, error = -0.2
.8 rounds up, error = +0.2

etc. If each digit is equally likely, then on average they'll cancel and 
we're left with *almost* no overall error.

The problem is that while there are four digits rounding down (.1 
through .4) there are FIVE which round up (.5 through .9). Two digits 
don't pair up:

.0 stays unchanged, error = 0
.5 always rounds up, error = +0.5

Given that for many purposes, our data is recorded only to a fixed 
number of decimal places, we're dealing with numbers like 0.5 rather 
than 0.5000000001, so this can become a real issue. If each final digit 
is equally likely, every ten rounding operations will introduce a total 
error of about +0.5, an average of +0.05 per operation, instead of 
cancelling out. Rounding introduces a small but real bias.
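
Here is a rough sketch of that arithmetic, assuming one value for each 
final digit (2.0 through 2.9) and using the decimal module's 
ROUND_HALF_UP as a stand-in for the naive rule. The helper name 
mean_error_half_up is made up for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP


def mean_error_half_up(values):
    """Mean of (rounded - original) under round-half-up (illustrative helper)."""
    errors = [
        v.quantize(Decimal("1"), rounding=ROUND_HALF_UP) - v
        for v in values
    ]
    return sum(errors) / len(errors)


# One value per final digit: 2.0, 2.1, ..., 2.9 (assumed equally likely)
values = [Decimal(2) + Decimal(d) / 10 for d in range(10)]

half_up_bias = mean_error_half_up(values)
print(half_up_bias)  # 0.05
```

Decimal is used here so that the inputs are exactly the one-decimal-place 
values being discussed, with no binary representation effects mixed in.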

The most common fix (and, in many experts' opinion, the best default 
behaviour) is Banker's Rounding, or round-half-to-even. All the other 
digits round as per the usual rule, but .5 rounds UP half the time and 
DOWN the rest of the time:

0.5, 2.5, 4.5 etc round down, error = -0.5
1.5, 3.5, 5.5 etc round up, error = +0.5

thus on average the .5 digit introduces no error and the bias goes away.
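
A small sketch of the same check for round-half-to-even, assuming the 
halves 0.5 through 9.5 occur equally often. It uses the decimal module's 
ROUND_HALF_EVEN, and then Python 3's built-in round() on floats of the 
form n + 0.5, which are exactly representable for small n:

```python
from decimal import Decimal, ROUND_HALF_EVEN

# The ten halves 0.5, 1.5, ..., 9.5 (assumed equally likely)
halves = [Decimal(n) + Decimal("0.5") for n in range(10)]

# Error of each rounding: alternately -0.5 and +0.5
errors = [
    h.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN) - h
    for h in halves
]
half_even_bias = sum(errors) / len(errors)
print(half_even_bias == 0)  # True

# Python 3's built-in round() follows the same rule for exact halves
builtin_results = [round(n + 0.5) for n in range(6)]
print(builtin_results)  # [0, 2, 2, 4, 4, 6]
```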



-- 
Steve