
Sorting NaNs

On Sat, 02 Jun 2018 21:02:14 +1000, Chris Angelico wrote:

> Point of curiosity: Why "> 0.5"? 

No particular reason, I just happened to hit that key and then copied and 
pasted the line into the next one.

> Normally when I want a fractional
> chance, I write the comparison the other way: "random.random() < 0.5"
> has exactly a 50% chance of occurring (presuming that random.random()
> follows its correct documented distribution). I've no idea what the
> probability of random.random() returning exactly 0.5 is

Neither do I. But I expect that "exactly 50% chance" is only 
approximately true :-)

My understanding is that given the assumption of uniformity, the 50% 
chance is mathematically true, but in that case, it makes no difference 
whether you go from 0 to 0.5 or 0.5 to 1.0. Mathematically it makes no 
difference whether you include or exclude the end points. In the Real 
numbers, there's an uncountable infinity of points either way.

*But* once you move to actual floats, that's no longer true. There are a 
great many more floats between 0 and 0.5 than between 0.5 and 1.0. 
Including the end points, if we number the non-negative floats in 
increasing order, starting from zero, we get:

0.0 --> 0
0.5 --> 4602678819172646912
1.0 --> 4607182418800017408

so clearly the results of random.random() cannot be uniformly distributed 
over the individual floats. If they were, the probability of getting 
something less than or equal to 0.5 would be 4602678819172646912 / 
4607182418800017407 or a little more than 99.9%.
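
For anyone who wants to reproduce those numbers, here is a minimal sketch. 
It relies on the fact that for non-negative IEEE 754 doubles, reading the 
64 bits as an unsigned integer gives the float's position in increasing 
order. The helper name float_to_ordinal is made up for illustration, and 
the code assumes Python floats are IEEE 754 binary64, which is the case on 
mainstream platforms.

```python
import math
import struct


def float_to_ordinal(x: float) -> int:
    """Return the position of a non-negative float in increasing order.

    Hypothetical helper: reinterprets the 8 bytes of the double as an
    unsigned 64-bit integer. Only meaningful for non-negative, non-NaN
    values, so anything else is rejected.
    """
    if math.isnan(x) or math.copysign(1.0, x) < 0:
        raise ValueError("expected a non-negative, non-NaN float")
    return struct.unpack("<Q", struct.pack("<d", x))[0]


for value in (0.0, 0.5, 1.0):
    print(value, "-->", float_to_ordinal(value))

# Rough share of the floats up to 1.0 that are no greater than 0.5.
share = float_to_ordinal(0.5) / float_to_ordinal(1.0)
print(f"{share:.4%}")
```

The ratio comes out a little over 99.9%, in line with the figure above.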

So given that our mathematically pure(ish) probability of 0.5 for the 
reals has to be mapped in some way to a finite number of floats, I 
wouldn't want to categorically say that the probability remains 
*precisely* one half. But if it were (let's say) 1 ULP greater or less 
than one half, would we even know?

0.5 - 1 ULP = 0.49999999999999994

0.5 + 1 ULP = 0.5000000000000001
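
Those two neighbours of 0.5 can be checked with math.nextafter, which 
needs Python 3.9 or later (so this is a sketch for current versions, not 
something available when the thread started). Because 0.5 is a power of 
two, the gap to the float just below it is half the gap to the float just 
above it.

```python
import math

# Requires Python 3.9+ for math.nextafter.
below = math.nextafter(0.5, 0.0)  # largest float smaller than 0.5
above = math.nextafter(0.5, 1.0)  # smallest float larger than 0.5

print(below)
print(above)

# The spacing of floats doubles at each power of two, so the step
# above 0.5 is twice the step below it.
print(0.5 - below, above - 0.5)
```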

I would love to see the statistical experiment that could distinguish 
those two probabilities from exactly 1/2 to even a 90% confidence 
level :-)
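
As a back-of-the-envelope sketch of why such an experiment is hopeless: 
to tell a coin with bias 1/2 + delta from a fair one, the standard error 
of the observed proportion, roughly 0.5 / sqrt(n), has to shrink below 
delta / z, where z is the normal quantile for the chosen confidence 
level. That gives n of about (z * 0.5 / delta) ** 2. The function name 
draws_needed is hypothetical, and the estimate uses the normal 
approximation and ignores statistical power, so treat it as an order of 
magnitude only.

```python
from statistics import NormalDist


def draws_needed(delta: float, confidence: float = 0.90) -> float:
    """Rough number of Bernoulli draws needed to resolve a bias of delta.

    Hypothetical helper using the normal approximation to the binomial
    with a two-sided interval; not an exact power calculation.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided quantile
    sigma = 0.5  # standard deviation of a single fair coin flip
    return (z * sigma / delta) ** 2


# Deviations of the sizes discussed above: 2**-54 below 0.5, 2**-53 above.
for delta in (2.0 ** -54, 2.0 ** -53):
    print(f"delta = {delta:.3e}: about {draws_needed(delta):.1e} draws")
```

Under those assumptions the answer is well over 10**31 draws, far beyond 
anything that could actually be run.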

> but since it
> can return 0.0 and cannot return 1.0, I've just always used less-than.
> (Also because it works nicely with other values - there's a 30% chance
> that random.random() is less than 0.3, etc.) Is there a reason for going
> greater-than, or is it simply that it doesn't matter?

No, no particular reason. If I had thought about it I would have used < 
too, but I didn't.

Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson