
Sorting NaNs

On Thu, 07 Jun 2018 20:43:10 +0000, Peter Pearson wrote:

> On Thu, 07 Jun 2018 19:02:42 +1200, Gregory Ewing wrote:
>> Steven D'Aprano wrote:
>>> But if it were (let's say) 1 ULP greater or less than one half, would
>>> we even know?
>> In practice it's probably somewhat bigger than 1 ULP. A typical PRNG
>> will first generate a 32-bit integer and then map it to a float, giving
>> a resolution coarser than the 52 bits of an IEEE double.
>> But even then, the probability of getting exactly 0.5 is only 1/2^32,
>> which you're not likely to notice.
> But gosh, if there are only 2**32 different "random" floats, then you'd
> have about a 50% chance of finding a collision among any set of 2**16
> samples.  Is that really tolerable?

Why wouldn't it be? It would be shocking if a sufficiently large sequence 
of numbers contained no collisions at all: that would imply the values 
were very much NON random.

If you truly were limited to 2**32 different values (we're not), then it 
would be exactly right and proper to expect a collision somewhere around 
2**16 samples. To be precise, the 50% mark comes a little later than 
that: at about 77,000 samples, by the usual birthday-problem estimate.


(For 2**60 values, we'd need about 1.3 billion samples.)
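For anyone who wants to check the arithmetic, the 50% collision threshold 
for N equally likely values is roughly sqrt(2 N ln 2) -- the standard 
birthday-problem approximation:

```python
import math

def collision_samples(num_values, p=0.5):
    """Approximate sample count at which the probability of at least
    one collision among num_values equally likely values reaches p
    (birthday-problem estimate: n ~ sqrt(2 * N * ln(1/(1-p)))).
    """
    return math.sqrt(2 * num_values * math.log(1 / (1 - p)))

print(round(collision_samples(2**32)))  # about 77,000
print(round(collision_samples(2**60)))  # about 1.26 billion
```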

Greg's statement about "a typical PRNG" may or may not be true, depending 
on what counts as "typical". The standard C language PRNG is notoriously 
awful, with a tiny period and lots and lots of correlations between 
values. It's barely random-ish.

But like many other languages, Python uses a more modern random number 
generator, the Mersenne Twister, which passes a battery of statistical 
tests for randomness (including Diehard) and has a very long period of 
2**19937 - 1. CPython's implementation works on 32-bit words internally, 
but random() combines two outputs so each float carries the full 53 bits 
of an IEEE double's significand.
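A quick sketch to see that resolution in practice, relying on the 
documented fact that random.random() returns multiples of 2**-53:

```python
import random

x = random.random()

# Every value from random() is an exact multiple of 2**-53, because
# CPython builds it from two 32-bit Mersenne Twister outputs
# (27 bits + 26 bits = 53 bits, the width of a double's significand).
k = x * 2**53  # scaling by a power of two is exact, so k is integral
assert k == int(k)

# So the space of possible outputs is 2**53, not 2**32: a 50% chance
# of a collision needs on the order of a hundred million samples.
print(f"{x} == {int(k)} / 2**53")
```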


Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson