# Sorting NaNs

```On Thu, 07 Jun 2018 20:43:10 +0000, Peter Pearson wrote:

> On Thu, 07 Jun 2018 19:02:42 +1200, Gregory Ewing wrote:
>> Steven D'Aprano wrote:
>>> But if it were (let's say) 1 ULP greater or less than one half, would
>>> we even know?
>>
>> In practice it's probably somewhat bigger than 1 ULP. A typical PRNG
>> will first generate a 32-bit integer and then map it to a float, giving
>> a resolution coarser than the 52 bits of an IEEE double.
>>
>> But even then, the probability of getting exactly 0.5 is only 1/2^32,
>> which you're not likely to notice.
>
> But gosh, if there are only 2**32 different "random" floats, then you'd
> have about a 50% chance of finding a collision among any set of 2**16
> samples.  Is that really tolerable?

Why wouldn't it be? It would be shocking if a sufficiently large sequence
of numbers contained no collisions at all: that would imply the values
were very much NON random.

If you truly were limited to 2**32 different values (we're not), then it
would be exactly right and proper to expect a collision in 2**16 samples.
Actually, a lot less than that: more like 78000.

https://en.wikipedia.org/wiki/Birthday_problem

(For 2**60 values, we'd need about 1.2 billion samples.)

Greg's statement about "a typical PRNG" may or may not be true, depending
on what counts as "typical". The standard C language PRNG is notoriously
awful, with a tiny period and lots and lots of correlations between
values. Its barely random-ish.

But like many other languages, Python uses a more modern random number
generator, the Mersenne Twister, which passes a battery of statistical
tests for randomness (including DieHard) and has a very long period of
2**19937 - 1. I understand that Python's Mersenne Twister implementation
is based on 64-bit ints.

https://en.wikipedia.org/wiki/Mersenne_Twister

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

```