[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Python-Dev] Counting collisions for the win

In http://mail.python.org/pipermail/python-dev/2012-January/115715.html
Frank Sievertsen wrote:

Am 20.01.2012 13:08, schrieb Victor Stinner:
>>> I'm surprised we haven't seen bug reports about it from users
>>> of 64-bit Pythons long ago
>> A Python dictionary only uses the lower bits of a hash value. If your
>> dictionary has less than 2**32 items, the dictionary order is exactly
>> the same on 32 and 64 bits system: hash32(str)&  mask == hash64(str)&
>> mask for mask<= 2**32-1.

> No, that's not true.
> Whenever a collision happens, other bits are mixed in very fast.

> Frank

Bits are mixed in quickly from a denial-of-service standpoint, but
Victor is correct from a "Why don't the tests already fail?" standpoint.

A dict with 2**12 slots, holding over 2700 entries, will be far larger
than most test cases -- particularly those with visible output.  In a
dict that size, 32-bit and 64-bit machines will still probe the same
first, second, third, fourth, fifth, and sixth slots.  Even on the
rare cases when there are at least 6 collisions, the next slots may
well be either the same, or close enough that it doesn't show up in a
changed iteration order.



If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ