[Python-Dev] Counting collisions for the win
Frank Sievertsen wrote:
Am 20.01.2012 13:08, schrieb Victor Stinner:
>>> I'm surprised we haven't seen bug reports about it from users
>>> of 64-bit Pythons long ago
>> A Python dictionary only uses the lower bits of a hash value. If your
>> dictionary has less than 2**32 items, the dictionary order is exactly
>> the same on 32 and 64 bits system: hash32(str)& mask == hash64(str)&
>> mask for mask<= 2**32-1.
> No, that's not true.
> Whenever a collision happens, other bits are mixed in very fast.
Bits are mixed in quickly from a denial-of-service standpoint, but
Victor is correct from a "Why don't the tests already fail?" standpoint.
A dict with 2**12 slots, holding over 2700 entries, will be far larger
than most test cases -- particularly those with visible output. In a
dict that size, 32-bit and 64-bit machines will still probe the same
first, second, third, fourth, fifth, and sixth slots. Even on the
rare cases when there are at least 6 collisions, the next slots may
well be either the same, or close enough that it doesn't show up in a
changed iteration order.
If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them. -jJ