Python Hash Collision

Here's a fun little experiment that I found interesting a while ago. Python has a very simple hash collision with integers that's implemented by design.

hash(-1) == -2 == hash(-2)

Normally, in Python you can expect hash(x) == x for integers. However, in this case hash(-1) = -2 for a reason outside of Python's control. Python is written in C, a programming language that predates error handling mechanisms found in modern languages. Instead, C uses sentinel return values, i.e. using integers to signal errors. As a result, the implementation for the hash function treats -1 as an error internally, so the resulting hash value cannot be -1.

If we open the source code for long types (Python integers) and take a look at the long_hash function, we find this final validation check:

if (x == (Py_uhash_t)-1)
    x = (Py_uhash_t)-2;
return (Py_hash_t)x;

To be consistent with C idioms, long_hash will never return -1 since the caller might mistakenly treat it as an error instead of a valid hash value.

So while Python looks and feels like a modern language, it still has underlying quirks like this simple hash collision due to the history of its implementation language.