Here's a fun little experiment that I found interesting a while ago. Python has a very simple hash collision with integers that's implemented by design.
hash(-1) == -2 == hash(-2)
Normally, in Python you can expect hash(x) == x
for integers. However, in this case hash(-1) = -2
for a reason outside of Python's control. Python is written in C, a programming language that predates error handling mechanisms found in modern languages. Instead, C uses sentinel return values, i.e. using integers to signal errors. As a result, the implementation for the hash function treats -1 as an error internally, so the resulting hash value cannot be -1.
If we open the source code for long types (Python integers) and take a look at the long_hash
function, we find this final validation check:
if (x == (Py_uhash_t)-1)
x = (Py_uhash_t)-2;
return (Py_hash_t)x;
To be consistent with C idioms, long_hash
will never return -1 since the caller might mistakenly treat it as an error instead of a valid hash value.
So while Python looks and feels like a modern language, it still has underlying quirks like this simple hash collision due to the history of its implementation language.