Creating Hashable and Unhashable Objects in Python

Classes with __hash__.


Default Hashability


All Python objects are instances of some class; even classes themselves are instances of the class type. Class instances are hashable by default: they inherit __hash__ (together with __eq__) from object, the base class of all classes in Python 3.

Both the class itself and its instances are hashable.

class A:
    pass

print(A.__class__)
# <class 'type'>

print(A.mro())
# [<class '__main__.A'>, <class 'object'>]

a = A()

print(hash(A))
# 97365552112
print(hash(a))
# 97365511141
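A minimal sketch of what the inherited defaults mean in practice. Point is just an illustrative class. It relies only on the __eq__ and __hash__ that every class gets from object, so an instance equals only itself and keeps the same hash even after its attributes change.

```python
class Point:
    """A plain class that inherits __eq__ and __hash__ from object."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p1 = Point(1, 2)
p2 = Point(1, 2)

# Default equality is identity: same attribute values, different objects.
print(p1 == p2)  # False
print(p1 == p1)  # True

# The default hash is object.__hash__ and does not depend on attribute values.
h = hash(p1)
p1.x = 100
print(hash(p1) == h)                    # True
print(hash(p1) == object.__hash__(p1))  # True

# So the two instances are separate dictionary keys.
labels = {p1: "first", p2: "second"}
print(len(labels))  # 2
```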

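Built-in functions are hashable, and so are instances of most built-in types. The exceptions are mutable containers such as list, dict, and set. A tuple is hashable only if all of its elements are hashable. String hashes are randomized per process, so the value printed for "Python" differs from run to run.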

# <class 'builtin_function_or_method'>
print(hash(sorted))
# 8631570007220

# <class 'Exception'>
print(hash(Exception("Error")))
# 124008387560

# <class 'str'>
print(hash("Python"))
# 6279743517120683530

# <class 'int'>
print(hash(123456))
# 123456

t = (1, 2, 3)
print(hash(t))
# 529344067295497451

t = (1, 2, [])
print(hash(t))
# TypeError: unhashable type: 'list'
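Since a tuple's hashability depends on its contents, the reliable test is simply to call hash() and catch TypeError. The sketch below uses a hypothetical helper, is_hashable, and also shows why collections.abc.Hashable is not enough on its own: the ABC only checks that the type defines __hash__, not whether hashing actually succeeds.

```python
from collections.abc import Hashable


def is_hashable(obj):
    """Hypothetical helper: True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable((1, 2, 3)))          # True
print(is_hashable((1, 2, [])))         # False: the inner list is unhashable
print(is_hashable([1, 2, 3]))          # False
print(is_hashable(frozenset({1, 2})))  # True

# tuple defines __hash__, so the ABC check says True even though
# hashing this particular tuple raises TypeError.
print(isinstance((1, 2, []), Hashable))  # True
```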

Generators, iterators, file objects, and similar objects are hashable too.

# <class 'generator'>
g = (x for x in "abc")
print(hash(g))
# 138921020124

# <class 'list_iterator'>
i = iter([1, 2, 3, 4, 5])
print(hash(i))
# 138921358108

# <class '_io.TextIOWrapper'>
with open("test.py") as f:
    print(hash(f))
    # 138921319366
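Iterators and generators keep the identity-based default, which has two practical consequences, sketched below: advancing an iterator does not change its hash, and two generators built from the same expression are still different objects.

```python
numbers = [1, 2, 3]

it = iter(numbers)
h = hash(it)
next(it)              # advance the iterator
print(hash(it) == h)  # True: the hash does not depend on iteration state

g1 = (x for x in "abc")
g2 = (x for x in "abc")
print(g1 == g2)       # False: equality is identity, not contents
print(len({g1, g2}))  # 2
```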

Modules and user-defined functions are also hashable.

import random

# random <class 'module'>
print(hash(random))
# 164856203943


def foo():
    pass

# foo <class 'function'>
print(hash(foo))
# 164856560844
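Because functions and modules are hashable, they can be used directly as dictionary keys, for example in a lookup table. A small sketch; square and kinds are illustrative names.

```python
import math


def square(x):
    return x * x


# Functions (user-defined and built-in) and modules are hashable,
# so they can serve as dictionary keys and set elements.
kinds = {
    square: "user-defined function",
    math.sqrt: "built-in function",
    math: "module",
}

print(kinds[square])                  # user-defined function
print(kinds[math.sqrt])               # built-in function
print(kinds[math])                    # module
print(square in {square, math.sqrt})  # True
```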

Overriding __hash__


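A class can define its own hash function by overriding the __hash__ method. If the hash value is computed from self._value, that value must itself be hashable and must not change during the object's lifetime; otherwise the object can no longer be found in the sets and dicts that contain it.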

class A:
    def __init__(self, value):
        self._value = value

    def __hash__(self):
        return hash(self._value)
        # or return my_hash_algorithm(self._value)
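To see why the value behind __hash__ must not change, here is a small sketch in which an object is mutated after being added to a set. Key is a hypothetical class; it uses integers so the outcome does not depend on string-hash randomization.

```python
class Key:
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        if not isinstance(other, Key):
            return NotImplemented
        return self.value == other.value


k = Key(1)
s = {k}
print(k in s)  # True

k.value = 2    # mutate the state that __hash__ depends on
print(k in s)  # False: the set now searches with hash 2, but k was stored under hash 1
print(any(item is k for item in s))  # True: the object is still in the set
```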

A class that overrides __hash__() should also override __eq__(). The one rule a hash must satisfy is that objects which compare equal have the same hash value.

# hash(a) == hash(a1) and a == a1

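class A:

    def __init__(self, value: str):
        self._value = value

    def __hash__(self):
        return hash(self._value)

    def __eq__(self, other):
        # Compare only with other A instances; let Python handle other types.
        if not isinstance(other, A):
            return NotImplemented
        return self._value == other._value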

a = A("qwerty")
a1 = A("qwerty")

print(hash(a))
# -5570161944640390331
print(hash(a1))
# -5570161944640390331

print(a == a1)
# True
print(a is a1)
# False

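These objects are treated as the same element in a set and as the same key in a dict: the first object inserted is kept, and a later equal key only replaces the value.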

s = {a, a1}
print(s)
# {<__main__.A object at 0x000001B22F1DBE50>}
print(a in s, a1 in s)
# True True

for i in s:
    print(i is a)
    # True
    print(i is a1)
    # False
    
d = {a: "a", a1: "a1"}
print(d)
# {<__main__.A object at 0x000001B22F1DBE50>: 'a1'}
print(d[a])
# a1
print(d[a1])
# a1

print(a in d.keys())
# True
print(a1 in d.keys())
# True

for k in d.keys():
    print(k is a)
    # True
    print(k is a1)
    # False
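When equality depends on several attributes, a common approach is to hash a tuple of exactly the fields that __eq__ compares. The dataclasses module can generate a matching pair automatically: with frozen=True (and the default eq=True), the generated class gets a __hash__ based on its fields and rejects attribute assignment. A sketch with illustrative class names:

```python
from dataclasses import dataclass, FrozenInstanceError


class Point:
    """Hand-written version: __eq__ and __hash__ use the same fields."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Hash a tuple of exactly the fields compared in __eq__.
        return hash((self._x, self._y))


@dataclass(frozen=True)
class FrozenPoint:
    """Generated version: frozen=True with eq=True adds a field-based __hash__."""

    x: int
    y: int


print(Point(1, 2) == Point(1, 2), hash(Point(1, 2)) == hash(Point(1, 2)))
# True True
print(len({FrozenPoint(1, 2), FrozenPoint(1, 2)}))
# 1

p = FrozenPoint(1, 2)
try:
    p.x = 5  # frozen instances reject assignment, so the hash cannot drift
except FrozenInstanceError:
    print("cannot modify a frozen instance")
```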

Instances of different classes can also be made equal.

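decimal.Decimal and fractions.Fraction override __hash__ and __eq__ so that their instances compare equal to, and hash the same as, a numerically equal int or float. Note that Decimal(3.14) and Fraction(3.14) are built from the float 3.14, so they hold its exact binary value.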

from decimal import Decimal
from fractions import Fraction

d = Decimal(3.14)
f = Fraction(3.14)

print(hash(d))
# 322818021289917443
print(hash(f))
# 322818021289917443
print(hash(3.14))
# 322818021289917443

print(d == f, f == 3.14, d == 3.14)
# True True True
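A few more cases, sketched below: equal values of different numeric types collapse to a single hash, a Decimal built from the string "3.14" is not equal to the float 3.14, and equal keys of different types merge into one dict entry.

```python
from decimal import Decimal
from fractions import Fraction

# Numerically equal values of different types are equal and hash alike.
half = [0.5, Fraction(1, 2), Decimal("0.5")]
print(all(x == 0.5 for x in half))   # True
print(len({hash(x) for x in half}))  # 1

# Decimal("3.14") is exactly 3.14; the float 3.14 is only a binary approximation.
print(Decimal("3.14") == 3.14)       # False

# Equal keys merge: the first key object is kept, the last value wins.
d = {1: "int", 1.0: "float", True: "bool"}
print(d)                             # {1: 'bool'}
```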

If a class defines __eq__() but not __hash__(), Python sets its __hash__ to None, so its instances become unhashable. To keep the parent class's __hash__() implementation, assign it explicitly in the class body.

# hash(a) != hash(a1) and a == a1

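class A:

    # <ParentClass>.__hash__ (here the parent is object).
    # Needed because defining __eq__ alone would set __hash__ to None.
    __hash__ = object.__hash__

    def __init__(self, value: str):
        self._value = value

    def __eq__(self, other):
        if not isinstance(other, A):
            return NotImplemented
        return self._value == other._value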

a = A("qwerty")
a1 = A("qwerty")

print(hash(a))
# 132341991081
print(hash(a1))
# 132341640165

print(a == a1)
# True
print(a is a1)
# False

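Because the hashes differ, a set or dict treats the two objects as distinct keys, even though a == a1. Note that this breaks the rule that equal objects must have equal hashes.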

s = {a, a1}
print(s)
# {<__main__.A object at 0x0000028BBFC0E190>,
# <__main__.A object at 0x0000028BC0196350>}
print(a in s, a1 in s)
# True True

d = {a: "a", a1: "a1"}
print(d)
# {<__main__.A object at 0x0000028BBFC0E190>: 'a',
# <__main__.A object at 0x0000028BC0196350>: 'a1'}
print(d[a])
# a
print(d[a1])
# a1
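The rule that defining only __eq__ removes the hash can be checked directly. A sketch with a hypothetical class OnlyEq:

```python
class OnlyEq:
    def __init__(self, value):
        self._value = value

    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self._value == other._value


# Defining __eq__ without __hash__ makes Python set __hash__ to None in the class.
print(OnlyEq.__hash__)  # None

try:
    hash(OnlyEq("qwerty"))
except TypeError as exc:
    print("TypeError:", exc)
```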

Unhashable Objects


To make instances unhashable, set __hash__ = None in the class definition.

The class A itself remains hashable: A is an instance of the built-in class type, and hash(A) uses type's __hash__, not the one set in A's body. The __hash__ attribute in the class definition only applies to instances of A.

class A:
    __hash__ = None

a = A()

print(hash(A))
# 71517693211

print(hash(a))
# TypeError: unhashable type: 'A'
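Setting __hash__ to None is also what collections.abc.Hashable looks for, and every operation that needs a hash fails for such instances. A short sketch:

```python
from collections.abc import Hashable


class A:
    __hash__ = None


a = A()

print(isinstance(a, Hashable))  # False: __hash__ is None
print(isinstance(A, Hashable))  # True: the class object itself is hashable

# Using the instance as a dict key needs a hash, so it raises TypeError.
try:
    d = {a: "value"}
except TypeError:
    print("instances of A cannot be dict keys")
```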

The same effect can be achieved by inheriting from an unhashable built-in type such as list, whose __hash__ is None.

class A(list):
    pass

a = A()

print(hash(A))
# 119112257331

print(hash(a))
# TypeError: unhashable type: 'A'
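Conversely, inheriting from an immutable built-in such as tuple keeps hashability, because the subclass inherits tuple.__hash__. A small sketch contrasting the two; Row and Items are illustrative names.

```python
class Row(tuple):
    """A tuple subclass inherits tuple.__hash__, so it stays hashable."""


class Items(list):
    """A list subclass inherits list.__hash__, which is None."""


r = Row((1, 2))
print(hash(r) == hash((1, 2)))  # True: same items, same tuple hash
print(r == (1, 2))              # True
print(Items.__hash__ is None)   # True
```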

