Supporting Sequence Protocol and Magic Methods in Python

Overview of Protocols in Python and implementing the Sequence Protocol with a use case.

Supporting Sequence Protocol and Magic Methods in Python
Supporting Different Sequence Protocols and Magic Methods in Python 

Examining the book Fluent Python, we could discern the emphasis on Python’s Data Model and the Special Methods it offers us. The author returning to the subject in multiple chapters only serves to emphasize the importance of encompassing these methods into our code which will make it succinct, understandable and more concise.

Today we will go over yet another chapter dealing with the same subject only limiting us to Sequences. We will start with a base class, Vector, and implement the methods needed for our Vector to behave like an immutable flat sequence.

Use Case: Vector Class

As mentioned before we will start with a simple Vector Class, it should serve as a possible collection or sequence since a vector can be N-dimensional, i.e. a Vector can possess N elements, and it is useful in practical scenarios as one could require access to the size of the vector, slice a vector, and more.

The base class we will be starting with is this, it is identical to the one in the original book.

from array import array
import reprlib
import math

class Vector:
typecode = 'd'
    def __init__(self, components):
        self._components = array(self.typecode, components)
    
    def __iter__(self):
        return iter(self._components)
    
    def __repr__(self):
        components = reprlib.repr(self._components)
        components = components[components.find('['):-1]
        return f'Vector({components})'

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return (bytes([ord(self.typecode)]) + bytes(self._components))

    def __eq__(self, other):
        return tuple(self) == tuple(other)

    def __abs__(self):
        return math.hypot(*self)

    def __bool__(self):
        return bool(abs(self))

    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)

The first thing to notice is that the self._components is the protected(by convention) attribute holding an array with the Vector components. Also as you can see, there are different magic methods already implemented including:

  • The str and the repr to get the Vector representation.
  • The iter to allow iteration over the protected vector components _components.
  • the bytes and frombytes methods to read from bytes and build byte objects from our vectors.

The second thing to point out is that we are not inheriting from any class, and we will continue not relying on inheritance to create a fully functional sequence type, we just need to implement the magic methods that fulfil the Sequence Protocol.

What is a Protocol in OOP?

A protocol is an informal interface defined only in documentation and not in the code. The Python protocol of sequence, for instance, just consists of the __len__ and __getitem__ methods. Any class that implements these methods with the established signatures can be used in any place that requires a sequence, i.e. can be used as sequences.

For example, the FrenchDeck class, implemented below, takes advantage of many Python capabilities because it implements the sequence protocol, even if that is not declared anywhere in the code, it will be treated as a Sequence by Python.

import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])

class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()

    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits for rank in self.ranks]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

Since protocols are not enforced, we can often get away with partially following the protocol, given that we are aware of the environment in which the class will be used.

How to Make our Sequence Sliceable?

As we saw in the previous example, supporting the sequence protocol is really
easy if we can delegate to a sequence attribute in our object, in our case it is the self._components.

We will start by adding the __len__ and the __getitem__ Magic Methods.

class Vector:
# many lines omitted
# ...
	
    def __len__(self):
    	return len(self._components)
	
    def __getitem__(self, index):
    	return self._components[index]

When we execute our code and try to slice our vector we get this:

>>> v1 = Vector([3, 4, 5])
>>> len(v1)
3
>>> v1[0], v1[-1]
(3.0, 5.0)
>>> v7 = Vector(range(7))
>>> v7[1:4]
array('d', [1.0, 2.0, 3.0])

It works but it is ugly, we are printing "array('d', [1.0, 2.0, 3.0])" as a slice, this makes reference to the self._components which is an array. We need to fix our __getitem__ to return an instance of  Vector so we could get an output like this  "Vector([1.0, 2.0, 3.0])" when we print the slice.

class Vector:
# many lines omitted
# ...

    def __len__(self):
        return len(self._components)

    def __getitem__(self, key):
        if isinstance(key, slice):
            cls = type(self)
            return cls(self._components[key])
        index = operator.index(key)
        return self._components[index]

What we are doing here is we check:

  • If the key argument is a slice we get the class of the instance (i.e. Vector) and invoke the class to build another Vector instance from a slice of the _components array.
  • If we can get an index from key, it means it is not a slice and we return the specific item from self._components.

The last thing to note is this remark from the author of Fluent Python:

Excessive use of isinstance may be a sign of bad OO design, but handling slices in __getitem__ is a justified use case.

Now when we execute our code we get this:

>>> v7 = Vector(range(7))
>>> v7[-1]
6.0
>>> v7[1:4]
Vector([1.0, 2.0, 3.0])
>>> v7[-1:]
Vector([6.0])

Here's the full class so far:

from array import array
import reprlib
import math


class Vector:
    typecode = "d"

    def __init__(self, components):
        self._components = array(self.typecode, components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        components = reprlib.repr(self._components)
        components = components[components.find("[") : -1]
        return f"Vector({components})"

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return bytes([ord(self.typecode)]) + bytes(self._components)

    def __eq__(self, other):
        return tuple(self) == tuple(other)

    def __abs__(self):
        return math.hypot(*self)

    def __bool__(self):
        return bool(abs(self))

    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)

    def __len__(self):
        return len(self._components)

    def __getitem__(self, key):
        if isinstance(key, slice):
            cls = type(self)
            return cls(self._components[key])
        index = operator.index(key)
        return self._components[index]

How to implement Dynamic attribute access in Sequences?

Imagine having a 3 dimensional vector, it would be easier to access it with v.x, v.y and v.z rather than accessing it by index. This is what we are implementing next, how to access our attributes by name.

However, now we are dealing with vectors that may have a large number of components, and it may be convenient to access the first few components
with shortcut letters such as x, y, z instead of v[0], v[1], and v[2].

To do this we need to implement the __getattr__ magic method.

The __getattr__ method is invoked by the interpreter when attribute lookup fails.
In simple terms, given the expression my_obj.x, Python checks if the my_obj instance has an attribute named x. If not, the search goes to the class (my_obj.__class__), and then up the inheritance graph. If the x attribute is not found, then the __getattr__ method defined in the class of my_obj is called with self and the name of the attribute as a string (e.g., 'x').

In short, this is the code we have to add to support accessing our 4 first attributes by name:

class Vector:
    # many lines omitted
    # ...
    __match_args__ = ("x", "y", "z", "t")

    def __getattr__(self, name):
        cls = type(self)
        try:
            pos = cls.__match_args__.index(name)
        except ValueError:
            pos = -1
            if 0 <= pos < len(self._components):
                return self._components[pos]
            msg = f"{cls.__name__!r} object has no attribute {name!r}"
            raise AttributeError(msg)

In here we have:

  • In here we have the __match_args__ to allow positional pattern matching on the dynamic attributes supported by __getattr__.
  • Support for positional component access.

And we can safely execute this:

>>> v = Vector(range(5))
>>> v
Vector([0.0, 1.0, 2.0, 3.0, 4.0])
>>> v.x
0.0
>>> v.x
10 # We can set an x attribute directly and still have the same vector?
>>> v
Vector([0.0, 1.0, 2.0, 3.0, 4.0])

What we are still missing is the logic for setting attributes in our Vector class, given that we have implemented the __getattr__ it makes sense to implement the setter __setattr__, especially  since the behaviour we have now is inconsistent. If you noticed we  set v.x to 10 earlier but the vector instance hasn't changed.

Python only calls the  __getattr__ method as a fallback, when the object does not have the named attribute. However, after we assign v.x = 10, the v object now has an x attribute, so __getattr__ will no longer be called to retrieve v.x, i.e. the interpreter will  return the value 10 that is bound to v.x and the current __getattr__ implementation of pays no attention to instance attributes other than self._components.

Let's add the __setattr__ implementation:

class Vector:
    # many lines omitted
    # ...

    def __setattr__(self, name, value):
        cls = type(self)
        if len(name) == 1:
            if name in cls.__match_args__:
                error = "readonly attribute {attr_name!r}"
            elif name.islower():
                error = "can't set attributes 'a' to 'z' in {cls_name!r}"
            else:
                error = ""
            if error:
                msg = error.format(cls_name=cls.__name__, attr_name=name)
                raise AttributeError(msg)
        super().__setattr__(name, value)

We added some special handling for single-character attribute names 'a' to 'z', i.e. we don't allow setting them,  if the name is in our __match_args__ we throw an error and in the default case we just call the super method.

Here's another quote of the original author on the __getattr__ and __setattr__ methods:

very often when you implement __getattr__, you need to code __setattr__ as well, to avoid inconsistent behavior in your objects.

Here's the Vector class so far:

from array import array
import reprlib
import math


class Vector:
    typecode = "d"
    __match_args__ = ("x", "y", "z", "t")

    def __init__(self, components):
        self._components = array(self.typecode, components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        components = reprlib.repr(self._components)
        components = components[components.find("[") : -1]
        return f"Vector({components})"

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return bytes([ord(self.typecode)]) + bytes(self._components)

    def __eq__(self, other):
        return tuple(self) == tuple(other)

    def __abs__(self):
        return math.hypot(*self)

    def __bool__(self):
        return bool(abs(self))

    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)

    def __len__(self):
        return len(self._components)

    def __getitem__(self, key):
        if isinstance(key, slice):
            cls = type(self)
            return cls(self._components[key])
        index = operator.index(key)
        return self._components[index]

    def __getattr__(self, name):
        cls = type(self)
        try:
            pos = cls.__match_args__.index(name)
        except ValueError:
            pos = -1
            if 0 <= pos < len(self._components):
                return self._components[pos]
            msg = f"{cls.__name__!r} object has no attribute {name!r}"
            raise AttributeError(msg)

    def __setattr__(self, name, value):
        cls = type(self)
        if len(name) == 1:
            if name in cls.__match_args__:
                error = "readonly attribute {attr_name!r}"
            elif name.islower():
                error = "can't set attributes 'a' to 'z' in {cls_name!r}"
            else:
                error = ""
            if error:
                msg = error.format(cls_name=cls.__name__, attr_name=name)
                raise AttributeError(msg)
        super().__setattr__(name, value)

How to Achieve faster == and Support Hashing?

We have discussed hashing before here and its benefits especially that most of the time we will be making use of hashable instances.

In our Vector class we will implement the __hash__ method by computing the accumulated xor of the self._components array.

To do this, we will make use of the functools and operator packages, to use the reduce function and operator.xor operator.

class Vector:
    # many lines omitted
    # ...

    def __hash__(self):
        hashes = (hash(x) for x in self._components)
        return functools.reduce(operator.xor, hashes, 0)

And while we are on the subject of comparison, we could also alter the implementation of the __eq__ method, as it could be quite expensive to compare component by component in the case of large dimension vectors(for e.g. a vector with 999999999 component).

This will be a very small optimization but cheaper in terms of processing and
memory.

class Vector:
    # many lines omitted
    # ...

    def __eq__(self, other):
        if len(self) != len(other):
            return False
        for a, b in zip(self, other):
            if a != b:
                return False
        return True

The zip produces a generator of tuples made from the items in each iterable argument so we could write this: a != b.

Here's the full Vector implementation so far:

from array import array
import reprlib
import math


class Vector:
    typecode = "d"
    __match_args__ = ("x", "y", "z", "t")

    def __init__(self, components):
        self._components = array(self.typecode, components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        components = reprlib.repr(self._components)
        components = components[components.find("[") : -1]
        return f"Vector({components})"

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return bytes([ord(self.typecode)]) + bytes(self._components)

    def __abs__(self):
        return math.hypot(*self)

    def __bool__(self):
        return bool(abs(self))

    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)

    def __len__(self):
        return len(self._components)

    def __getitem__(self, key):
        if isinstance(key, slice):
            cls = type(self)
            return cls(self._components[key])
        index = operator.index(key)
        return self._components[index]

    def __getattr__(self, name):
        cls = type(self)
        try:
            pos = cls.__match_args__.index(name)
        except ValueError:
            pos = -1
            if 0 <= pos < len(self._components):
                return self._components[pos]
            msg = f"{cls.__name__!r} object has no attribute {name!r}"
            raise AttributeError(msg)

    def __setattr__(self, name, value):
        cls = type(self)
        if len(name) == 1:
            if name in cls.__match_args__:
                error = "readonly attribute {attr_name!r}"
            elif name.islower():
                error = "can't set attributes 'a' to 'z' in {cls_name!r}"
            else:
                error = ""
            if error:
                msg = error.format(cls_name=cls.__name__, attr_name=name)
                raise AttributeError(msg)
        super().__setattr__(name, value)

    def __eq__(self, other):
        if len(self) != len(other):
            return False
        for a, b in zip(self, other):
            if a != b:
                return False
        return True

    def __hash__(self):
        hashes = (hash(x) for x in self._components)
        return functools.reduce(operator.xor, hashes, 0)

How to Format our output?

To support formatting with f'strings or the format function we need to implement the __format__ magic method.

We already went over the __format__ magic method in the previous article and you can find more detailed explanation about it here.

In our case, formatting a vector instance includes some research about math which I will happily leave the link for in here.✌️

Here's the __format__ method implementation with the methods needed for it:

class Vector:
    # many lines omitted
    # ...

    def angle(self, n):
        r = math.hypot(*self[n:])
        a = math.atan2(r, self[n - 1])
        if (n == len(self) - 1) and (self[-1] < 0):
            return math.pi * 2 - a
        else:
            return a

    def angles(self):
        return (self.angle(n) for n in range(1, len(self)))

    def __format__(self, fmt_spec=""):
        if fmt_spec.endswith("h"):  # hyperspherical coordinates
            fmt_spec = fmt_spec[:-1]
            coords = itertools.chain([abs(self)], self.angles())
            outer_fmt = "<{}>"
        else:
            coords = self
            outer_fmt = "({})"

    	components = (format(c, fmt_spec) for c in coords)
    	return outer_fmt.format(", ".join(components))

Final Vector class Implementation

from array import array
import reprlib
import math


class Vector:
    typecode = "d"
    __match_args__ = ("x", "y", "z", "t")

    def __init__(self, components):
        self._components = array(self.typecode, components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        components = reprlib.repr(self._components)
        components = components[components.find("[") : -1]
        return f"Vector({components})"

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return bytes([ord(self.typecode)]) + bytes(self._components)

    def __abs__(self):
        return math.hypot(*self)

    def __bool__(self):
        return bool(abs(self))

    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(memv)

    def __len__(self):
        return len(self._components)

    def __getitem__(self, key):
        if isinstance(key, slice):
            cls = type(self)
            return cls(self._components[key])
        index = operator.index(key)
        return self._components[index]

    def __getattr__(self, name):
        cls = type(self)
        try:
            pos = cls.__match_args__.index(name)
        except ValueError:
            pos = -1
            if 0 <= pos < len(self._components):
                return self._components[pos]
            msg = f"{cls.__name__!r} object has no attribute {name!r}"
            raise AttributeError(msg)

    def __setattr__(self, name, value):
        cls = type(self)
        if len(name) == 1:
            if name in cls.__match_args__:
                error = "readonly attribute {attr_name!r}"
            elif name.islower():
                error = "can't set attributes 'a' to 'z' in {cls_name!r}"
            else:
                error = ""
            if error:
                msg = error.format(cls_name=cls.__name__, attr_name=name)
                raise AttributeError(msg)
        super().__setattr__(name, value)

    def __eq__(self, other):
        if len(self) != len(other):
            return False
        for a, b in zip(self, other):
            if a != b:
                return False
        return True

    def __hash__(self):
        hashes = (hash(x) for x in self._components)
        return functools.reduce(operator.xor, hashes, 0)

    def angle(self, n):
        r = math.hypot(*self[n:])
        a = math.atan2(r, self[n - 1])
        if (n == len(self) - 1) and (self[-1] < 0):
            return math.pi * 2 - a
        else:
            return a

    def angles(self):
        return (self.angle(n) for n in range(1, len(self)))

    def __format__(self, fmt_spec=""):
        if fmt_spec.endswith("h"):  # hyperspherical coordinates
            fmt_spec = fmt_spec[:-1]
            coords = itertools.chain([abs(self)], self.angles())
            outer_fmt = "<{}>"
        else:
            coords = self
            outer_fmt = "({})"

    	components = (format(c, fmt_spec) for c in coords)
    	return outer_fmt.format(", ".join(components))

Conclusion

As you can see, Python makes really good use of protocols and it gives us so much freedom to incorporate specific behaviors without having to deal with inheritance. And this isn't the only form available as there are other ways to achieve your goal before having to resort to inheritance.

Further Reading