The Artima Developer Community

The Explorer
Things to Know About Python Super [1 of 3]
by Michele Simionato
August 12, 2008
Summary
super is perhaps the trickiest Python construct: this series aims to unveil its secrets


Foreword

I began programming with Python in 2002, just after the release of Python 2.2. That release was a major overhaul of the language: new-style classes were introduced, the way inheritance worked changed, and the builtin super was introduced. So you could say that I have worked with super right from the beginning; still, I never liked it, and over the years I have discovered more and more of its dark corners.

In 2004 I decided to write a comprehensive paper documenting super pitfalls and traps, with the goal of publishing it on the Python web site, just as I had published my essay on multiple inheritance and the Method Resolution Order. With time the paper grew longer and longer, but I never had the feeling that I had covered everything I needed to say; moreover, I have a full-time job, so I never had the time to revise the paper as a whole. As a consequence, four years have passed and the paper is still in draft status. This is a pity, since it documents issues that people encounter and that regularly come up on the Python newsgroups and forums.

Keeping the draft sitting on my hard disk is doing a disservice to the community. Still, I lack the time to finish it properly. To break the impasse, I decided to split the long paper into a series of short blog posts, which I do have the time to review properly. Moreover, people are free to post comments and corrections in case I am making mistakes (when talking about super, this is always possible). Once I finish the series, I may integrate the corrections, put the pieces together again and possibly publish the result as a whole on the Python website. In other words, in order to finish the task, I am trying the strategies of divide and conquer and release early, release often. We will see how it goes.

Introduction

super is a Python built-in that is often misunderstood by the average Python programmer. It was first introduced in Python 2.2 and slightly improved and fixed in later versions. One of the reasons for the confusion is the poor documentation of super: at the time of this writing (August 2008) the documentation is incomplete, in some parts misleading, and even wrong. For instance, the standard documentation (even for the new 2.6 version http://docs.python.org/dev/library/functions.html#super) still says:

super(type[, object-or-type])
  Return the superclass of type. If the second argument is omitted the
  super object returned is unbound. If the second argument is an object,
  isinstance(obj, type) must be true. If the second argument is a type,
  issubclass(type2, type) must be true. super() only works for new-style
  classes.

[UPDATE: the final version of Python 2.6 has better documentation for super, as a direct consequence of this post ;)]. The first sentence is just plain wrong: super does not return the superclass. There is no such thing as the superclass in a Multiple Inheritance (MI) world. Also, the sentence about unbound is misleading, since it may easily lead the programmer to think about bound and unbound methods, whereas it has nothing to do with that concept. IMNSHO super is one of the trickiest and most surprising Python constructs, and we absolutely need a document to shed light on its secrets. The present paper is a first step in this direction: it aims to tell you the truth about super. At least the amount of truth I have discovered with my experiments, which is certainly not the whole truth ;)

A fair warning is in order here: this document is aimed at expert Pythonistas. It assumes you are familiar with new-style classes and the Method Resolution Order (MRO); moreover, a good understanding of descriptors would be extremely useful. Some parts also require good familiarity with metaclasses. All in all, this paper is not for the faint of heart ;)

There is no superclass in a MI world

Readers familiar with single-inheritance languages, such as Java or Smalltalk, will have a clear concept of superclass in mind. This concept, however, has no useful meaning in Python or in other multiple inheritance languages. I became convinced of this after a discussion with Bjorn Pettersen and Alex Martelli on comp.lang.python in May 2003 (at that time I mistakenly thought that one could define a superclass concept in Python). Consider this example from that discussion:

       +-----+
       |  T  |
       |a = 0|
       +-----+
     /         \
    /           \
+-------+    +-------+
|   A   |    |   B   |
|       |    | a = 2 |
+-------+    +-------+
    \           /
     \         /
       +-----+
       |  C  |
       +-----+
          :
          :    instantiation
          c
>>> class T(object):
...     a = 0
>>> class A(T):
...     pass
>>> class B(T):
...     a = 2
>>> class C(A,B):
...     pass
>>> c = C()

What is the superclass of C? There are two direct superclasses (i.e. bases) of C: A and B. A comes before B, so one would naturally think that the superclass of C is A. However, A inherits its attribute a from T with value a=0: if super(C,c) were returning the superclass of C, then super(C,c).a would return 0. This is NOT what happens. Instead, super(C,c).a walks through the method resolution order of the class of c (i.e. C) and retrieves the attribute from the first class following C in the MRO which defines it. In this example the MRO of C is [C, A, B, T, object], so B is the first class following C which defines a, and super(C,c).a correctly returns the value 2, not 0:

>>> super(C,c).a
2

You may call A the superclass of C, but this is not a useful concept, since methods are resolved by looking at the classes in the MRO of C, not at the classes in the MRO of A (which in this case is [A, T, object] and does not contain B). The whole MRO is needed, not just the first superclass.
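To make the lookup rule concrete, here is a minimal sketch written for Python 3 (not part of the Python 2 session above) that emulates how super(C, c).a finds its value: it takes the MRO of type(c), skips everything up to and including C, and returns the first class dictionary entry named a. The helper name mro_lookup is hypothetical, and it ignores descriptors, which the real super also has to handle.

```python
class T(object):
    a = 0

class A(T):
    pass

class B(T):
    a = 2

class C(A, B):
    pass

def mro_lookup(cls, obj, name):
    """Hypothetical helper: find `name` in the MRO of type(obj), after `cls`.

    Only plain class attributes are handled; descriptors are not bound.
    """
    mro = type(obj).__mro__
    # Search starts just after `cls`, in the MRO of the *instance's* class.
    for klass in mro[mro.index(cls) + 1:]:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)

c = C()
print([k.__name__ for k in C.__mro__])  # ['C', 'A', 'B', 'T', 'object']
print([k.__name__ for k in A.__mro__])  # ['A', 'T', 'object'] (no B here)
print(mro_lookup(C, c, "a"))            # 2
print(super(C, c).a)                    # 2
```

Note that even super(A, c).a gives 2, because the search runs along the MRO of C, where B follows A.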

So, using the word superclass in the standard docs is misleading and should be avoided altogether.

Bound and unbound (super) methods

Having established that super cannot return the mythical superclass, we may ask ourselves what the hell it is returning ;) The truth is that super returns proxy objects.

Informally speaking, a proxy is an object with the ability to dispatch to methods of other objects via delegation. Technically, super is a class overriding the __getattribute__ method. Instances of super are proxy objects providing access to the methods in the MRO. The dispatch is done in such a way that

super(cls, instance-or-subclass).method(*args, **kw)

corresponds more or less to

right-method-in-the-MRO-applied-to(instance-or-subclass, *args, **kw)
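A toy version of such a proxy may help. The following is a simplified sketch for Python 3 (the class name MiniSuper is hypothetical, and this is not CPython's actual implementation) of a class overriding __getattribute__: it walks the MRO of the starting type past cls and, if the attribute found is a descriptor, calls its __get__ with the instance, or with None when the second argument is a class. It deliberately skips many details of the real super, such as argument checking and access to the proxy's own attributes.

```python
class MiniSuper(object):
    """Hypothetical, simplified super-like proxy (illustration only)."""

    def __init__(self, cls, obj):
        self._cls = cls
        self._obj = obj

    def __getattribute__(self, name):
        # Fetch our own state without recursing into this method.
        cls = object.__getattribute__(self, "_cls")
        obj = object.__getattribute__(self, "_obj")
        # The second argument may be an instance of cls or a subclass of cls.
        if isinstance(obj, type) and issubclass(obj, cls):
            starttype = obj
        else:
            starttype = type(obj)
        mro = starttype.__mro__
        for klass in mro[mro.index(cls) + 1:]:
            if name in vars(klass):
                attr = vars(klass)[name]
                get = getattr(type(attr), "__get__", None)
                if get is None:
                    return attr  # plain attribute, e.g. an int
                # Descriptor: bind to the instance, or to None for a class.
                instance = None if obj is starttype else obj
                return get(attr, instance, starttype)
        raise AttributeError(name)


class B(object):
    def __repr__(self):
        return "<instance of %s>" % self.__class__.__name__

class C(B):
    pass

class D(C):
    pass

d = D()
print(MiniSuper(C, d).__repr__())  # <instance of D>
print(super(C, d).__repr__())      # <instance of D>
```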

There is a caveat at this point: the second argument can be an instance of the first argument, or a subclass of it. In the first case we expect a bound method to be returned, and in the second case an unbound method. This is true in recent versions of Python: for instance, in this example

>>> class B(object):
...     def __repr__(self):
...         return "<instance of %s>" % self.__class__.__name__
>>> class C(B):
...     pass
>>> class D(C):
...     pass
>>> d = D()

you get

>>> print super(C, d).__repr__
<bound method D.__repr__ of <instance of D>>

and

>>> print super(C, D).__repr__
<unbound method D.__repr__>
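For comparison, here is a small sketch of the same experiment in Python 3 (an assumption about what readers run today, not part of the original 2.x sessions). Unbound methods were removed in Python 3, so super(C, D).__repr__ is simply the plain function defined in B, and calling it without an instance fails with a TypeError; super(C, d).__repr__ is still a bound method whose __self__ is d.

```python
class B(object):
    def __repr__(self):
        return "<instance of %s>" % self.__class__.__name__

class C(B):
    pass

class D(C):
    pass

d = D()

bound = super(C, d).__repr__   # bound method: d is stored in __self__
plain = super(C, D).__repr__   # Python 3: just the function B.__repr__

print(bound.__self__ is d)     # True
print(plain is B.__repr__)     # True, no "unbound method" wrapper
print(plain(d))                # <instance of D>, instance passed explicitly

try:
    plain()                    # no instance supplied
except TypeError as exc:
    print("TypeError:", exc)
```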

However, if you are still using Python 2.2 (some unlucky people are forced to use old versions), you should be aware that super had a bug: super(<class>, <subclass>).method returned a bound method, not an unbound one:

>> print super(C, D).__repr__ # in Python 2.2
<bound method D.__repr__ of <class '__main__.D'>>

That means that in Python 2.2 you get:

>> print super(C, D).__repr__() # in Python 2.2
<instance of type>

D, seen as an instance of the (meta)class type, is being passed as the first argument to __repr__. This has been fixed in Python 2.3+, where you correctly get a TypeError:

>>> print super(C, D).__repr__() # the same as B.__repr__()
Traceback (most recent call last):
 ...
TypeError: unbound method __repr__() must be called with D instance as first
argument (got nothing instead)

The point is subtle, but usually one does not see problems since typically super is invoked on instances, not on subclasses, and in this case it works correctly in all Python versions:

>>> print super(C, d).__repr__()
<instance of D>

When I was using Python 2.2, due to the bug just discussed, and due to the super docstring

>>> print super.__doc__
super(type) -> unbound super object
super(type, obj) -> bound super object; requires isinstance(obj, type)
super(type, type2) -> bound super object; requires issubclass(type2, type)
Typical use to call a cooperative superclass method:
class C(B):
    def meth(self, arg):
        super(C, self).meth(arg)

I got the impression that in order to get unbound methods I needed to use the unbound super object. This is actually untrue. To understand how bound/unbound methods work we need to talk about descriptors.
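What the one-argument form actually gives you is a super object that is unbound in the descriptor sense, not a source of unbound methods. A minimal sketch, assuming Python 3 (where super(type) still exists): super(C) by itself does not search the MRO at all, but since super objects implement __get__, binding it to an instance produces an ordinary bound super object.

```python
class B(object):
    def __repr__(self):
        return "<instance of %s>" % self.__class__.__name__

class C(B):
    pass

class D(C):
    pass

d = D()
unbound_super = super(C)        # one-argument form: no instance attached

# Binding happens through the descriptor protocol, not through method lookup.
bound_super = unbound_super.__get__(d, D)
print(bound_super.__repr__())   # <instance of D>, like super(C, d).__repr__()

# With no instance attached, the MRO is not searched: __repr__ here is
# super's own method, not B.__repr__.
print(unbound_super.__repr__() == repr(unbound_super))  # True
```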

super and descriptors

Descriptors (more properly I should speak of the descriptor protocol) were introduced in Python 2.2 by Guido van Rossum. Their primary motivation was technical, since they were needed to implement the new-style object system. Descriptors were also used to introduce new standard concepts in Python, such as classmethods, staticmethods and properties. Moreover, in keeping with Python's traditional transparency policy, descriptors were exposed to the application programmer, giving him/her the freedom to write custom descriptors. Any serious Python programmer should have a look at descriptors: luckily they are now very well documented (which was not the case when I first studied them :-/) thanks to the beautiful essay by Raymond Hettinger. You should read it before continuing with this article, since it explains all the details. However, for the sake of our discussion of super, it is enough to say that a descriptor class is just a regular new-style class which implements a __get__ method with signature __get__(self, obj, objtyp=None). A descriptor object is just an instance of a descriptor class.

Descriptor objects are intended to be used as attributes (hence their full name, attribute descriptors). Suppose that descr is a given descriptor object used as an attribute of a given class C. Then the syntax C.descr is actually interpreted by Python as a call to descr.__get__(None, C), whereas the same syntax for an instance c of C corresponds to a call to descr.__get__(c, type(c)).
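The two cases can be observed directly with a small tracing descriptor. This is a sketch written for Python 3, with hypothetical names (Traced, calls), that simply records the arguments Python passes to __get__ on class access and on instance access.

```python
class Traced(object):
    """A minimal (non-data) descriptor recording how __get__ is called."""

    def __init__(self):
        self.calls = []

    def __get__(self, obj, objtyp=None):
        self.calls.append((obj, objtyp))
        return 42

class C(object):
    descr = Traced()

c = C()
C.descr   # Python calls descr.__get__(None, C)
c.descr   # Python calls descr.__get__(c, C)

# Fetch the descriptor object itself from the class dict, bypassing __get__.
log = vars(C)["descr"].calls
print(log[0] == (None, C))   # True
print(log[1] == (c, C))      # True
```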

Since the combination of descriptors and super is so tricky, the core developers got it wrong in different versions of Python. For instance, in Python 2.2 the only way to get the unbound method __repr__ is via the descriptor API:

>> super(C, d).__repr__.__get__(None, D) # Python 2.2
<unbound method D.__repr__>

You may check that it works correctly:

>> print _(d)
<instance of D>

In Python 2.3 one can get the unbound method by using the super(cls, subcls) syntax, but the syntax super(C, d).__repr__.__get__(None, D) also works; in Python 2.4+, instead, the same syntax returns a bound method, not an unbound one:

>>> super(C, d).__repr__.__get__(None, D) # in Python 2.4+
<bound method D.__repr__ of <instance of D>>

The core developers changed the behavior again, making my life difficult while I was writing this paper :-/ I cannot trace the history of the bugs of super here, but if you are using an old version of Python and you find something weird with super, I advise you to have a look at the Python bug tracker before thinking you are doing something wrong. Strictly speaking, in this case the change is not in super but in the descriptor implementation. In Python 2.2-2.3 you could get an unbound method from a bound one as follows:

>> d.__repr__.__get__(None, D) # in Python 2.2-2.3
<unbound method D.__repr__>

In Python 2.4 that does not work anymore:

>>> d.__repr__.__get__(None, D) # in Python 2.4+
<bound method D.__repr__ of <instance of D>>

Still, you can get the unbound method by going through the underlying function first:

>>> d.__repr__.im_func.__get__(None, D) # in Python 2.4+
<unbound method D.__repr__>
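For readers on Python 3, a hedged note: im_func is spelled __func__ there (the alias is also available since Python 2.6), and unbound methods no longer exist, so the same trick returns the plain function rather than an unbound method. A short sketch:

```python
class B(object):
    def __repr__(self):
        return "<instance of %s>" % self.__class__.__name__

class C(B):
    pass

class D(C):
    pass

d = D()
func = d.__repr__.__func__            # Python 3 spelling of im_func

print(func is B.__repr__)             # True: the underlying function
print(func.__get__(None, D) is func)  # True: no unbound-method wrapper
rebound = func.__get__(d, D)          # explicit binding gives a bound method
print(rebound())                      # <instance of D>
```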


About the Blogger

Michele Simionato started his career as a Theoretical Physicist, working in Italy, France and the U.S. He turned to programming in 2003; since then he has been working professionally as a Python developer, and he now lives in Milan, Italy. Michele is well known in the Python community for his posts in the newsgroup(s), his articles and his Open Source libraries and recipes. His interests include object oriented programming, functional programming, and in general programming methodologies that enable us to manage the complexity of modern software development.

This weblog entry is Copyright © 2008 Michele Simionato. All rights reserved.
