In Python 2, there are two distinct types of classes, the so-called "old" (or "classic") and (increasingly inaccurately named) "new" style classes. This post will discuss the differences between the two as visible to the Python programmer, and explore a little bit of the implementation decisions behind those differences.
New Style Classes
New style classes were introduced in Python 2.2 back in December of 2001. [1] New style classes are most familiar to modern Python programmers and are the only kind of classes available in Python 3.
Let's look at some of the observable properties of new style classes.
>>> class NewStyle(object):
...     pass
In Python 2, we create new style classes by (explicitly or implicitly) inheriting from object [2]. This means that the class NewStyle is an instance of type; that is, type is its metaclass (alternatively, we can say that any instance of type is a new style class):
>>> NewStyle
<class 'NewStyle'>
>>> isinstance(NewStyle, type)
True
>>> type(NewStyle)
<type 'type'>
Instances of this class are, well, instances of this class, and their type is this class:
>>> new_style_instance = NewStyle()
>>> type(new_style_instance) is NewStyle
True
>>> isinstance(new_style_instance, NewStyle)
True
Unsurprisingly, instances of NewStyle classes are also instances of object; it's right there in the class definition and the class __bases__ and the method resolution order (MRO):
>>> isinstance(new_style_instance, object)
True
>>> NewStyle.__bases__
(<type 'object'>,)
>>> NewStyle.mro()
[<class 'NewStyle'>, <type 'object'>]
The default repr of a new style instance includes its type's name:
>>> print(repr(new_style_instance))
<NewStyle object at 0x...>
Old Style Classes
If that's a new style class, what's an old style class? Old style, or "classic," classes were the only kind of class available in Python 2.1 and earlier. For backwards compatibility when moving older code to Python 2.2 or later, they remained the default (we'll see shortly some of the practical behaviour differences between old and new style classes).
This means that we create an old style class by not specifying any ancestors, or by specifying only ancestors that are themselves old style classes.
>>> class OldStyle:
...     pass
Let's look at the same observable properties as we did with new style classes and compare them. We've already seen that we didn't specify object in the inheritance list. Is this class a type?
>>> OldStyle
<class __builtin__.OldStyle at 0x...>
>>> isinstance(OldStyle, type)
False
>>> type(OldStyle)
<type 'classobj'>
If you've been paying attention, you're not surprised to see that old style classes are not instances of type (after all, that's the definition we used for new style classes). They have a different repr, which is not too surprising, but they also have a different type.
>>> type(type(OldStyle))
<type 'type'>
Hmm, and the type of that different type (i.e., its metaclass) is type. That makes me wonder, do all old-style classes share a type? Let's continue exploring.
Instances of the old style class are of course still instances of the old style class, but their type is not that class, unlike with new style classes:
>>> old_style_instance = OldStyle()
>>> isinstance(old_style_instance, OldStyle)
True
>>> type(old_style_instance) is OldStyle
False
>>> type(old_style_instance)
<type 'instance'>
Instead, their type is something called instance. Now I'm wondering again, do all instances of old style classes share a type? If so, how does class-specific behaviour get implemented?
>>> class OldStyle2:
...     pass
>>> old_style2_instance = OldStyle2()
>>> type(old_style2_instance)
<type 'instance'>
>>> type(old_style2_instance) is type(old_style_instance)
True
They do! Curiouser and curiouser. How is type-specific behaviour implemented then? Let's keep going.
We saw that new style classes are instances of object, and that object appears in their __bases__ and MRO. What about old style classes?
>>> isinstance(old_style_instance, object)
True
>>> OldStyle.__bases__
()
>>> OldStyle.mro()
Traceback (most recent call last):
...
AttributeError: class OldStyle has no attribute 'mro'
Strangely, the old style instance is still an instance of object, even though its class doesn't inherit from object, as we can see from the empty __bases__ (and old style classes don't even expose the concept of an MRO).
Finally, the repr of old style classes is also different:
>>> print(repr(old_style_instance))
<__builtin__.OldStyle instance at 0x...>
Practically Speaking
OK, that's enough time spent down in the weeds dealing with subtle differences in inheritance bases and reprs. Before we talk about implementation, are there any practical differences a Python programmer should be concerned about, or at least know about?
Yes. I'll hit some of the highlights here; there are probably others. (TL;DR: always write new style classes.)
Operators
For new style classes, all the Python special "dunder" methods (like __repr__ and the arithmetic and comparison operators such as __add__ and __eq__) must be defined on the class itself. If we want to change the repr, we can't do it on the instance; we have to do so on the class:
>>> new_style_instance.__repr__ = lambda: "Hi, I'm a new instance"
>>> print(repr(new_style_instance))
<NewStyle object at 0x...>
>>> NewStyle.__repr__ = lambda self: "Hi, I'm the new class"
>>> print(repr(new_style_instance))
Hi, I'm the new class
(We monkey-patched a class after its definition here, which does work thanks to some magic we'll discuss soon, but that's not usually the way it's done.)
In contrast, old style classes allowed dunders to be assigned to an instance, even overriding one set on the class, in exactly the same way that regular methods work on both new and old style classes.
>>> OldStyle.__repr__ = lambda self: "Hi, I'm the old class"
>>> print(repr(old_style_instance))
Hi, I'm the old class
>>> old_style_instance.__repr__ = lambda: "Hi, I'm the old instance"
>>> print(repr(old_style_instance))
Hi, I'm the old instance
Old Style Instances are Bigger
On a 64-bit platform, old style instances are one pointer size larger:
>>> import sys
>>> sys.getsizeof(new_style_instance)
64
>>> sys.getsizeof(old_style_instance)
72
Old Style Classes Ignore __slots__
The __slots__ declaration can be used not only to optimize memory usage for small, frequently used objects, but also to offer tighter control over what attributes an instance can have, which may be useful in some scenarios. Unfortunately, old style classes accept but ignore this declaration.
>>> class NewWithSlots(object):
...     __slots__ = ('attr',)
>>> new_with_slots = NewWithSlots()
>>> sys.getsizeof(new_with_slots)
56
>>> new_with_slots.not_a_slot = 42
Traceback (most recent call last):
...
AttributeError: 'NewWithSlots' object has no attribute 'not_a_slot'
>>> new_with_slots.attr = 42
>>> sys.getsizeof(new_with_slots)
56
>>> class OldWithSlots:
...     __slots__ = ('attr',)
>>> old_with_slots = OldWithSlots()
>>> sys.getsizeof(old_with_slots)
72
>>> old_with_slots.not_a_slot = 42
>>> sys.getsizeof(old_with_slots)
72
Old Style Classes Ignore __getattribute__
The __getattribute__ dunder method allows an instance to intercept all attribute access. It's what makes it possible to implement transparently persistent objects in pure-Python. Indeed, it's the same basic feature that allowed descriptors to be implemented in Python 2.2. Old style classes only support __getattr__, which is only called when the attribute cannot be found through normal lookup.
>>> class NewWithCustomAttrs(object):
...     def __getattribute__(self, name):
...         print("Looking up %s" % name)
...         if name == 'computed': return 'computed'
...         return object.__getattribute__(self, name)
...     def __getattr__(self, name):
...         print("Returning default for %s" % name)
...         return 1
>>> custom_new = NewWithCustomAttrs()
>>> custom_new.foo
Looking up foo
Returning default for foo
1
>>> custom_new.foo = 42
>>> custom_new.foo
Looking up foo
42
>>> custom_new.computed = 42
>>> custom_new.computed
Looking up computed
'computed'
>>> class OldWithCustomAttrs:
...     def __getattribute__(self, name):
...         print("Looking up %s" % name)
...         if name == 'computed': return 'computed'
...         return object.__getattribute__(self, name)
...     def __getattr__(self, name):
...         print("Returning default for %s" % name)
...         return 1
>>> custom_old = OldWithCustomAttrs()
>>> custom_old.foo
Returning default for foo
1
>>> custom_old.foo = 42
>>> custom_old.foo
42
>>> custom_old.computed = 42
>>> custom_old.computed
42
Contrary to some information that can be found on the Internet, old style classes can use most descriptors, including @property.
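For example, a read-only property works on an old style class just as you'd expect. (This is a minimal check; the OldWithProperty class is invented for illustration, and it only demonstrates the getter side.)

>>> class OldWithProperty:
...     def _get_answer(self):
...         return 42
...     answer = property(_get_answer)
>>> OldWithProperty().answer
42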
Old Style Classes Have a Simplistic MRO
When multiple inheritance is involved, the MRO of an old style class can be quite surprising. I'll let Guido provide the explanation.
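To give a quick taste of the difference (a minimal sketch of the classic "diamond" case; the class names are made up), old style classes search depth-first, left-to-right, so an attribute on a common base can shadow one on a later sibling, while new style classes use the C3 linearization and consult the sibling first:

>>> class A: attr = 'from A'
>>> class B(A): pass
>>> class C(A): attr = 'from C'
>>> class D(B, C): pass
>>> D().attr     # depth-first search order is D, B, A, C, so A shadows C
'from A'
>>> class NewA(object): attr = 'from NewA'
>>> class NewB(NewA): pass
>>> class NewC(NewA): attr = 'from NewC'
>>> class NewD(NewB, NewC): pass
>>> NewD().attr  # C3 order is NewD, NewB, NewC, NewA, object
'from NewC'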
Old Style Classes are Slow
This is especially true on PyPy, but even on CPython 2.7 an old style class is three times slower at basic arithmetic.
>>> import timeit
>>> class NewAdd(object):
...     def __add__(self, other):
...         return 1
>>> timeit.new_add = NewAdd()  # Cheating, sneak this into the globals
>>> timeit.timeit('new_add + 1')
0.2...
>>> class OldAdd:
...     def __add__(self, other):
...         return 1
>>> timeit.old_add = OldAdd()
>>> timeit.timeit('old_add + 1')
0.6...
Implementing Old Style Classes
The shift to new style classes represents something of a move away from Python's earlier free-wheeling days, when it could almost be thought of as a prototype-based language, toward something more traditionally class-based. Yet Python 2.2 through Python 2.7 still support the old way of doing things. It would be silly to imagine there are two completely separate object and type hierarchy implementations inside the CPython interpreter, along with two separate ways of doing arithmetic, getting reprs, and so on [3], yet old style classes continue to work as always. How is this done?
We've had some tantalizing clues to help point us in the right direction.
- Regarding instances of old style classes, instances of even separate classes share the same type. The same is not true for new style classes.
- Old style instances are still instances of object.
- Regarding old style classes themselves, they are not instances of type, but rather of classobj...which itself is an instance of type.
- Regarding performance, the change to method lookup "provides significant scope for speed optimisations within the interpreter".
- If you followed the link to see what was new in Python 2.2, you'd see that one of the other major features introduced by PEP 253 was the ability to subclass built-in types like list and override their operators. In Python 2.1, just as with old style classes today, all classes implemented in Python shared the same type, but classes implemented in C (like list) could be distinct types; importantly, types were not classes. Here's Python 2.1:
>>> class OldStyle:
...     pass
>>> class OldStyle2:
...     pass
>>> type(OldStyle())
<type 'instance'>
>>> type(OldStyle2())
<type 'instance'>
>>> type([])
<type 'list'>
>>> class Foo(list):
...     pass
Traceback (most recent call last):
...
TypeError: base is not a class object
To further understand how CPython implements both old and new-style classes, you need to understand just a bit about how it represents types internally and how it invokes those special dunder methods.
CPython Types
In CPython, all types are defined by a C structure called PyTypeObject. It has a series of members (confusingly also called slots) that are C function pointers that implement the behaviour defined by the type. For example, tp_repr is the slot for the __repr__ function, and tp_getattro is the slot for the __getattribute__ function. When the interpreter needs to get the repr of an object, or get an attribute from it, it always goes through those function pointers. An operation like repr(obj) becomes the C equivalent of type(obj).__repr__(obj).
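As a quick illustration of that last equivalence (the Point class here is just an example, not something from the CPython sources), calling repr() and going through the type explicitly give the same result:

>>> class Point(object):
...     def __repr__(self):
...         return 'Point()'
>>> p = Point()
>>> repr(p)
'Point()'
>>> type(p).__repr__(p)
'Point()'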
When we create a new style class, the standard type metaclass makes a C function pointer out of each special dunder method defined in the class's __dict__ and uses that to fill in the slots in the C structure. Special methods that aren't present are inherited. Similarly, when we later set such an attribute, the __setattr__ function of the type metaclass checks to see if it needs to update the C slots.
The C structure that defines the object type uses PyObject_GenericGetAttr for its tp_getattro (__getattribute__) slot. That function implements the "standard" way of accessing attributes of objects. It's the same thing we invoked above when our custom __getattribute__ methods called object.__getattribute__(self, name). The object type also leaves most other slots (like the one for __add__) empty, meaning that subclasses must define them in the class __dict__ (or assign to them) for them to be found.
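We can see the effect of that empty slot with a binary operator, too. This is a small sketch using an invented Number class, and it parallels the earlier __repr__ example: an __add__ attached to a new style instance is invisible to the + machinery, while one defined on the class fills the slot:

>>> class Number(object):
...     pass
>>> n = Number()
>>> n.__add__ = lambda other: 'instance add'   # ignored by the + operator
>>> n + 1
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for +: 'Number' and 'int'
>>> Number.__add__ = lambda self, other: 'class add'
>>> n + 1
'class add'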
Importantly, though, other types implemented in C don't have to do any of that. They could set their tp_getattro and tp_repr slots, along with any of the other slots, to do whatever they want.
You might now understand how PEP 253, which gave us new style classes, also gave us the ability to subclass C types.
instance types
That's exactly what <type 'instance'> does. That type is defined by PyInstance_Type. The metaclass classobj (recall, that's the type of old style classes) sets its tp_call slot (aka __call__) to PyInstance_New, a function that returns a new object of <type 'instance'>, aka PyInstance_Type, which explains why all old style instances share the same type. (Incidentally, the class statement produces instances of classobj itself, not new sub-types the way the type metaclass does, which explains why all old style classes share a type, too.)
PyInstance_Type, in turn, overrides almost all of the possible slot definitions to implement the old behaviour of looking first in the instance's __dict__ and, only if the name isn't there, walking the class hierarchy or invoking __getattr__. For example, tp_repr is instance_repr, which uses instance_getattr to dynamically find an appropriate __repr__ function on the instance or class. This, by the way, is the source of the performance difference between old and new style classes.
Implementing Old Style Classes in Python
Can we re-implement old style classes in pure-Python (for example, on Python 3)? I haven't tried, but we can probably get pretty close. It would be similar to implementing transparent object proxies in pure-Python, which is verbose and tedious.
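As a very rough sketch of the shape such an emulation might take (untested beyond this toy, handling only __repr__, and with the InstanceDunders name made up for illustration), each special method would need a class-level stub that defers to the instance's __dict__ first, the way classic classes resolved their special methods dynamically:

>>> class InstanceDunders(object):
...     def __repr__(self):
...         # Check the instance __dict__ first, the way classic classes did
...         override = self.__dict__.get('__repr__')
...         if override is not None:
...             return override()
...         return object.__repr__(self)
>>> obj = InstanceDunders()
>>> obj.__repr__ = lambda: "Hi, I'm the instance"
>>> print(repr(obj))
Hi, I'm the instance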
Footnotes
[1] Thus demonstrating the perils of naming something "new." Here we are 17 years later and still calling them that.
[2] Or by explicitly specifying a __metaclass__ of type.
[3] Good programmers are lazy.