By default, objects in Python can be weakly referenced. But if you use __slots__ in your class, or use Cython to compile it, that's no longer true without some extra effort. This led to issue #1217 and issue #1211 in gevent's 1.3 release. Here, we'll discuss weak references, demonstrate the problem that __slots__ creates, and show the solution by implementing our own weak references.
Weak references in general
Most languages with garbage collection (automatic memory management) support the concept of a weak reference to a particular object. An ordinary reference to an object, such as storing it in a variable, is called a strong reference. As long as there is at least one strong reference to an object, the garbage collector will not free the object. Such objects are said to be alive.
In contrast, a weak reference to an object does not prevent the garbage collector from freeing the object. Weak references can be used when one piece of code wants to let some other code take ownership of an object and be responsible for its lifetime (maintaining a strong reference to it), but also still wants to be able to use that object as long as it hasn't been freed.
Weak references in Python
Python implements weak references with the weakref.ref class. This is a callable object whose constructor takes the object to monitor (the referent). As long as the referent is still alive, calling the weak reference will return it. After the referent is freed, calling the weak reference will return None; the weak reference is said to be cleared at that point.
>>> import weakref
>>> subscriber = set()
>>> subscriber_ref = weakref.ref(subscriber)
>>> subscriber_ref
<weakref at 0x10cde0050; to 'set' at 0x10cde3050>
>>> subscriber_ref() is subscriber
True
>>> del subscriber
>>> subscriber_ref
<weakref at 0x10cde0050; dead>
>>> print(subscriber_ref())
None
Caution!
This example was written in CPython, which uses a reference counting garbage collection system. In this system, weak references are (usually!) cleared immediately after the last strong reference goes away. In other implementations like PyPy or Jython that use a different garbage collection strategy, the reference may not be cleared until some time later. Even in CPython, reference cycles may delay the clearing. Don't depend on references being cleared immediately!
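To see the reference-cycle caveat in action, here is a small sketch. It assumes CPython and uses a hypothetical Node class whose instance refers to itself: deleting the variable doesn't clear the weak reference, because the cycle keeps the reference count above zero, but an explicit gc.collect() does.

```python
import gc
import weakref


class Node(object):
    """Hypothetical minimal class; an instance can hold a reference to itself."""


def cycle_delays_clearing():
    """Return (alive_after_del, alive_after_collect) for an object in a cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic cycle collector from running mid-demo
    try:
        node = Node()
        node.me = node              # reference cycle: node -> node
        ref = weakref.ref(node)
        del node                    # refcount stays above zero because of the cycle
        alive_after_del = ref() is not None
        gc.collect()                # the cycle collector finds and frees the cycle
        alive_after_collect = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


print(cycle_delays_clearing())  # CPython: (True, False)
```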
Weak references can optionally call a function when they are cleared.
>>> def ref_cleared(ref):
...     print("The reference", ref, "was cleared.")
...
>>> subscriber = set()
>>> subscriber_ref = weakref.ref(subscriber, ref_cleared)
>>> subscriber_ref
<weakref at 0x10cde01b0; to 'set' at 0x10cde3050>
>>> del subscriber
The reference <weakref at 0x10cde01b0; dead> was cleared.
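A common use for the callback is cleanup. The sketch below uses a hypothetical Registry class (not a standard library API) that maps names to objects without keeping them alive; each weak reference's callback removes its own entry. The standard library's weakref.WeakValueDictionary provides this behaviour ready-made, so treat this as an illustration of the callback mechanism rather than something to copy.

```python
import weakref


class Registry(object):
    """Hypothetical name -> object registry that does not keep its objects alive."""

    def __init__(self):
        self._refs = {}

    def register(self, name, obj):
        refs = self._refs

        def remove_entry(ref):
            # Called with the now-dead weak reference; drop the entry only if
            # the name still maps to this particular reference.
            if refs.get(name) is ref:
                del refs[name]

        refs[name] = weakref.ref(obj, remove_entry)

    def lookup(self, name):
        ref = self._refs.get(name)
        return None if ref is None else ref()

    def __len__(self):
        return len(self._refs)


registry = Registry()
subscriber = set()
registry.register("subscriber", subscriber)
print(registry.lookup("subscriber") is subscriber)  # True
del subscriber        # CPython frees the set here, which runs remove_entry
print(len(registry))  # 0
```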
We've been using the built-in set type to demonstrate weak references, but user-defined classes can also be weakly referenced.
>>> class MyClass(object):
...     pass
...
>>> my_object = MyClass()
>>> my_object_ref = weakref.ref(my_object, ref_cleared)
>>> my_object_ref
<weakref at 0x10cde0158; to 'MyClass' at 0x10cddf610>
>>> my_object_ref() is my_object
True
>>> del my_object
The reference <weakref at 0x10cde0158; dead> was cleared.
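Not every object supports weak references, though. The weakref documentation notes that some built-in types such as list and dict don't, although subclasses of them do (and, as a CPython detail, types such as int and tuple don't support them even when subclassed). A small probe using a hypothetical supports_weakref helper makes this easy to check:

```python
import weakref


def supports_weakref(obj):
    """Return True if weakref.ref(obj) succeeds, False if it raises TypeError."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


class MyDict(dict):
    """A do-nothing dict subclass; ordinary subclasses get weak reference support."""


for name, obj in [("set", set()), ("list", []), ("dict", {}), ("MyDict", MyDict())]:
    print(name, supports_weakref(obj))
# Expected on CPython: set True, list False, dict False, MyDict True
```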
Using __slots__ breaks weak references
Python allows (but does not require) a user-defined class to explicitly list the data members it will store by naming them in a special class field called __slots__. This is an optimization technique primarily used to save space when there will be many instances of a class and those instances will all have the same small number of attributes. A classic example is a coordinate point:
>>> class Point(object):
...     __slots__ = ('x', 'y')
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> point = Point(0, 0)
>>> point.x
0
>>> point.y
0
When a class declares its slots, those are the only attributes its instances can have. Instead of giving each instance its own __dict__, Python reserves space for each slot directly in the instance's memory, which is where the savings come from.
>>> point.z = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Point' object has no attribute 'z'
Mysteriously, we cannot create a weak reference to this object:
>>> weakref.ref(point)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot create weak reference to 'Point' object
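The slot storage described above can be inspected directly. This sketch assumes CPython; it redefines the same slotted Point class and shows that instances have no per-instance __dict__, and that each slot is exposed on the class as a member descriptor that reads the value out of the instance.

```python
class Point(object):
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(0, 0)

# No per-instance __dict__: attribute values live in fixed slots in the instance.
print(hasattr(point, '__dict__'))      # False

# Each slot is a descriptor on the class (its type name is CPython-specific).
print(type(Point.x).__name__)          # member_descriptor
print(Point.x.__get__(point, Point))   # 0
```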
Building our own weak references
To understand why Point cannot have a weak reference, let's build our own weak reference system. We will rely only on the __del__ special method, the object finalizer that's called by the garbage collector when it frees an object.
We'll begin with the implementation of the __del__ method. When an object is freed, it must clear all the weak references to it and call their callbacks, if they have any. The easiest way to do this is to give the object a list of all the weak references to it and simply iterate over them. To save space (most objects never have weak references), we use a class-level default of an empty tuple for that list.
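class WeakReferencable(object):
    # Shared class-level default: most objects never get a weak reference,
    # so no per-instance list is created until one is needed.
    weakreferences = ()

    def __del__(self):
        # The object is being freed: clear every weak reference to it
        # and run any callback that was registered.
        for ref in self.weakreferences:
            ref.obj_id = None
            if ref.callback:
                ref.callback(ref)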
Next is the weak reference implementation. It needs to be able to find the object when it is called, but it cannot hold a strong reference to the object. A strong reference would keep the object alive, which defeats the purpose; it would also create a cycle (the object's list refers to the weak reference, which would refer back to the object), and __del__ runs late, if at all, for objects in a cycle (before Python 3.4, such objects were never collected). Fortunately, every object in Python has an id that is unique among live objects, and gc.get_objects() lets us iterate over every object tracked by the garbage collector, which includes instances of ordinary classes. So we only need to store the referent's id and search for it among those objects. (Object ids can be reused once an object is freed, but __del__ prevents us from mistaking a new object for the old one by setting the stored id to None.)
The only other thing this object needs to do is register itself in the list of references the referent maintains so that __del__ can do its work.
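import gc

class WeakReference(object):
    def __init__(self, obj, callback=None):
        # Remember only the referent's id, never the referent itself.
        self.obj_id = id(obj)
        self.callback = callback
        try:
            # Works once obj has its own list of references...
            obj.weakreferences.append(self)
        except AttributeError:
            # ...otherwise (the class default is an immutable tuple) give it one.
            obj.weakreferences = [self]

    def __call__(self):
        if self.obj_id is None:
            # We have been cleared
            return None
        # Search every object tracked by the garbage collector for the referent.
        for o in gc.get_objects():
            if id(o) == self.obj_id:
                return o
        return None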
Given this, we have a weak reference that works pretty much like the one in the standard library.
>>> my_object = WeakReferencable()
>>> my_object_ref = WeakReference(my_object, ref_cleared)
>>> my_object_ref() is my_object
True
>>> del my_object
The reference <__main__.WeakReference object at 0x11451a108> was cleared.
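One caveat about the gc.get_objects() search: it only sees objects that the garbage collector tracks. Instances of ordinary classes like WeakReferencable are tracked, so our sketch works for them, but atomic objects such as ints are not. The following CPython-oriented sketch uses a hypothetical findable_by_get_objects helper that mirrors the search in __call__:

```python
import gc


class Plain(object):
    """Stand-in for any ordinary user-defined class."""


def findable_by_get_objects(obj):
    """Hypothetical helper: can obj be found by scanning gc.get_objects() for its id?"""
    target = id(obj)
    return any(id(o) == target for o in gc.get_objects())


instance = Plain()
number = 10 ** 20  # ints are atomic: they can't refer to other objects

print(gc.is_tracked(instance), findable_by_get_objects(instance))  # True True
print(gc.is_tracked(number), findable_by_get_objects(number))      # False False
```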
What happens if we try to create a reference to an object that's not WeakReferencable?
>>> WeakReference(object())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'weakreferences'
There are two problems with that error. First, by being an AttributeError, it leaks implementation details. Second, an AttributeError is misleading: the problem isn't with this particular instance but with its type. Only our special type with the correct __del__ can be weakly referenced, so a TypeError makes more sense here. Let's rewrite the __init__ method to handle this:
class WeakReference(object):
    def __init__(self, obj, callback=None):
        self.obj_id = id(obj)
        self.callback = callback
        try:
            obj.weakreferences.append(self)
        except AttributeError:
            try:
                obj.weakreferences = [self]
            except AttributeError:
                # obj's type has no room for the list: report it as a type problem.
                raise TypeError(
                    "Cannot create weak references to '%s' object"
                    % (type(obj).__name__))
That's better:
>>> WeakReference(object())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Cannot create weak references to 'object' object
Wait a minute, there's something very familiar about that error message. In fact, that's the same error message we got when we tried to use the standard library to create a weak reference to our Point class that used slots! Could there be an AttributeError hiding under the hood there too?
__weakref__: It really is that simple
Sure enough, there is. The __slots__ documentation, in the deeply nested section 3.3.2.4.1, says:
Without a __weakref__ variable for each instance, classes defining __slots__ do not support weak references to its instances. If weak reference support is needed, then add '__weakref__' to the sequence of strings in the __slots__ declaration.
Since the point of __slots__ is to save space, and storing weak references takes up space, classes that use __slots__ must explicitly opt in if we want their instances to allow weak references. Makes sense. Let's try that.
>>> class Point(object):
...     __slots__ = ('x', 'y', '__weakref__')
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> point = Point(0, 0)
>>> ref = weakref.ref(point)
This time, it worked. The implementation is very much like what we created above. We can even take a look at the __weakref__ attribute that Python filled in.
>>> ref
<weakref at 0x1144f9ec8; to 'Point' at 0x1148a85f8>
>>> point.__weakref__
<weakref at 0x1144f9ec8; to 'Point' at 0x1148a85f8>
>>> point.__weakref__ is ref
True
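The __weakref__ slot is empty until a weak reference exists. This sketch assumes CPython and redefines the slotted Point class from above; it shows the slot going from None, to the reference, and back to None once that reference is itself freed.

```python
import weakref


class Point(object):
    __slots__ = ('x', 'y', '__weakref__')

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(0, 0)
print(point.__weakref__)         # None: no weak references yet

ref = weakref.ref(point)
print(point.__weakref__ is ref)  # True: the slot holds the (first) weak reference

del ref                          # CPython frees the only weak reference right away
print(point.__weakref__)         # None again
```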
This is one of the cool things about Python. So much of the language is built on only a few key primitives, and much of the way the language works can be reasoned out starting from those primitives. (Expect posts on some more of those primitives, like data descriptors, later.)
There are, of course, some key differences between our custom implementation and the one in the standard library. For starters, the standard library version doesn't need __del__: in CPython, clearing weak references and running their callbacks is built into object deallocation itself. It also doesn't need gc.get_objects() to find the referent; the weak reference object stores a plain C pointer to the referent that doesn't count as a reference, which is safe because deallocation clears every weak reference before the object's memory is freed. And instead of a Python list holding all the weak references to an object, CPython uses a lower-level structure: a C doubly-linked list of weak reference objects whose head is stored in the object's __weakref__ slot.
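Although CPython keeps its per-object list in C, the weakref module exposes a read-only view of it through weakref.getweakrefcount() and weakref.getweakrefs(), which play roughly the role of the weakreferences list in our sketch. One CPython detail shows up here: calling weakref.ref() without a callback on an object that already has such a reference returns the existing one, so the sketch gives one reference a callback to get a distinct object.

```python
import weakref


class Plain(object):
    pass


def on_clear(ref):
    """Placeholder callback; passing a callback makes weakref.ref build a new object."""


obj = Plain()
r1 = weakref.ref(obj)
r2 = weakref.ref(obj)            # CPython hands back the existing callback-free reference
r3 = weakref.ref(obj, on_clear)  # a reference with a callback is a separate object

print(r1 is r2)                                         # True on CPython
print(weakref.getweakrefcount(obj))                     # 2
print(any(r is r3 for r in weakref.getweakrefs(obj)))   # True
```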
Cython
How does Cython fit into all of this? Cython compiles Python-like code into C, and each cdef class it compiles becomes an extension type, just like the built-in types. Extension types behave very much like a user-defined class with __slots__:
The set of attributes is fixed at compile time; you can’t add attributes to an extension type instance at run time simply by assigning to them
cdef class Point:
    cdef public int x, y
Extension types cannot be weakly referenced by default. The solution is the same as for __slots__: declare a __weakref__ attribute. Cython turns this declaration into the code an extension type needs to support weak references.
cdef class Point:
    cdef public int x, y
    cdef object __weakref__
Existing classes
When adding __slots__ to an existing class, or compiling an existing class with Cython, it is important to remember to add __weakref__, or at least to think carefully about whether the class should allow weak references. By default it won't, which is a change in behaviour: any code that creates weak references to its instances will start raising TypeError.
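One way to catch this change is a small regression test. The helper below, assert_weakrefable, is hypothetical (not part of any library): it instantiates the class and fails with a clear message if weakref.ref() raises TypeError.

```python
import weakref


def assert_weakrefable(cls, *args, **kwargs):
    """Hypothetical test helper: fail loudly if instances of cls can't be weakly referenced."""
    instance = cls(*args, **kwargs)
    try:
        ref = weakref.ref(instance)
    except TypeError:
        raise AssertionError(
            "%s instances do not support weak references; declare __weakref__"
            % cls.__name__)
    assert ref() is instance


class Point(object):
    __slots__ = ('x', 'y', '__weakref__')  # remove '__weakref__' and the check fails

    def __init__(self, x, y):
        self.x = x
        self.y = y


assert_weakrefable(Point, 0, 0)  # passes silently
```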