Implementing gevent locks

Previously we talked about gevent's hub and how greenlets switch in and out of it to implement IO. This time, let's talk about how we can use those same primitives, plus the properties that make greenlets greenlets, to implement locks. We'll start by walking through some basics about locks and move on to how locks are used in Python when working with threads. All of that will be background for finally covering how gevent implements greenlet-based locks.

Contents

Padlock, deadbolt, combination, or cylinder?

A lock is a synchronization primitive, often used to implement mutual exclusion among multiple threads of control. When one thread has acquired the lock, no other thread may acquire it until the first thread releases it. Threads that want to use data shared with other threads first acquire the lock, then read or modify the data (and are guaranteed that no other thread is doing the same), and finally release the lock. In this way critical data are kept in a consistent state.

If the threads are native operating system threads that can run simultaneously, a lock has to be implemented at a low level. A simple type of native lock is a spin lock that uses CPU atomic compare-and-swap instructions to keep checking over and over if some other thread has acquired the lock. Busily burning CPU cycles waiting for something to change is not an efficient use of CPU resources [1] , and doesn't let other threads that aren't interested in the lock make much progress, so most threading systems provide lock types that cooperate with the operating system and its scheduler to make the process smoother by relinquishing the CPU and only gaining it again when the lock is released [2]. On most Unix-like systems that's pthread_mutex_t(3P); similar types exist on Windows.

Python locks

There's a company that makes a physical Python lock. Who knew?

Python's threading module creates native operating system threads, so it needs to also be able to create native operating system locks. It does this with the low-level function _thread.allocate_lock() [3], also known as threading.Lock. This function, and the objects it returns, are implemented in C. This has the important consequence that, from the Python perspective, acquiring and releasing a lock are truly atomic and can't be interrupted, for example with a signal, or KeyboardInterrupt, or a system trace function [4].

Built on top of these low-level locks are some other synchronization tools:

reentrant locks: A lock that can be acquired multiple times by the same thread, useful for recursion.
semaphores: A type of lock that has a counter: as long as the counter is positive the lock can be acquired by any thread, which decrements the counter, and releasing it increments the counter. This is used for limited resources. Note that a semaphore with a counter of 1 is functionally the same as a lock.
condition variables: Condition variables allow different threads to be notified when another thread releases the lock.

All of these are implemented in Python code and only depend on threading.Lock [5]. That's a convenient fact because it means that if gevent is asked to monkey-patch, it can make the standard threading lock APIs work just by patching one function.

Lock requirements

With that background, we can write out some of the key requirements of locks:

They support mutual exclusion of threads of control. In gevent's case, this means greenlets.
They operate efficiently with the operating system and scheduler. For gevent, this means that if acquiring a lock isn't possible (it's already acquired by another greenlet), the requesting greenlet should yield to the hub so other greenlets can run. It also means that releasing a lock should efficiently "wake up" a greenlet that was waiting to acquire it.
They are efficient from a resource usage perspective.
Lock methods (acquire/release) should be atomic from a signal and tracing perspective.

gevent locks

This is just a lock on a green background because greenlets.

gevent does things a bit differently than Python. Whereas it begins with a lock and builds a semaphore on top of it, gevent begins with a semaphore and builds a lock and reentrant lock on top of it. The lock class is a trivial subclass of a semaphore that just changes some error handling for compatibility with Python's threading lock and uses a counter of one. All the interesting work is in the semaphore, so lets take a deep dive into how the semaphore meets those key requirements. (The code below is not the real implementation, but it should sketch out the ideas.)

First, resource usage and atomic tracing. Those are easy [6]: just compile the semaphore into a C extension using Cython. Just like that we have the necessary guarantees.

Mutual exclusion of greenlets is also easy: by definition, only one greenlet in a native thread is running at any one time, so as long as the semaphore itself doesn't switch greenlets, no other greenlet can come along and interrupt its job. That means we can start out like a spin lock and just check to see if some other greenlet has acquired the semaphore. If not, congratulations, you just acquired the semaphore.

def acquire(self):
    if self.counter > 0:
        # Yay, we got it!
        self.counter -= 1
        return True

Spinning, take 1

Exercise for the reader: why would an actual spin lock not work here?

Why would this code fail?

def acquire(self):
    while self.counter <= 0:
         pass
    return True

If you only want to try to acquire the lock, and some other greenlet already has it, well it isn't congratulations, but at least you can have your answer very quickly.

def acquire(self, blocking=True):
   if self.counter > 0:
       # Yay, we got it!
       self.counter -= 1
       return True
   if not blocking:
       # Boo, someone else has it.
       return False

Cooperating efficiently with other greenlets (letting them run while we wait to acquire the semaphore, and waking up when they release the semaphore) is only a little bit more complicated. It works just like the IO we previously described, by yielding to the hub and arranging for it to wake us up in a timely fashion.

Here's the acquire side, where we yield to the hub and arrange to wake up later:

def acquire(self, blocking=True):
   if self.counter > 0:
       # Yay, we got it!
       self.counter -= 1
       return True
   if not blocking:
       # Boo, someone else has it.
       return False

   # At this point we know we want to block, and we know
   # some other greenlet is holding the lock.

   # Arrange for the releasing greenlet to know that we
   # are waiting and it should switch back to us.
   switch_back_to_this_greenlet = getcurrent().switch
   self._waiting_greenlets.append(switch_back_to_this_greenlet)

   # Go back to the hub, let it do IO or switch into other greenlets
   get_hub().switch() # This greenlet stops running here.

   # OK, now we're back in this greenlet. The only
   # way for that to happen was for `switch_back_to_this_greenlet`
   # to be called. And the only way for *that* to happen was
   # for someone else to have released the lock. Thus,
   # we can acquire the lock.
   assert self.counter > 0
   self.counter -= 1
   return True

The release side is where some other greenlet schedules the waiting greenlets to wakeup:

def release(self):
    # This is safe because we know no other greenlet is running
    self.counter += 1
    # If some other greenlet wants to acquire us, schedule them
    # to wake up. This greenlet continues running.
    if self._waiting_greenlets:
        def wake_up_waiters():
            while self.counter and self._waiting_greenlets:
                # We guarantee to only wake up a waiting greenlet
                # if it can actually acquire the lock.
                self._waiting_greenlets().pop(0)() # Run that other greenlet
        get_hub().loop.run_callback(wake_up_waiters)

The call to run_callback asks the hub to execute some code at some point in the (near) future. In this case, we're asking the hub to run the code that in turn wakes up each greenlet that was waiting, so long as they can acquire the lock.

Wrapping it up

There you have it. Locks in gevent are handled pretty much like IO.

This implementation has a few consequences worth noting:

waiting greenlets are awakened in a first-come first-serve fashion
acquiring a lock that's uncontended is cheap
releasing a lock is also cheap and doesn't unschedule the current greenlet

Spinning, take 2

Exercise for the reader: If the above spin loop failed, why would this one work? What's different? Why don't we use this implementation?

def acquire(self):
    while self.counter <= 0:
         gevent.sleep()
    return True

Footnotes

[1]	For locks that will be held only for brief periods of time, spin locks actually may be an efficient use of CPU resources.

[2]	That should sound familiar. That's the very thing that select(2), poll(2), and the like, upon which gevent is built, were created to do for IO.

[3]	In Python 2, this is simply `thread.allocate_lock`.

[4]	For an interesting discussion on what this means, see Nathaniel J. Smith's article Control-C handling in Python and Trio.

[5]	OK, that's not strictly true. Starting in Python 3.3, `RLock` was optionally implemented in C. And `Condition` defaults to using `RLock`, but `Semaphore` uses it with a plain `Lock`.

[6]	On CPython; on PyPy things are a bit more complicated.