Earlier, we introduced ZODB as a transactional object database. This post takes a look into an implementation detail: transaction IDs. What are they, what are they used for, and why does any of this matter?
Along the way, we'll also look at another type of identifier used in ZODB, object IDs.
Rather than try to talk about transactions in general, I'll use ZODB-colored lenses and only talk about transactions and transaction IDs as they relate to ZODB and how ZODB uses them.
Implementing ZODB: Motivation for Transaction IDs
To understand how ZODB uses transaction IDs, let's work through an example of a basic use of ZODB, pausing along the way to consider how ZODB is actually implementing things.
We know that ZODB stores persistent objects; we can ask ZODB to fetch a persistent object for us (perhaps much later, or perhaps from a different machine) and it will come back to us in the state we left it. The easiest way to do this is by using normal Python mechanisms to build and walk through an object graph, starting from the ZODB root object.
>>> from ZODB.DB import DB
>>> from ZODB.DemoStorage import DemoStorage
>>> from persistent import Persistent
>>> import transaction
>>> db = DB(DemoStorage())
>>> transaction.begin()
<transaction._transaction.Transaction object at 0x110604c80>
>>> conn = db.open()
>>> conn.root()['obj'] = Persistent()
>>> transaction.commit()
We can use a different connection (simulating a different machine, say) and walk through the object graph, starting from the root, to find our object.
>>> transaction.begin()
<transaction._transaction.Transaction object at 0x110a3ff00>
>>> conn2 = db.open()
>>> conn2.root()['obj']
<persistent.Persistent object at 0x110a973d0 oid 0x3e2ee68c1ab9f8d3 in <Connection at 110a3fa00>>
Object IDs
Everything starts from the root. But where does the root object come from? Let's look at the root() method of the Connection:
def root(self):
    """Return the database root object."""
    return RootConvenience(self.get(z64))
That points us to the get() method:
def get(self, oid):
    """Return the persistent object with oid 'oid'."""
    # Simplified, details removed
    obj = self._cache.get(oid, None)
    if obj is not None:
        return obj
    pickle_data, _ = self._storage.load(oid)
    obj = self._reader.getGhost(pickle_data)
    self._cache.new_ghost(oid, obj)
    return obj
There's a lot going on in these two little snippets, so let's unpack them, starting with get(). What can we deduce from this?
- A Connection caches objects; when an object can't be found in the cache, it asks the "storage" object to load it. (More on the storage in a minute.)
- Persistent objects are identified by "OIDs" (short for "object identifier", or just "object ID"). Just as any Python object in memory has a unique identifier (returned by id(obj) and commonly seen in repr output as a hexadecimal number, as in <persistent.Persistent object at 0x110a973d0...>), any persistent object stored in ZODB has a unique identifier, its OID. An important difference is that id(obj) is only meaningful within a single process, while an OID is usable from any process connected to the same ZODB.
- You can think of the OID as (part of) the "address" of a particular persistent object in the same way that id(obj) returns the memory address of an object in CPython. The OID is used to find or store objects in the cache, and also to load the stored data for the object.
- The root object has a special, pre-defined OID known as "z64". That's our starting point for traversing the object graph.
You can't see this just from the code, but ZODB defines OIDs to be 8-byte, or 64-bit, quantities. The special value z64 is just 8 zero bytes in a row. ZODB has utilities to convert between the 8-byte form and a 64-bit integer.
>>> from ZODB.Connection import z64
>>> z64
b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> conn.root()._p_oid
b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> from ZODB.utils import p64, u64
>>> u64(z64)
0
>>> p64(0)
b'\x00\x00\x00\x00\x00\x00\x00\x00'
Each persistent object knows its own OID and keeps track of it in a special attribute known as _p_oid.
>>> conn.get(z64)._p_oid
b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> conn.root()['obj']._p_oid
b'>.\xe6\x8c\x1a\xb9\xf8\xd3'
>>> u64(conn.root()['obj']._p_oid)
4480772168698427603
Storage Details and Referencing Objects
The get() method of a ZODB Connection returns persistent objects, but it uses the lower level "storage" interface to get the actual data stored for the object. Each ZODB Connection has a storage object, and all the Connections of a particular ZODB DB have storage objects that are reading and writing to the same shared data (e.g., a single file for FileStorage, or a single SQL database for RelStorage.) Among other things, the storage is responsible for holding the data stored for the persistent objects. This data, or state, is kept as a byte-string produced by pickle. We can use pickletools to peek at this.
>>> import pickletools
>>> pickletools.dis(conn._storage.load(z64)[0])
    0: \x80 PROTO      3
    2: c    GLOBAL     'persistent.mapping PersistentMapping'
   40: q    BINPUT     0
   42: .    STOP
highest protocol among opcodes = 2
>>> pickletools.dis(conn._storage.load(z64)[0][43:])
    0: \x80 PROTO      3
    2: }    EMPTY_DICT
    3: q    BINPUT     1
    5: X    BINUNICODE 'data'
   14: q    BINPUT     2
   16: }    EMPTY_DICT
   17: q    BINPUT     3
   19: X    BINUNICODE 'obj'
   27: q    BINPUT     4
   29: C    SHORT_BINBYTES b'>.\xe6\x8c\x1a\xb9\xf8\xd3'
   39: q    BINPUT     5
   41: c    GLOBAL     'persistent Persistent'
   64: q    BINPUT     6
   66: \x86 TUPLE2
   67: q    BINPUT     7
   69: Q    BINPERSID
   70: s    SETITEM
   71: s    SETITEM
   72: .    STOP
highest protocol among opcodes = 3
The important part here is the line labeled 29. Notice how that byte string matches the OID of our second persistent object: our root persistent object refers to the other persistent object via its OID. The opcode labeled 69, BINPERSID, is where ZODB hooks into the unpickling process. By setting an attribute on the unpickler called "persistent_load", ZODB arranges to be called any time one of these OIDs is encountered in a pickle. ZODB can then go back to the cache or the storage to find the matching object.
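ZODB's real serialization code lives in ZODB.serialize; here's a minimal, standalone sketch of the same hook using only the standard pickle module. (OIDPickler, OIDUnpickler, and known_objects are made up for illustration, and the hook is supplied by subclassing rather than by assigning the attribute, which the pickle documentation also supports.)

import io
import pickle

# A stand-in "database": OID -> object.
known_objects = {b'\x00' * 8: {'name': 'root'}}

class OIDPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Pickle known objects as their OID (a persistent reference)
        # instead of pickling their state inline.
        for oid, known in known_objects.items():
            if obj is known:
                return oid
        return None  # everything else pickles normally

class OIDUnpickler(pickle.Unpickler):
    def persistent_load(self, oid):
        # Called for each BINPERSID opcode; resolve the OID to an object,
        # the way ZODB consults its cache and storage.
        return known_objects[oid]

buf = io.BytesIO()
OIDPickler(buf).dump(['a reference:', known_objects[b'\x00' * 8]])
buf.seek(0)
print(OIDUnpickler(buf).load())   # ['a reference:', {'name': 'root'}]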
Conflict Detection Through Serial Numbers
We've seen how ZODB can address objects, finding them and loading them by OID, automatically re-creating in memory the object graph we built. But that's not all ZODB does. ZODB provides the ability to modify and save objects, and to detect when more than one Connection attempts to do so at the same time, resulting in a conflict. How does this work?
To explore this, let's have both connections make a change to their copy of the root object. To do this, we'll use separate transaction managers, one for each connection.
>>> from transaction import TransactionManager
>>> conn1_txm = TransactionManager()
>>> conn2_txm = TransactionManager()
>>> conn.close()
>>> conn2.close()
>>> conn = db.open(conn1_txm)
>>> conn2 = db.open(conn2_txm)
>>> conn.root()['obj2'] = Persistent()
>>> conn2.root()['obj3'] = Persistent()
If we commit the changes from one transaction, everything goes as planned:
>>> conn1_txm.commit()
But trying to commit the other transaction fails because the object has already been modified in an incompatible way:
>>> conn2_txm.commit()
Traceback (most recent call last):
  ...
    raise ConflictError(oid=oid, serials=(committedSerial, oldSerial),
ZODB.POSException.ConflictError: database conflict error (oid 0x00, class persistent.mapping.PersistentMapping, serial this txn started with 0x03dfd117d4dbf099 2021-05-03 16:23:49.888861, serial currently committed 0x03dfd123995e91dd 2021-05-03 16:35:35.945956)
How did ZODB know the object had changed? In addition to keeping track of its OID, each persistent object also keeps track of its serial number in the _p_serial special attribute. When a connection commits a change to an object, the serial number the connection loaded (the object's _p_serial) is compared with the serial number currently recorded for that object in the database; if they're not equal, another transaction committed a newer revision in the meantime, and there's a conflict. (The Connection.readCurrent() API arranges the same check for objects that were only read, not modified.)
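In storage terms, the check looks roughly like the following sketch. This is not the actual ZODB code; check_for_conflict and the committed_serials mapping are made up for illustration, but the ConflictError it raises is the same one seen in the traceback above.

from ZODB.POSException import ConflictError

def check_for_conflict(oid, serial_seen_by_connection, committed_serials):
    # committed_serials stands in for the storage's record of the last
    # transaction that wrote each object (here, a dict of oid -> tid).
    committed = committed_serials[oid]
    if committed != serial_seen_by_connection:
        # Another transaction committed a newer revision after this
        # connection loaded the object: that's a conflict.
        raise ConflictError(oid=oid,
                            serials=(committed, serial_seen_by_connection))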
We can confirm this by matching the _p_serial of each connection's object with the error message reported above.
>>> conn2.root()._p_serial
b'\x03\xdf\xd1\x17\xd4\xdb\xf0\x99'
>>> hex(u64(conn2.root()._p_serial))
'0x3dfd117d4dbf099'
>>> conn.root()._p_serial
b'\x03\xdf\xd1#\x99^\x91\xdd'
>>> hex(u64(conn.root()._p_serial))
'0x3dfd123995e91dd'
That serial number is a transaction ID, often abbreviated to TID: it identifies the transaction that last wrote the object.
Thus, to uniquely identify a particular revision of a particular object, we need both its OID and TID. (Together, they form the full "address" of the object.) The storage object has methods to load object states as they existed at particular serial numbers:
>>> db.storage.loadSerial(z64, conn2.root()._p_serial)
(b'\x80\x03cpersistent.mapping\nPersistentMapping\nq\x00.\x80\x03}q\x01'
 b'X\x04\x00\x00\x00dataq\x02}q\x03X\x03\x00\x00\x00objq\x04C\x08>.'
 b'\xe6\x8c\x1a\xb9\xf8\xd3q\x05cpersistent\nPersistent\nq\x06\x86q\x07Qss.')
>>> db.storage.loadSerial(z64, conn.root()._p_serial)
(b'\x80\x03cpersistent.mapping\nPersistentMapping\nq\x00.\x80\x03}q\x01'
 b'X\x04\x00\x00\x00dataq\x02}q\x03(X\x03\x00\x00\x00objq\x04C\x08>'
 b'.\xe6\x8c\x1a\xb9\xf8\xd3q\x05cpersistent\nPersistent\nq\x06\x86q\x07QX\x04'
 b'\x00\x00\x00obj2q\x08C\x08>.\xe6\x8c\x1a\xb9\xf8\xd4q\th\x06\x86q\nQus.')
Setting the Serial Number
You might be wondering when the _p_serial of a persistent object gets set. There are two times. The first is when an object has been added or modified during a transaction: one of the last things ZODB does while committing is update the _p_serial. Here's part of tpc_finish (the last part of committing a transaction) from Connection:
def tpc_finish(self, transaction):
    """Indicate confirmation that the transaction is done.
    """
    serial = self._storage.tpc_finish(transaction)
    assert type(serial) is bytes, repr(serial)
    for oid_iterator in self._modified, self._creating:
        for oid in oid_iterator:
            obj = self._cache.get(oid)
            # Ignore missing objects and don't update ghosts.
            if obj is not None and obj._p_changed is not None:
                obj._p_changed = 0
                obj._p_serial = serial
The second time is when an object's state is loaded from the storage: the storage's load() returns the pickled state along with the serial of the transaction that last wrote the object, and the Connection records that serial on the object. Notice that it's the underlying storage that allocates the transaction ID in the first place (it's the return value of the storage's tpc_finish above). Likewise, the underlying storage is responsible for allocating new OIDs when objects are first stored.
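We can poke at the OID side of this directly, continuing the session above. new_oid() is part of the storage interface; the exact value it returns depends on the storage, so only its length is checked here:

>>> oid = db.storage.new_oid()
>>> len(oid)
8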
Properties of Transaction IDs
From the examples above, we've learned a few things about transaction IDs.
- Like OIDs, TIDs are also 8-byte, or 64-bit, numbers (b'\x03\xdf\xd1\x17\xd4\xdb\xf0\x99').
- They can be printed as those 8 bytes, but more often they are printed as a number (279171602205896857), usually in hexadecimal (0x3dfd117d4dbf099).
- Most interestingly, transaction IDs can also be interpreted as points in UTC time (2021-05-03 16:23:49.888861).
The persistent.timestamp.TimeStamp class assists with parsing and formatting TIDs.
>>> from persistent.timestamp import TimeStamp
>>> ts = TimeStamp(b'\x03\xdf\xd1\x17\xd4\xdb\xf0\x99')
>>> print(ts)
2021-05-03 16:23:49.888861
>>> ts.timeTime()
1620059029.888861
Transaction IDs Are Based On the Current Time
When a transaction commits, ZODB assigns it a transaction ID based on the current time. This is part of tpc_begin (the first part of committing a transaction) from the BaseStorage class, a common base class for many storage implementations:
def tpc_begin(self, transaction, tid=None, status=' '):
    ...
    if tid is None:
        now = time.time()
        t = TimeStamp(*(time.gmtime(now)[:5] + (now % 60,)))
    ...
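We can run the same recipe by hand, continuing the session above (now and t are just illustrative names). The resulting timestamp is 8 bytes, and it round-trips back to approximately the time we started from:

>>> import time
>>> now = time.time()
>>> t = TimeStamp(*(time.gmtime(now)[:5] + (now % 60,)))
>>> len(t.raw())
8
>>> abs(t.timeTime() - now) < 0.001
True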
Transaction IDs Must Increase
Each transaction that commits gets a new transaction ID, based on the current time. Since time moves forward (later is always later than now), that suggests that subsequent transaction IDs will just get bigger and bigger. And normally that would be true. But suppose the computer's clock gets set back, or suppose the database file is moved from one machine whose clock was running way ahead to another machine whose clock is set to the right time. How do we guarantee that a value derived from time.time() is actually going to be larger than the last value?
BaseStorage and the TimeStamp class have us covered. BaseStorage keeps track of the last TID that's been committed, and TimeStamp has a laterThan method that guarantees that the returned value is larger than the previous value:
def tpc_begin(self, transaction, tid=None, status=' '):
    ...
    if tid is None:
        now = time.time()
        t = TimeStamp(*(time.gmtime(now)[:5] + (now % 60,)))
        self._ts = t.laterThan(self._ts)
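We can simulate the clock-went-backwards case with TimeStamp directly (a small sketch; the two timestamps here are made up for illustration):

>>> last_committed = TimeStamp(2021, 5, 3, 16, 23, 49.888861)
>>> wall_clock = TimeStamp(2021, 5, 2, 16, 23, 49.888861)  # a day earlier
>>> wall_clock < last_committed
True
>>> wall_clock.laterThan(last_committed) > last_committed
True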
Different Transaction IDs Can Have the Same Timestamp
But there's a gotcha here. Just because the value is larger doesn't mean that it will actually print as a different time. The conversion from 64-bit number to floating point number of seconds since the epoch to human-readable string is lossy. We can see this if we continue the example we started above.
>>> later = ts.laterThan(ts)
>>> later > ts
True
>>> ts
b'\x03\xdf\xd1\x17\xd4\xdb\xf0\x99'
>>> later
b'\x03\xdf\xd1\x17\xd4\xdb\xf0\x9a'
>>> print(later)
2021-05-03 16:23:49.888861
>>> print(ts)
2021-05-03 16:23:49.888861
>>> later.timeTime()
1620059029.888861
>>> ts.timeTime()
1620059029.888861
Even though the numeric value of later is exactly one bigger than that of ts, they still print the same and have the same timeTime() value. Empirically, up to 16 sequential TID values can share a timeTime() when using the C implementation of TimeStamp; the Python implementation rounds slightly differently, and up to 70 sequential TID values can share a timeTime() (both implementations format the same date string for 70 sequential TIDs). The takeaway? Don't rely too heavily on timeTime() or the printed date string to compare or store TIDs. Prefer the raw() 8-byte value or its 64-bit integer representation instead.
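Continuing the same session, the raw bytes and their integer forms do tell the two apart:

>>> later.raw() > ts.raw()
True
>>> u64(later.raw()) - u64(ts.raw())
1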
Uses of Transaction IDs
In addition to detecting conflicts and loading particular revisions of objects, ZODB makes explicit use of TIDs in other ways, usually through their form as a timestamp. For example, the pack method of IStorage, used to perform garbage collection, takes a "pack time" (seconds since the epoch) as its first argument; most storages convert that time into a TID through the TimeStamp APIs. (But, because of the gotcha outlined above, this conversion is slightly approximate or potentially ambiguous.)
Similarly, the open method of ZODB.DB can accept an optional before= or at= parameter to request a historical view of the database. These parameters can be a specific datetime.datetime object representing an approximate timestamp, or they can be an exact 8-byte TID.
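For example, continuing the session above, we can ask for a view of the database as it was at the serial number conn2 started from, before obj2 was committed. (This is a sketch; it assumes the storage supports the historical reads this needs, which the in-memory DemoStorage does.)

>>> historical = db.open(at=conn2.root()._p_serial)
>>> list(historical.root().keys())
['obj']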
Transaction IDs in RelStorage
Earlier, I showed the code that BaseStorage and FileStorage use to create a new TID based on the current time. Since FileStorage can only be used by a single process on a single computer at a time, it's easy to guarantee that the TID is always increasing.
RelStorage is a bit different. The underlying SQL database in RelStorage can be used by many different processes on many different computers at the same time. There's no guarantee that all these computers will have their clocks set exactly the same, so trying to use the local time from each of them to create timestamps can lead to confusion (TIDs might seem to have large gaps or otherwise jump around, or may not even be related to the actual commit time at all!).
To combat this, RelStorage computes the next TID on the database server itself (for supported databases, namely MySQL and PostgreSQL). That way, there's only one clock involved.
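The idea, roughly, looks like the following sketch. This is not RelStorage's actual implementation; next_tid_from_server_clock, the cursor parameter, and the SQL are illustrative. The point is that the time comes from the database server's clock, while monotonicity is still enforced against the last committed TID:

import time
from persistent.timestamp import TimeStamp

def next_tid_from_server_clock(cursor, last_committed_tid):
    # Ask the database server (PostgreSQL here) for *its* current time,
    # as seconds since the epoch, instead of trusting the client's clock.
    cursor.execute("SELECT EXTRACT(EPOCH FROM clock_timestamp())")
    (now,) = cursor.fetchone()
    now = float(now)
    t = TimeStamp(*(time.gmtime(now)[:5] + (now % 60,)))
    # Still guarantee the new TID is strictly greater than the last one.
    return t.laterThan(TimeStamp(last_committed_tid)).raw()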
RelStorage's client-side pickle cache is highly integrated with TIDs, but that's another post.