Tuesday, October 24

Transaction traceability in PostgreSQL 10 with txid_status(…)

One feature quietly added to PostgreSQL 10 is the ability to determine the commit status of any transaction by transaction-id.

It’s reasonable to wonder why you’d want this, since you know if you committed the transaction, it’s still in progress, or if you or rolled it back. And you can check for in-progress transactions in pg_stat_activity.

It exists to help the application recover to a known state after a failure without having to use heavyweight two-phase commit. It’s also useful for querying standbys.


Imagine that your application has just sent the COMMIT for a transaction that’s part of a queue processing system. Before the application receives a reply to its commit request, the database connection breaks due to network issues, a database crash, etc. It’s possible that the transaction could be committed, but no reply reached the client. But it’s also possible for the transaction to have rolled back automatically if the connection failed before commit was received by the server. We have no way to know if the server received and acted on the commit request or not. So the application doesn’t know whether to mark the queue job as completed or whether to retry it.


One solution is to use two-phase commit (2PC). With 2PC, the app gets a promise that the transaction is in persistent storage on the DB before it sends a final commit. This lets it check the two-phase commit status in pg_twophase and use that for crash recovery.

But 2PC is pretty heavy-weight; it adds extra database round-trips, extra fsync()s, etc.

With txid_status() support in PostgreSQL 10, the application can now locally record the transaction-id assigned to the transaction when it starts a write transaction. It can use the xid to look up the transaction’s completion status if it loses its connection or if the DB server crashes. Based on the result it can decide whether or not to repeat a given item of work.

If the application wants to determine the state of transactions that were interrupted by application crashes too, it needs to store transaction IDs in locally persistent storage. For example, it might write to a simple write-intent journal file “I am about to commit queue item 42 in postgres transaction 34531”. Then “I successfully committed queue item 42 in postgres transaction 34531”. On restart it can read through its write-intent journal, find any entries that weren’t completed, and ask PostgreSQL what the outcome was. This is effectively a lightweight transaction resolver, as is used in XA, just much simpler.

You’re probably wondering why the app doesn’t just write a “queue item 42 completed” entry to a table in postgres along with the work done by the queue. Often, it can do just that and then query the table to find out whether or not an item completed if it’s not sure. But in some cases the application doesn’t have full control over the schema. Or the work it’s doing doesn’t lend its self to being journaled within the database its self – particularly when it’s lots of small performance sensitive xacts and the app doesn’t want to add more load to the DB.

Querying standbys

txid_status() can also be used to query a standby and find out if it’s replayed up to a certain transaction. An app may wish to delay queries on a standby until it’s replayed up to a certain point, and it can do so by looping on txid_status().

Most applications currently find it sufficient to query pg_current_wal_insert_lsn() (was pg_current_xlog_insert_location() on 9.6 and older) after a critical transaction. Then wait until the standby replays past that point before re-enabling the standby for queries. The wait can be accomplished on the master side with monitoring of pg_stat_replication, or on the standby side by querying pg_last_wal_replay_lsn (was pg_last_xlog_replay_location).

This can also be achieved using commit timestamps if you’re on a system with fine enough timestamp resolution and don’t care about the tiny possibility of collisions. Use pg_last_committed_xact on the master, then wait until the result on the standby passes the same point. You need to send the pg_last_committed_xact after your COMMIT though, so there’s an extra round trip there.

So by its self, txid_status() isn’t that exciting for standby consistency. But…

Future work

I’d like to add a txid_wait_for_commit(bigint) function that blocks until the supplied xid is known to be committed/aborted. On commit, it returns, and if the xact is aborted, it ERROR’s instead. This would let apps achieve consistent query on their standbys by prefixing all their queries with a txid_wait_for_commit on the last-committed xid of the master.

I would also like to teach PostgreSQL to automatically send the transaction-id to the client when it’s assigned, so the client receives it along with the reply to the first write statement on a transaction. This would greatly simplify application use of transaction traceability both for recovery and for standby consistency.

Right now, PostgreSQL doesn’t send the transaction ID to the client as soon as it’s assigned, so the application must currently use txid_current() to ask for it. This can add a round-trip, but you can get rid of that by bundling txid_current() with your BEGIN query – send BEGIN; txid_current(); for read/write transactions. You should avoid calling txid_current() for read-only transactions, since you’ll increase the rate you use up transaction IDs and add to the workload of VACUUM.

Similarly, PostgreSQL should send the LSN of a commit along with the commit confirmation reply. This would let pooling systems etc intelligently route queries to only caught-up standbys, or wait for them to replay up to a certain point.

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *