One of the problems with Postgres-XL 9.2 is that it assigns a global transaction identifier (GXID) for every transaction that is started in the cluster. Since its hard for the system to know if a transaction is read-only or not (remember, even SELECT queries can do write activities), Postgres-XL would always assign a GXID and send that to the datanodes. This is quite bad for system performance because every read-only transaction now must go to the GTM, start a new transaction, grab an GXID and also finish the transaction on the GTM when its committed or aborted. For short transactions, like read-only pgbench workload, this adds a lot of overhead and latency. The overhead is even more noticeable when the actual work done by the transaction is very fast, say because the data is fully cached in shared buffers.
This is now set to change in the upcoming Postgres-XL 9.5 release. We have implemented the same lazy GXID assignment technique similar to PostgreSQL. This required us to enhance the concept of global session. A transaction which involves more than one node runs a global session and gets a unique global session identifier. If a node while executing the query decides to assign GXID to the transaction, it sends the global session identifier to the GTM. If the global transaction, identified by the global session, had already been assigned a GXID, the GTM sends back the same GXID to the node or a new GXID is assigned to the transaction. Further, the node sends back the assigned GXID to the coordinator so that it can decide whether to run a single phase or two-phase commit for the transaction.
While the lazy GXID assignment resulted in significant reduction in the GXID consumption rate, its still not enough because of yet another problem that we must solve. As you’re aware, Postgres-XL uses Multi-version Concurrency Control or MVCC for providing various transaction guarantees. But to reclaim the dead row versions, every node must check row versions against something called as RecentGlobalXmin which, in simple terms, is the oldest transaction that might be visible to the oldest transaction running in the system. To correctly compute this value, the GTM must be aware of the snapshot used by every running transaction in the cluster, even read-only transactions. That implies that every transaction must still report its start and end to the GTM, along with the oldest transaction its still interested in (known as snapshot xmin). If we wanted to do any significant improvement for read-only, short transaction workload, we must address this problem too.
So what we now have is a new cluster-monitor process that every datanode and coordinator runs as a postmaster child process. This process is responsible for periodically reporting node-local state to the GTM, which then computes the global state and sends back to all the nodes. To address the RecentGlobalXmin described above, the cluster-monitor sends the RecentGlobalXmin computed using node-local information. The GTM then computes a cluster-wide RecentGlobalXmin which node must use. The entire algorithm is much more complex than it sounds because we must account for node failures, node disconnections and ensure that the RecenGlobalXmin computation does not unnecessarily stalled because that will prevent any dead row versions getting removed from the system.
To summarise, the old way of executing a read-only transaction suffered from the following problems that the new Postgres-XL 9.5 will solve:
These features together give us an edge over other MPP databases as far as short, read-only transactions are concerned. Stay tuned for my next blog on OLTP performance benchmark results using these new enhancements.