Sharding: Bringing back Postgres-XL technology into core PostgreSQL
Sharding, or horizontal scalability, is a popular topic, widely discussed on the PostgreSQL mailing lists these days. When we started the Postgres-XC project back in 2010, not everyone was convinced that we needed a multi-node PostgreSQL cluster that scales with increasing demand. More to the point, we, the PostgreSQL community, were skeptical about whether we had enough resources to design and implement a complex sharding architecture. There was also so much work still to be done on vertical scalability that the obvious choice for many was to focus there.
With the great work that Robert Haas, Andres Freund and others have done in the last few releases, the law of diminishing marginal utility is now catching up with the vertical scalability enhancements. Now, almost 6 years later, it’s becoming quite clear that horizontal scalability is a must to meet the demands of users who would very much like to use an RDBMS for managing and querying their terabytes of data. Fortunately, the Postgres-XC and Postgres-XL communities have already proved the merits of a sharding solution based on the chosen design. These products have also matured over the last few years and are ready for public consumption. Of course, in the long run we all, including the Postgres-XL community, would like to push most of these features back into core PostgreSQL. But this is a multi-year, multi-release effort, and until then the Postgres-XL community is committed to supporting, maintaining and even enhancing the product.
Here is a short list of features, in no particular order, that IMHO any sharding or horizontal scalability solution should support.
- Transparent and efficient placement of data for query optimisation
- On-demand addition and removal of nodes
- Scalable components
- Guaranteeing transaction ACID properties all the time on all the nodes
- Parallel execution of queries on nodes
- Offloading execution to the remote nodes
- Connection pooling between various nodes
- UI to configure, monitor and manage the shards and other components
- High availability
- Node membership
Anything else?
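To make the first item in the list concrete, here is a minimal sketch of hash-based data placement. It is an illustration only, not the distribution function of Postgres-XL or any other product: `shard_for` and its parameters are hypothetical names, and SHA-256 is chosen simply because it is available in the Python standard library.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a distribution key to a shard number in [0, num_shards).

    Hypothetical helper for illustration; real systems use their own
    hash functions and placement metadata.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a stable hash: Python's built-in hash() is randomized per process,
    # so it would place the same key on different shards after a restart.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for customer_id in ("alice", "bob", "carol"):
        print(customer_id, "-> shard", shard_for(customer_id, 4))
```

Because the same key always hashes to the same shard, a query that filters on the distribution key can be routed to a single node. The sketch also shows why on-demand addition and removal of nodes is hard: with plain modulo placement, changing `num_shards` moves most keys to a different shard.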
A few of these features have already made it into core PostgreSQL, maybe in a different form. Some of them may have been inspired by the work previously done as part of the Postgres-XC or Postgres-XL projects. But there are a whole bunch of things that we still need to work on and get into committable shape. I’m sincerely hoping that we start working in that direction sooner rather than later, leveraging the knowledge and the technology developed as part of various sharding solutions.
I suggest a new data type that concatenates a timestamp and a UUID, to be used for independent key generation on different nodes. MongoDB (and others) use this approach because it provides naturally sortable, unique keys without needing a coordinator.
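A minimal sketch of such a key, assuming an 8-byte millisecond timestamp followed by the 16 bytes of a random UUID. The layout and the names `make_key` and `key_timestamp_ms` are assumptions made for illustration; they do not describe an existing PostgreSQL type or MongoDB's exact format.

```python
import time
import uuid


def make_key(now_ms=None):
    """Return a 24-byte key: 8-byte big-endian ms timestamp + 16 random bytes.

    Hypothetical layout. The timestamp prefix makes keys sort by creation
    time; the random UUID suffix makes collisions between nodes very unlikely
    without any coordination.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    return now_ms.to_bytes(8, "big") + uuid.uuid4().bytes


def key_timestamp_ms(key):
    """Recover the millisecond timestamp embedded in a key."""
    return int.from_bytes(key[:8], "big")


if __name__ == "__main__":
    key = make_key()
    print(key.hex(), key_timestamp_ms(key))
```

Keys generated in different milliseconds compare in time order as plain byte strings. Keys generated within the same millisecond are ordered arbitrarily, and ordering across nodes is only as good as the nodes' clock synchronisation.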
I also suggest you relax the 100% ACID compliance requirement, or at least make it configurable. Per the CAP theorem, that requirement means that no transaction can be committed if any node is unavailable.
Finally, connection pooling between nodes sounds like you have already committed to an architecture in which the client connects to each node, as opposed to one in which the client connects to a query coordinator, which in turn connects to each node and aggregates the results. I have no idea which is better, but it seems like that should be an explicit decision rather than an implied one.
Nice work! Cheers.
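For readers unfamiliar with the second architecture mentioned in this comment, here is a toy sketch of a query coordinator that fans a query out to every node and aggregates the partial results. Everything in it is hypothetical: the "nodes" are in-memory lists, and `node_partial_sum` and `coordinator_sum` are illustrative names, not part of Postgres-XL or any other system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "data nodes": each holds one shard of the rows.
NODES = {
    "node1": [3, 8, 15],
    "node2": [4, 16],
    "node3": [23, 42],
}


def node_partial_sum(rows, threshold):
    """Runs on a data node: filter and aggregate the local shard only."""
    return sum(r for r in rows if r > threshold)


def coordinator_sum(nodes, threshold):
    """Runs on the coordinator: fan out to all nodes, then combine."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(
            lambda rows: node_partial_sum(rows, threshold), nodes.values()
        )
        return sum(partials)


if __name__ == "__main__":
    # Roughly: SELECT sum(value) FROM t WHERE value > 10
    print(coordinator_sum(NODES, 10))
```

The client talks only to the coordinator. Each node does its filtering and summing locally, so only one number per node crosses the network rather than every matching row.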
I have to agree re “Guaranteeing transaction ACID properties all the time on all the nodes”. That’s very useful and very important for transparently moving an application from a single node to a horizontally scaled cluster, but it has limits and won’t meet all needs. XL is a fairly tightly coupled cluster system that favours transparently “just working” at the cost of latency- and partition-tolerance.
That’s why BDR implements a different strategy, with asynchronous multi-master replication. It doesn’t have the other pieces of the puzzle though – pooling, cross-node query execution, sharding, etc – and there are some complexities when it comes to doing those correctly in an asynchronous, loosely coupled system.
There’s value to both approaches, but XL meets the great majority of needs. Despite having been heavily involved in BDR development, I spend quite a bit of time trying to convince people that they don’t need BDR or shouldn’t use it, because they just see “multi-master, we need that”. They don’t consider the complexities of asynchronous multi-master replication when it comes to consistency, lost-update issues, etc. People tend to assume they can just point their app at multiple nodes and expect things to work. With BDR that’s very far from the case; apps need to be very aware of the replication/clustering platform and behave differently from how they would on a standalone server.
With XL what you expect is what you get – it “just works”. It’s what most people who need scale-out should be looking at using.
“Per the CAP theorem, that requirement means that no transaction can be committed if any node is unavailable.”
XL has High Availability so there is no single point of failure, so CAP doesn’t cut in easily.
“Finally, connection pooling between nodes sounds like you have already committed to an architecture in which the client connects to each node vs. an architecture in which a client connects to a query-coordinator, which connects to each node and aggregates the results. I have no idea which is better, but it seems like that should be an explicit decision rather than an implied one.”
You’ve interpreted the architecture incorrectly; it doesn’t work like that. Also, the architecture is based upon explicit decisions. This wasn’t hacked together quickly; it’s a trusted design.
“XL has High Availability so there is no single point of failure, so CAP doesn’t cut in easily.”
Can you explain this? You’re claiming both consistency and availability, which is squarely within the domain of the CAP theorem.