At the 2ndQuadrant PostgreSQL conference in New York City this year, I had the pleasure of delivering a training on Postgres-XL together with Andrew Dunstan. We capped the attendees at 8 because we wanted to have a lot of lab time with the students and spend time helping them get set up with Postgres-XL. The class sold out. We had attendees not only from New York City but also from other parts of the United States and one from Europe. I was honored to have folks travel so far to take the course Andrew and I put together, based on work Pavan Deolasee had done previously. We received a lot of positive feedback on the class. Andrew put together a script that enabled all the students to operate on 3 separate 5 node Postgres-XL clusters.
Each of the students could of had their own 5 node
Postgres-XL (or just XL, for short) is an open source project from 2ndQuadrant. It is a massively parallel database built on top of PostgreSQL, and it is designed to be horizontally scalable and flexible enough to handle various workloads.
Here we will show how to build a test environment to play with XL and how to configure it using the OmniDB 2.2 web interface.
2. Building test environment
In this experiment, we will build a cluster with 1 GTM, 1 coordinator and 2 data nodes. It would be simpler to put them in the same virtual machine, however split them across multiple virtual machines is a more realistic scenario. So we will build 3 virtual machines:
GTM and coordinator
We expect Postgres-XL 10 to be released in the next few months, with the new features of partitioning and logical replication. You’ll be able to load Postgres-XL directly from PostgreSQL.
For the first time, you’ll be able to run massively parallel queries both across the datanodes and within the datanodes to give huge performance gains.
Earlier today, we updated the master branch of the Postgres-XL repository to include all commits from PostgreSQL's master branch up to June 26. That means the XL project is now fully to date with the PostgreSQL source code, meaning there is now only minimal lag between PostgreSQL and Postgres-XL.
At this point the code is only in Development/Alpha, though we expect to produce Beta1 soon after PostgreSQL Beta 3 in August.
Support from the
(look at the “master” branch for now).
There’s still quite a bit of work to be done - merging a few remaining bits from upstream, fixing known bugs and regression failures, testing, etc. If you’re considering contributing to Postgres-XL, this is an ideal opportunity (send me
an e-mail and I’ll help you with the first steps).
But overall, Postgres-XL 9.6 is clearly a major step forward in a number of important areas.
For the last few months, we at 2ndQuadrant have been working on merging PostgreSQL 9.6 into Postgres-XL, which turned out to be quite challenging for various reasons, and took more time than initially planned due to several invasive upstream changes. If you’re interested, look at the official repository
In my last blog, we looked at the benchmark results from bulk load test for a Postgres-XL database cluster. Using a 16-datanode, 2-coordinator cluster, running on EC2 instances, we could easily clock 9M rows/sec or 3TB/hr of ingestion rate. That’s a significant number in itself. In this blog, we’ll see if the ingestion rate is scalable in Postgres-XL. In particular, we’ll try to answer if adding more nodes to the cluster can result in a linear increase in performance.
Let’s use the same line item table from the TPC-H benchmark for these tests. We'll increase the cluster size from 16 datanodes to 20 datanodes and then further to 24 datanodes. We'll also repeat the tests with 1, 2 and 3 coordinators respectively. For all these tests, we are using i2.xlarge EC2 instance for a
We are faced with this question: “What’s the ingestion rate of Postgres-XL?”, and I realised I don’t have a very good answer to that. Since recently we made some good improvements in this area, I was curious to know too. Well, I decided to benchmark.
Hardware and Software
For the tests, I used a Postgres-XL cluster running on EC2 instances. Since COPY has to go through the coordinator, it seemed reasonable to use a compute-optimised c3.8xlarge instance for running coordinator. Similarly, for datanodes, storage-optimised i2.xlarge instances are more appropriate. Both these instances have attached SSD disks, though i2.xlarge instance has more storage than the c3.8xlarge instance.
So the next question was how to generate data for the test? We’d used TPC-H benchmark for
The Postgres-XL community has released a new version of the software, tagged as 9.5r1.1. This is a minor update to earlier 9.5r1 release, but contains some very important bug fixes. I would encourage everyone to immediately upgrade to this latest release. Being a minor release, you only need to download the latest sources from , compile, install new binaries and restart your servers.
This release is fully caught up with the latest PostgreSQL 9.5 minor release i.e. it includes all bug fixes from PostgreSQL 9.5.3 version, also released today. List of important bug fixes in PostgreSQL 9.5.3 can be found here. Apart from that, it contains following important bug fixes:
Fix a nasty bug in CLOG and Subtrans Log management which may lead to severe data corruption.
After months of efforts, I'm pleased that Postgres-XL 9.5r1 is seeing the daylight. It has been tremendous collective efforts by many, both inside and outside 2ndQuadrant. Often it's not visible via commit history or mailing list communications, but I must admit that many folks have contributed in making this grand release. Contributors who wrote code and sent ideas, those who reviewed the code, QA team for testing and enhancing regression coverage, those who did benchmarks to show how we are doing, those who wrote automation to deploy the product and of course the user community for testing the product and providing invaluable feedback. So thank you folks!
Over the year, I've periodically wrote about various enhancements we made to the product. But let me summarise some of them here
The Postgres-XL 9.5R1Beta2 release went out yesterday. It's another step forward to have a stable 9.5 release sometime very soon. A few key enhancements from the last beta release are captured in this blog. For the full list, I would recommend to read the release notes.
Support for binary data transfer for JDBC and libpq
If you'd trouble receiving data in binary format via JDBC or libpq, this release should fix those issues for you. The coordinator now have intelligence to figure out if the client is requesting data in binary format and handle those cases correctly.
Pushdown of Append and MergeAppend plans
One of the problems reported with the beta1 release was that for inherited tables, coordinator fails to pushdown plans to the remote nodes even when its possible. A simple example