The video of my presentation below walks you through 'Indexing in PostgreSQL' - a key component to database performance.
This presentation covers the following topics:
Various ways to access data in PostgreSQL
Different types of indexes supported by PostgreSQL
Internals of BTree and BRIN indexes
Overview of GIN and GiST indexes
How to find missing indexes
How to find unused indexes
We expect Postgres-XL 10 to be released in the next few months, with the new features of partitioning and logical replication. You’ll be able to load Postgres-XL directly from PostgreSQL.
For the first time, you’ll be able to run massively parallel queries both across the datanodes and within the datanodes to give huge performance gains.
Earlier today, we updated the master branch of the Postgres-XL repository to include all commits from PostgreSQL's master branch up to June 26. That means the XL project is now fully to date with the PostgreSQL source code, meaning there is now only minimal lag between PostgreSQL and Postgres-XL.
At this point the code is only in Development/Alpha, though we expect to produce Beta1 soon after PostgreSQL Beta 3 in August.
Support from the (more…)
Yet another edition of PGConf India came to conclusion in early March. You may have noticed the change from PGDay to PGConf, which signals a much larger gathering of PostgreSQL enthusiasts, now and in future. What started as a small meet-up of like minded people 4 years back, has now grown into a 2-day conference with a dedicated training day and a multi-track event at the main conference.
The keynote this year was delivered by Simon Riggs, a major developer and committer at PostgreSQL. He presented his thoughts on why Persistence is key to PostgreSQL's success. Persistence is why users trust PostgreSQL to manage their critical data and persistence is why PostgreSQL community is able to deliver a solid product, release after release.
This year's conference was attended by more (more…)
It has taken a lot of planning and efforts, but I'm happy to see that PGConf India 2017 is coming along very well and promises to be a great event. This is our third year in a row in Bengaluru, but there are many distinguishing factors that make this year stand out:
For the first time, we're having a dedicated training day ahead of the main conference. All seats for the trainings are sold out and we had to turn away many interested folks.
For the first time, we're having a multi-track conference.
And for the first time, we've crossed 225 conference delegate registrations, and there are still a few days left.
When we started planning for the conference, we were a bit nervous about whether we could get a good response to the training programme, but we’re almost stumped by (more…)
Two weeks back we discovered an ancient bug in PostgreSQL which may cause index corruption when index is built via CREATE INDEX CONCURRENTLY (aka CIC). All supported releases of PostgreSQL are affected by this bug. The bug could be a source of index corruption reports that we receive from the field once in awhile.
Now before we discuss the problem, let me make it absolutely clear that PostgreSQL is known for its gold standards as far as code quality and product stability is concerned. But just like any other software, the database engine can also have unnoticed bugs. But PostgreSQL community takes every known bug very seriously and try to fix it on priority. No wonder that the bug fix went into the scheduled 9.6.2 release (and all other stable releases) at a very short notice, but not (more…)
This technical blog explains how CREATE INDEX CONCURRENTLY (CIC) works and how it manages to avoid locking the table from updates. A unique distinguishing factor of CIC is that it can build a new index on the table, without blocking it from updates/inserts/deletes.
But even before that, let's understand how Heap-Only-Tuple (HOT) works. It was a landmark feature added in PostgreSQL 8.3 to reduce table bloat and improve performance significantly. But the feature also has some implications on the working of CIC.
PostgreSQL uses multi-version concurrency control (MVCC) for transactions. In this technique, when a row is updated, a new version of the row is created and the old version is left unchanged. Each version has creator and destroyer information and transaction (more…)
In my last blog, we looked at the benchmark results from bulk load test for a Postgres-XL database cluster. Using a 16-datanode, 2-coordinator cluster, running on EC2 instances, we could easily clock 9M rows/sec or 3TB/hr of ingestion rate. That’s a significant number in itself. In this blog, we’ll see if the ingestion rate is scalable in Postgres-XL. In particular, we’ll try to answer if adding more nodes to the cluster can result in a linear increase in performance.
Let’s use the same line item table from the TPC-H benchmark for these tests. We'll increase the cluster size from 16 datanodes to 20 datanodes and then further to 24 datanodes. We'll also repeat the tests with 1, 2 and 3 coordinators respectively. For all these tests, we are using i2.xlarge EC2 instance for a (more…)
We are faced with this question: “What’s the ingestion rate of Postgres-XL?”, and I realised I don’t have a very good answer to that. Since recently we made some good improvements in this area, I was curious to know too. Well, I decided to benchmark.
Hardware and Software
For the tests, I used a Postgres-XL cluster running on EC2 instances. Since COPY has to go through the coordinator, it seemed reasonable to use a compute-optimised c3.8xlarge instance for running coordinator. Similarly, for datanodes, storage-optimised i2.xlarge instances are more appropriate. Both these instances have attached SSD disks, though i2.xlarge instance has more storage than the c3.8xlarge instance.
So the next question was how to generate data for the test? We’d used TPC-H benchmark for (more…)
With PostgreSQL 9.6 now out, many of our users and customers are asking us about the plans for Postgres-XL 9.6. The new release of PostgreSQL implements some new features, especially around parallel query processing and many are interested to know when those capabilities will be available in Postgres-XL.
Postgres-XL itself is a massively parallel processing system. It has an efficient distributed planner and executor which can distribute work to tens of datanodes, process the result and send it back to the client. It allows the user to either shard or replicate a table for efficient query execution. On top of that, Postgres-XL also provides global consistency of the data, all the time, so that users don’t need to worry about consistency while designing their applications.
The Postgres-XL community has released a new version of the software, tagged as 9.5r1.1. This is a minor update to earlier 9.5r1 release, but contains some very important bug fixes. I would encourage everyone to immediately upgrade to this latest release. Being a minor release, you only need to download the latest sources from , compile, install new binaries and restart your servers.
This release is fully caught up with the latest PostgreSQL 9.5 minor release i.e. it includes all bug fixes from PostgreSQL 9.5.3 version, also released today. List of important bug fixes in PostgreSQL 9.5.3 can be found here. Apart from that, it contains following important bug fixes:
Fix a nasty bug in CLOG and Subtrans Log management which may lead to severe data corruption.
Add a (more…)