During the PostgreSQL 11 development cycle an impressive amount of work was done to improve table partitioning. Table partitioning has existed in PostgreSQL for quite a long time, but it really wasn't until version 10 that it started to become a highly useful feature. We'd previously claimed that table inheritance was our implementation of partitioning, which was true, but it left you to do much of the work yourself. For example, if you wanted INSERTed tuples to make it to your partitions, you had to set up triggers to route them for you. Inheritance partitioning was also slow, and it was hard to develop additional features on top of it.
In PostgreSQL 10 we saw the birth of "Declarative Partitioning", a feature which is designed to solve many of the
The feature freeze for the PostgreSQL 11 release is now upon us. During the last few days my colleague Álvaro Herrera pushed two changes into the development branch of PostgreSQL:
1. Faster Partition Pruning
2. Partition Pruning at Execution Time
These patches aim to improve the performance and usability of the declarative table partitioning feature (added in PostgreSQL 10). Amit Langote wrote the first of these two patches, with some assistance from me. I'm the author of the second patch. This one is based on an original patch by Beena Emerson.
Internally in PostgreSQL, a partitioned table is made up of a series of individual tables. These tables are all grouped under one common parent partitioned table. Queries being run against the partitioned table need the
Last year I wrote about a benchmark which I performed on the Parallel Aggregate feature that I worked on for PostgreSQL 9.6. I was pretty excited to see this code finally ship in September last year; however, something stood out in the release announcement that I didn’t quite understand:
Scale Up with Parallel Query
Version 9.6 adds support for parallelizing some query operations, enabling utilization of several or all of the cores on a server to return query results faster. This release includes parallel sequential (table) scan, aggregation, and joins. Depending on details and available cores, parallelism can speed up big data queries by as much as 32 times faster.
It was the “as much as 32 times faster” that confused me. I saw no reason for this limit. Sure, if you
The PostgreSQL 9.6 Release
It feels like only a few months ago that we were celebrating the release of PostgreSQL 9.5.0, but already we’re very close to the 9.6.0 release! For me personally, I’m very excited about this particular release of PostgreSQL. It was just 5 or 6 months ago that I was busy refactoring
A small peek into the future of what should be arriving for PostgreSQL 9.6.
Today PostgreSQL took a big step ahead in the data warehouse world: we are now able to perform aggregation in parallel using multiple worker processes! This is great news for those of you who are running large aggregate queries over tens of millions or even billions of records, as the workload can now be divided up and shared between worker processes seamlessly.
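To make the idea of dividing an aggregate between workers a little more concrete, here is a minimal Python sketch of the general two-phase pattern: each worker reduces its share of the rows to a small partial state, and a leader then combines those states into the final answer. This is only an illustration of the concept, not PostgreSQL's actual implementation. The function names (`partial_aggregate`, `combine`, `parallel_avg`) and the sample rows are made up for the example, and Python threads are used purely to show the structure rather than to demonstrate a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Worker step: reduce one chunk of (group_key, value) rows
    to a per-group (sum, count) state."""
    state = {}
    for key, value in rows:
        total, count = state.get(key, (0, 0))
        state[key] = (total + value, count + 1)
    return state


def combine(states):
    """Leader step: merge the workers' partial states into one state per group."""
    merged = {}
    for state in states:
        for key, (total, count) in state.items():
            merged_total, merged_count = merged.get(key, (0, 0))
            merged[key] = (merged_total + total, merged_count + count)
    return merged


def parallel_avg(rows, workers=4):
    """Compute a grouped average by splitting the rows between workers."""
    # Hypothetical split: deal the rows out round-robin, one chunk per worker.
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        states = list(pool.map(partial_aggregate, chunks))
    # Only the small per-group states reach the leader, never the raw rows.
    return {key: total / count for key, (total, count) in combine(states).items()}


rows = [("A", 10), ("B", 20), ("A", 30), ("B", 60), ("A", 50)]
result = parallel_avg(rows, workers=2)
print(result)
```

The point of the sketch is that an average cannot be combined directly from per-worker averages, so each worker hands back a sum and a count instead, and the final division happens once at the end. The result should not depend on how many workers the rows were split between.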
We performed some tests on a 4-CPU, 64-core server with 256GB of RAM, running TPC-H query 1 at the 100 GB scale. This query performs some complex aggregation on just over 600 million records and produces 4 output rows.
The base time for this query without parallel aggregates (max_parallel_degree = 0) is 1375 seconds. If we add a single worker (