Last year I wrote about a benchmark which I performed on the Parallel Aggregate feature that I worked on for PostgreSQL 9.6. I was pretty excited to see this code finally ship in September last year, however something stood out on the release announcement that I didn’t quite understand:
Scale Up with Parallel Query
Version 9.6 adds support for parallelizing some query operations, enabling utilization of several or all of the cores on a server to return query results faster. This release includes parallel sequential (table) scan, aggregation, and joins. Depending on details and available cores, parallelism can speed up big data queries by as much as 32 times faster.
It was the “as much as 32 times faster” that I was confused at. I saw no reason for this limit. Sure, if you
The PostgreSQL 9.6 Release
It feels like only just a few months ago that we were celebrating the release of PostgreSQL 9.5.0, but already we’re very close to the 9.6.0 release! For me personally, I’m very excited about this particular release of PostgreSQL. It was just 5 or 6 months ago that I was busy refectoring
A small peek into the future of what should be arriving for PostgreSQL 9.6.
Today PostgreSQL took a big step ahead in the data warehouse world and we now are able to perform aggregation in parallel using multiple worker processes! This is great news for those of you who are running large aggregate queries over 10's of millions or even billions of records, as the workload can now be divided up and shared between worker processes seamlessly.
We performed some tests on a 4 CPU 64 core server with 256GB of RAM using TPC-H @ 100 GB scale on query 1. This query performs some complex aggregation on just over 600 million records and produces 4 output rows.
The base time for this query without parallel aggregates (max_parallel_degree = 0) is 1375 seconds. If we add a single worker (