Last year I wrote about a benchmark which I performed on the Parallel Aggregate feature that I worked on for PostgreSQL 9.6. I was pretty excited to see this code finally ship in September last year, however something stood out on the release announcement that I didn’t quite understand: Scale Up with Parallel QueryVersion 9.6 adds support for parallelizing some query operations, enabling utilization of several or all of the cores on a server to return query results faster. This release includes parallel sequential (table) scan, aggregation, and joins. Depending on details and available cores, parallelism can speed up big data queries by as much as 32 times faster. It was the “as much as 32 times faster” that I was confused at. I saw no reason for this limit. Sure, if you have 32 CPU
On systems doing non-trivial number of writes, tuning checkpoints is crucial for getting good performance. Yet checkpoints are one of the areas where we often identify confusion and configuration issues, both on the community mailing lists and during performance tuning reviews for our customers. (The other one being autovacuum, discussed a few days ago by Joe Nelson from Citus.) So let me walk you through the checkpoints - what they do and how to tune them in PostgreSQL.(more…)
In my last blog, we looked at the benchmark results from bulk load test for a Postgres-XL database cluster. Using a 16-datanode, 2-coordinator cluster, running on EC2 instances, we could easily clock 9M rows/sec or 3TB/hr of ingestion rate. That’s a significant number in itself. In this blog, we’ll see if the ingestion rate is scalable in Postgres-XL. In particular, we’ll try to answer if adding more nodes to the cluster can result in a linear increase in performance.Let’s use the same line item table from the TPC-H benchmark for these tests. We'll increase the cluster size from 16 datanodes to 20 datanodes and then further to 24 datanodes. We'll also repeat the tests with 1, 2 and 3 coordinators respectively. For all these tests, we are using i2.xlarge EC2 instance for a datanode an
We are faced with this question: “What’s the ingestion rate of Postgres-XL?”, and I realised I don’t have a very good answer to that. Since recently we made some good improvements in this area, I was curious to know too. Well, I decided to benchmark. Hardware and Software For the tests, I used a Postgres-XL cluster running on EC2 instances. Since COPY has to go through the coordinator, it seemed reasonable to use a compute-optimised c3.8xlarge instance for running coordinator. Similarly, for datanodes, storage-optimised i2.xlarge instances are more appropriate. Both these instances have attached SSD disks, though i2.xlarge instance has more storage than the c3.8xlarge instance. So the next question was how to generate data for the test? We’d used TPC-H benchmark for previous perform
PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts. This is the fourth post of the series and we’ll talk about synchronous commit and its effects on fault tolerance and dependability of PostgreSQL.If you would like to witness the evolution progress from the beginning, please check the first three blog posts of the series below. Each post is independent, so you don't actually need to read one to understand another.Evolution of Fault Tolerance in PostgreSQL Evolution of Fault Tolerance in PostgreSQL: Replication Phase Evolution of Fault Tolerance in PostgreSQL: Time TravelSynchronous CommitBy default, PostgreSQL implements
I've published multiple benchmarks comparing different PostgreSQL versions, as for example the performance archaeology talk (evaluating PostgreSQL 7.4 up to 9.4), and all those benchmark assumed fixed environment (hardware, kernel, ...). Which is fine in many cases (e.g. when evaluating performance impact of a patch), but on production those things do change over time - you get hardware upgrades and from time to time you get an update with a new kernel version.For hardware upgrades (better storage, more RAM, faster CPUs, ...), the impact is usually fairly easy to predict, and moreover people generally realize they need to assess the impact by analyzing the bottlenecks on production and perhaps even testing the new hardware first.But for what about kernel updates? Sadly we usually d...
PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts. This is the third post of the series and we’ll talk about timeline issues and their effects on fault tolerance and dependability of PostgreSQL.If you would like to witness the evolution progress from the beginning, please check the first two blog posts of the series:Evolution of Fault Tolerance in PostgreSQL Evolution of Fault Tolerance in PostgreSQL: Replication Phase Timelines The ability to restore the database to a previous point in time creates some complexities which we’ll cover some of the cases by explaining failover (Fig. 1), switchover (Fig. 2) and pg_rewind (Fig. 3) case
I had the pleasure to attend PGDay UK last week - a very nice event, hopefully I'll have the chance to come back next year. There was plenty of interesting talks, but the one that caught my attention in particular was Performace for queries with grouping by Alexey Bashtanov.I have given a fair number of similar performance-oriented talks in the past, so I know how difficult it is to present benchmark results in a comprehensible and interesting way, and Alexey did a pretty good job, I think. So if you deal with data aggregation (i.e. BI, analytics, or similar workloads) I recommend going through the slides and if you get a chance to attend the talk on some other conference, I highly recommend doing so.(more…)
For a long time, one of the most known shortcomings of PostgreSQL was the ability to parallelise queries. With the release of version 9.6, this will no longer be an issue. A great job has been done on this subject, starting from the commit 80558c1, the introduction of parallel sequential scan, which we will see in the course of this article.First, you must take note: the development of this feature has been continuous and some parameters have changed names between a commit and another. This article has been written using a checkout taken on June 17 and some features here illustrated will be present only in the version 9.6 beta2.Compared to the 9.5 release, new parameters have been introduced inside the configuration file. These are:max_parallel_workers_per_gather: the ...
PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts. This is the second post of the series and we'll talk about replication and its importance on fault tolerance and dependability of PostgreSQL.If you would like to witness the evolution progress from the beginning, please check the first blog post of the series: Evolution of Fault Tolerance in PostgreSQLPostgreSQL Replication Database replication is the term we use to describe the technology used to maintain a copy of a set of data on a remote system. Keeping a reliable copy of a running system is one of the biggest concerns of redundancy and we all like maintainable, easy-to-use and stable