
Tag: performance

PostgreSQL Meltdown Benchmarks

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
Two serious security vulnerabilities (code-named Meltdown and Spectre) were revealed a couple of weeks ago. Initial tests suggested the performance impact of the mitigations (added in the kernel) might be up to ~30% for some workloads, depending on the syscall rate. Those early estimates had to be done quickly, and so were based on limited amounts of testing. Furthermore, the in-kernel fixes have evolved and improved over time, and we now also have retpoline, which should address Spectre v2. This post presents data from more thorough tests, hopefully providing more reliable estimates for typical PostgreSQL workloads. (more…)
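Before interpreting any such benchmark numbers it helps to know which mitigations a given kernel actually applies. A minimal sketch along those lines (assuming a Linux host recent enough to expose the sysfs vulnerability files; this is an illustration, not code from the post):

```python
# Minimal sketch: report Meltdown/Spectre mitigation status on Linux.
# Assumes /sys/devices/system/cpu/vulnerabilities exists (kernel 4.15+ or backports).
import pathlib

vuln_dir = pathlib.Path("/sys/devices/system/cpu/vulnerabilities")
if vuln_dir.is_dir():
    for entry in sorted(vuln_dir.iterdir()):
        # Each file contains a one-line status such as
        # "Mitigation: PTI" or "Not affected".
        print(f"{entry.name}: {entry.read_text().strip()}")
else:
    print("sysfs vulnerability reporting not available on this kernel")
```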

PostgreSQL Meltdown

Simon's PlanetPostgreSQL
Spectre and Meltdown have caused severe alarm in recent days. You may have read about up to 30% impact on PostgreSQL databases, which I believe to be overstated because of misunderstandings in the media. Let's dig into this in more detail. TL;DR Summary: no PostgreSQL patch required, -7% performance hit. In response to these new security threats, various OS patches have been released. Various authors have published benchmarks around these and have, in some cases, presented worst-case measurements as typical impact figures: for example, stating a 30% hit when, in fact, we are seeing a 7% hit on a busy server. Regrettably, it looks to me like some people outside the PostgreSQL community have spread this news as a problem for PostgreSQL, without clearly stating the workload measured, or (more…)

Benchmark on a Parallel Processing Monster!

David's PlanetPostgreSQL
Last year I wrote about a benchmark which I performed on the Parallel Aggregate feature that I worked on for PostgreSQL 9.6. I was pretty excited to see this code finally ship in September last year; however, something stood out in the release announcement that I didn’t quite understand: Scale Up with Parallel Query Version 9.6 adds support for parallelizing some query operations, enabling utilization of several or all of the cores on a server to return query results faster. This release includes parallel sequential (table) scan, aggregation, and joins. Depending on details and available cores, parallelism can speed up big data queries by as much as 32 times faster. It was the “as much as 32 times faster” claim that confused me. I saw no reason for this limit. Sure, if you (more…)
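For readers who want to see a parallel aggregate plan for themselves, here is a minimal sketch; the connection string, the lineitem table and the worker count of 8 are placeholders (not the benchmark code from the post), and the plan shape depends on table size and costing:

```python
# Minimal sketch: ask PostgreSQL 9.6+ for a parallel aggregate plan.
# Connection string and table name ("lineitem") are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=bench")
cur = conn.cursor()

# Allow up to 8 parallel workers for Gather nodes in this session.
cur.execute("SET max_parallel_workers_per_gather = 8")

# If the planner chooses a parallel aggregate, the plan shows a Gather
# node with Partial Aggregate workers underneath.
cur.execute("EXPLAIN SELECT count(*) FROM lineitem")
for (line,) in cur.fetchall():
    print(line)

conn.close()
```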

Basics of Tuning Checkpoints

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
On systems doing a non-trivial number of writes, tuning checkpoints is crucial for getting good performance. Yet checkpoints are one of the areas where we often identify confusion and configuration issues, both on the community mailing lists and during performance tuning reviews for our customers. (The other one being autovacuum, discussed a few days ago by Joe Nelson from Citus.) So let me walk you through checkpoints - what they do and how to tune them in PostgreSQL. (more…)
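A common first check when tuning checkpoints is whether they are triggered by checkpoint_timeout or requested because WAL filled up too quickly. A minimal sketch of that check (connection string is a placeholder; the counters sit in pg_stat_bgwriter on PostgreSQL versions contemporary with this post):

```python
# Minimal sketch: see what is triggering checkpoints.
# checkpoints_timed vs. checkpoints_req in pg_stat_bgwriter.
import psycopg2

conn = psycopg2.connect("dbname=postgres")
cur = conn.cursor()
cur.execute("SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter")
timed, requested = cur.fetchone()

# Mostly timed checkpoints is what a well-tuned system looks like;
# many requested (WAL-triggered) checkpoints suggest raising max_wal_size.
print(f"timed: {timed}, requested: {requested}")
conn.close()
```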

Postgres-XL Scalability for Loading Data

Pavan's PlanetPostgreSQL
In my last blog post, we looked at the benchmark results from the bulk load test for a Postgres-XL database cluster. Using a 16-datanode, 2-coordinator cluster running on EC2 instances, we could easily clock 9M rows/sec or 3TB/hr of ingestion rate. That’s a significant number in itself. In this blog post, we’ll see if the ingestion rate is scalable in Postgres-XL. In particular, we’ll try to answer whether adding more nodes to the cluster can result in a linear increase in performance. Let’s use the same line item table from the TPC-H benchmark for these tests. We'll increase the cluster size from 16 datanodes to 20 datanodes and then further to 24 datanodes. We'll also repeat the tests with 1, 2 and 3 coordinators respectively. For all these tests, we are using an i2.xlarge EC2 instance for a (more…)

Load data in Postgres-XL at over 9M rows/sec

2ndQuadrant, Pavan's PlanetPostgreSQL
We are faced with this question: “What’s the ingestion rate of Postgres-XL?”, and I realised I don’t have a very good answer to that. Since we recently made some good improvements in this area, I was curious to know too. Well, I decided to benchmark. Hardware and Software For the tests, I used a Postgres-XL cluster running on EC2 instances. Since COPY has to go through the coordinator, it seemed reasonable to use a compute-optimised c3.8xlarge instance for running the coordinator. Similarly, for datanodes, storage-optimised i2.xlarge instances are more appropriate. Both these instances have attached SSD disks, though the i2.xlarge instance has more storage than the c3.8xlarge instance. So the next question was how to generate data for the test. We’d used the TPC-H benchmark for (more…)
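The load itself is driven by COPY through the coordinator. As a rough client-side illustration of that pattern (not the benchmark harness from the post; connection string, file and table names are placeholders):

```python
# Minimal sketch: bulk-load a CSV file with COPY, the same mechanism
# the benchmark drives through the Postgres-XL coordinator.
# Connection string, file name and table ("lineitem") are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=tpch")
cur = conn.cursor()

with open("lineitem.csv") as f:
    # COPY ... FROM STDIN streams the file through a single session;
    # higher aggregate rates come from running several such sessions in parallel.
    cur.copy_expert("COPY lineitem FROM STDIN WITH (FORMAT csv)", f)

conn.commit()
conn.close()
```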

Evolution of Fault Tolerance in PostgreSQL: Synchronous Commit

2ndQuadrant, Featured, Gulcin's PlanetPostgreSQL, PostgreSQL
PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on the evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts. This is the fourth post of the series and we’ll talk about synchronous commit and its effects on the fault tolerance and dependability of PostgreSQL. If you would like to witness the evolution from the beginning, please check the first three blog posts of the series below. Each post is independent, so you don't actually need to read one to understand another. Evolution of Fault Tolerance in PostgreSQL  Evolution of Fault Tolerance in PostgreSQL: Replication Phase  Evolution of Fault Tolerance in PostgreSQL: Time Travel Synchronous Commit By default, PostgreSQL (more…)
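The knob discussed in the post is the synchronous_commit setting, which a client can relax per session (SET) or per transaction (SET LOCAL), trading durability guarantees for commit latency. A minimal sketch of that idea (connection string and table are placeholders, not from the post):

```python
# Minimal sketch: relax synchronous_commit for low-value writes.
# Connection string and the "events" table are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=postgres")
cur = conn.cursor()

# 'on' (the default) waits for the WAL flush, and for synchronous standbys
# when synchronous_standby_names is set; 'off' returns before the flush.
# A plain SET applies to the rest of this session.
cur.execute("SET synchronous_commit = off")
cur.execute("INSERT INTO events (payload) VALUES ('low-value event')")
conn.commit()
conn.close()
```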

PostgreSQL vs. Linux kernel versions

2ndQuadrant, Featured, PostgreSQL, Tomas' PlanetPostgreSQL
I've published multiple benchmarks comparing different PostgreSQL versions, for example the performance archaeology talk (evaluating PostgreSQL 7.4 up to 9.4), and all those benchmarks assumed a fixed environment (hardware, kernel, ...). That is fine in many cases (e.g. when evaluating the performance impact of a patch), but in production those things do change over time - you get hardware upgrades, and from time to time you get an update with a new kernel version. For hardware upgrades (better storage, more RAM, faster CPUs, ...), the impact is usually fairly easy to predict, and moreover people generally realize they need to assess the impact by analyzing the bottlenecks on production and perhaps even testing the new hardware first. But what about kernel updates? Sadly, we usually don't do much benchmarking in this area. The assumption is mostly that new kernels are better than older ones (faster, more efficient, scale to more CPU cores). But is that really true? And how big is the difference? For example, what if you upgrade a kernel from 3.0 to 4.7 - will that affect performance, and if so, will the performance improve or not? (more…)

Evolution of Fault Tolerance in PostgreSQL: Time Travel

2ndQuadrant, Featured, Gulcin's PlanetPostgreSQL, PostgreSQL
PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on the evolution of fault tolerance capabilities in PostgreSQL throughout its versions with a series of blog posts. This is the third post of the series and we’ll talk about timeline issues and their effects on the fault tolerance and dependability of PostgreSQL. If you would like to witness the evolution from the beginning, please check the first two blog posts of the series: Evolution of Fault Tolerance in PostgreSQL  Evolution of Fault Tolerance in PostgreSQL: Replication Phase  Timelines The ability to restore the database to a previous point in time creates some complexities; we’ll cover some of those cases by explaining failover (Fig. 1), switchover (Fig. 2) and pg_rewind (Fig (more…)

On the benefits of sorted paths

2ndQuadrant, Featured, PostgreSQL, Tomas' PlanetPostgreSQL
I had the pleasure of attending PGDay UK last week - a very nice event; hopefully I'll have the chance to come back next year. There were plenty of interesting talks, but the one that caught my attention in particular was Performance for queries with grouping by Alexey Bashtanov. I have given a fair number of similar performance-oriented talks in the past, so I know how difficult it is to present benchmark results in a comprehensible and interesting way, and Alexey did a pretty good job, I think. So if you deal with data aggregation (i.e. BI, analytics, or similar workloads) I recommend going through the slides, and if you get a chance to attend the talk at some other conference, I highly recommend doing so. (more…)