Wednesday, December 12

Author: Tomas Vondra

Databases vs. encryption

2ndQuadrant, Tomas' PlanetPostgreSQL
Let's assume you have some sensitive data, that you need to protect by encryption. It might be credit card numbers (the usual example), social security numbers, or pretty much anything you consider sensitive. It does not matter if the encryption is mandated by a standard like PCI DSS or if you just decided to encrypt the sensitive stuff. You need to do the encryption right and actually protecting the information in both cases. Unfortunately, full-disk-encrytion and pgcrypto are not a good fit for multiple reasons, and application-level encryption reduces the database to "dumb" storage. Let's look at an alternative approach - offloading the encryption to a separate trusted component, implemented as a custom data type. (more…)

Sequential UUID Generators

2ndQuadrant, Tomas' PlanetPostgreSQL
UUIDs are a popular identifier data type - they are unpredictable, and/or globally unique (or at least very unlikely to collide) and quite easy to generate. Traditional primary keys based on sequences won't give you any of that, which makes them unsuitable for public identifiers, and UUIDs solve that pretty naturally. But there are disadvantages too - they may make the access patterns much more random compared to traditional sequential identifiers, cause WAL write amplification etc. So let's look at an extension generating "sequential" UUIDs, and how it can reduce the negative consequences of using UUIDs. (more…)
PostgreSQL Meltdown Benchmarks

PostgreSQL Meltdown Benchmarks

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
Two serious security vulnerabilities (code named Meltdown and Spectre) were revealed a couple of weeks ago. Initial tests suggested the performance impact of mitigations (added in the kernel) might be up to ~30% for some workloads, depending on the syscall rate. Those early estimates had to be done quickly, and so were based on limited amounts of testing. Furthermore, the in-kernel fixes evolved and improved over time, and we now also got retpoline which should address Spectre v2. This post presents data from more thorough tests, hopefully providing more reliable estimates for typical PostgreSQL workloads. (more…)

Future of Postgres-XL

PostgreSQL, Tomas' PlanetPostgreSQL
You probably know that Postgres-XL is a distributed database based on PostgreSQL. A few days ago we pushed the XL 9.6 code into the public git repository. Additional details about the new stuff available in Postgres-XL 9.6 are available here. The topic of this blog post is quite different, though. I'd like to discuss some changes to the project management and development practices, and why (and how) we plan to tweak it. (more…)

What’s new in Postgres-XL 9.6

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
For the last few months, we at 2ndQuadrant have been working on merging PostgreSQL 9.6 into Postgres-XL, which turned out to be quite challenging for various reasons, and took more time than initially planned due to several invasive upstream changes. If you’re interested, look at the official repository here (look at the “master” branch for now). There’s still quite a bit of work to be done - merging a few remaining bits from upstream, fixing known bugs and regression failures, testing, etc. If you’re considering contributing to Postgres-XL, this is an ideal opportunity (send me an e-mail and I’ll help you with the first steps). But overall, Postgres-XL 9.6 is clearly a major step forward in a number of important areas. (more…)

In the defense of sar (and how to configure it)

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
Let me discuss a topic that is not inherently PostgreSQL specific, but that I regularly run into while investigating issues on customer systems, evaluating "supportability" of those systems, etc. It's the importance of having a monitoring solution for system metrics, configuring it reasonably, and why sar is still by far my favorite tool (at least on Linux). (more…)

When autovacuum does not vacuum

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
A few weeks ago I explained basics of autovacuum tuning. At the end of that post I promised to look into problems with vacuuming soon. Well, it took a bit longer than I planned, but here we go. To quickly recap, autovacuum is a background process cleaning up dead rows, e.g. old deleted row versions. You can also perform the cleanup manually by running VACUUM, but autovacuum does that automatically depending on the amount of dead rows in the table, at the right moment - not too often but frequently enough to keep the amount of "garbage" under control. (more…)

Autovacuum Tuning Basics

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
A few weeks ago I covered the basics of tuning checkpoints, and in that post I also mentioned that the second common source of performance issues is autovacuum (based on what we see on the mailing list and at our customers under support). So let me follow-up on that with this post about the basics of autovacuum tuning. I'll very briefly explain the necessary theory (dead tuples, bloat and how autovacuum deals with it), but the main focus of this blog post is tuning - what configuration options are there, rules of thumb, etc. (more…)

On the impact of full-page writes

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
While tweaking postgresql.conf, you might have noticed there's an option called full_page_writes. The comment next to it says something about partial page writes, and people generally leave it set to on - which is a good thing, as I'll explain later in this post. It's however useful to understand what full page writes do, because the impact on performance may be quite significant. Unlike my previous post on checkpoint tuning, this is not a guide how to tune the server. There's not much you can tweak, really, but I'll show you how some application-level decisions (e.g. choice of data types) may interact with full page writes. (more…)

Basics of Tuning Checkpoints

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
On systems doing non-trivial number of writes, tuning checkpoints is crucial for getting good performance. Yet checkpoints are one of the areas where we often identify confusion and configuration issues, both on the community mailing lists and during performance tuning reviews for our customers. (The other one being autovacuum, discussed a few days ago by Joe Nelson from Citus.) So let me walk you through the checkpoints - what they do and how to tune them in PostgreSQL. (more…)