Friday, December 14

Author: Tomas Vondra

PostgreSQL vs. Linux kernel versions

PostgreSQL vs. Linux kernel versions

2ndQuadrant, Featured, PostgreSQL, Tomas' PlanetPostgreSQL
I've published multiple benchmarks comparing different PostgreSQL versions, as for example the performance archaeology talk (evaluating PostgreSQL 7.4 up to 9.4), and all those benchmark assumed fixed environment (hardware, kernel, ...). Which is fine in many cases (e.g. when evaluating performance impact of a patch), but on production those things do change over time - you get hardware upgrades and from time to time you get an update with a new kernel version. For hardware upgrades (better storage, more RAM, faster CPUs, ...), the impact is usually fairly easy to predict, and moreover people generally realize they need to assess the impact by analyzing the bottlenecks on production and perhaps even testing the new hardware first. But for what about kernel updates? Sadly we usually don't do much benchmarking in this area. The assumption is mostly that new kernels are better than older ones (faster, more efficient, scale to more CPU cores). But is it really true? And how big is the difference? For example what if you upgrade a kernel from 3.0 to 4.7 - will that affect the performance, and if yes, will the performance improve or not? (more…)
On the benefits of sorted paths

On the benefits of sorted paths

2ndQuadrant, Featured, PostgreSQL, Tomas' PlanetPostgreSQL
I had the pleasure to attend PGDay UK last week - a very nice event, hopefully I'll have the chance to come back next year. There was plenty of interesting talks, but the one that caught my attention in particular was Performace for queries with grouping by Alexey Bashtanov. I have given a fair number of similar performance-oriented talks in the past, so I know how difficult it is to present benchmark results in a comprehensible and interesting way, and Alexey did a pretty good job, I think. So if you deal with data aggregation (i.e. BI, analytics, or similar workloads) I recommend going through the slides and if you get a chance to attend the talk on some other conference, I highly recommend doing so. (more…)

Application users vs. Row Level Security

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
A few days ago I've blogged about the common issues with roles and privileges we discover during security reviews. Of course, PostgreSQL offers many advanced security-related features, one of them being Row Level Security (RLS), available since PostgreSQL 9.5. As 9.5 was released in January 2016 (so just a few months ago), RLS is fairly new feature and we're not really dealing with many production deployments yet. Instead RLS is a common subject of "how to implement" discussions, and one of the most common questions is how to make it work with application-level users. So let's see what possible solutions there are. (more…)

Auditing Users and Roles in PostgreSQL

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
One of the services we offer are security reviews (or audits, if you want), covering a range of areas related to security. It may be a bit surprising, but a topic that often yields the most serious issues is roles and privileges. Perhaps the reason why roles and privileges are a frequent source of issues is that it seems to be quite simple and similar to things the engineers are familiar with (e.g. Unix system of users and groups), but it turns out there are a few key differences with major consequences. The other parts are either very straightforward and understandable even for sysadmins without much PostgreSQL experience (e.g. authentication config in pg_hba.conf), or the engineers recognize the complexity and take their time to familiarize with the details (a good example of this is (more…)

On the usefulness of expression indexes

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
When teaching PostgreSQL trainings, both on basics and advanced topics, I often find out the attendees have very little idea how powerful the expression indexes may be (if they are aware of them at all). So let me give you a brief overview. So, let's say we have a table, with a range of timestamps (yes, we have generate_series function that can generate dates): CREATE TABLE t AS SELECT d, repeat(md5(d::text), 10) AS padding FROM generate_series(timestamp '1900-01-01', timestamp '2100-01-01', interval '1 day') s(d); VACUUM ANALYZE t; The table also includes a padding column, to make it a bit larger. Now, let's do a simple range query, selecting just one month from the ~200 years included in the table. If you do explain on the (more…)

Tables and indexes vs. HDD and SSD

2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL
Although in the future most database servers (particularly those handling OLTP-like workloads) will use a flash-based storage, we're not there yet - flash storage is still considerably more expensive than traditional hard drives, and so many systems use a mix of SSD and HDD drives. That however means we need to decide how to split the database - what should go to the spinning rust (HDD) and what is a good candidate for the flash storage that is more expensive but much better at handling random I/O. There are solutions that try to handle this automatically at the storage level by automatically using SSDs as a cache, automatically keeping the active part of the data on SSD. Storage appliances / SANs often do this internally, there are hybrid SATA/SAS drives with large HDD and small SSD in (more…)

Performance limits of logical replication solutions

2ndQuadrant, pglogical, PostgreSQL, Tomas' PlanetPostgreSQL
In the previous blog post, I briefly explained how we got the performance numbers published in the pglogical announcement. In this blog post I’d like to discuss the performance limits of logical replication solutions in general, and also how they apply to pglogical. physical replication Firstly, let’s see how physical replication (built into PostgreSQL since version 9.0) works. A somewhat simplified figure of the with two just two nodes looks like this: Clients execute queries on the master node, the changes are written to a transaction log (WAL) and copied over network to WAL on the standby node. The recovery on the standby process on the standby then reads the changes from WAL and applies them to the data files just like during recovery. If the standby is in “hot_standby” (more…)

On pglogical performance

pglogical, PostgreSQL, Tomas' PlanetPostgreSQL
A few days ago we released pglogical, a fully open-source logical replication solution for PostgreSQL, that’ll hopefully get included into the PostgreSQL tree in a not-too-distant future. I’m not going to discuss about all the things enabled by logical replication - the pglogical release announcement presents a quite good overview, and Simon also briefly explained the advantages of logical replication in another post a few days ago. Instead I’d like to talk about one particular aspect mentioned in the announcement - performance comparison with existing solutions. The pglogical page mentions ... preliminary internal testing demonstrating a 5x increase in transaction throughput (OLTP workloads using pgBench ) over other replication methods like slony and londiste3. So let's see where the statement comes from. (more…)