ZFS is a filesystem originally created by Sun Microsystems, and has been available for BSD over a decade. While Postgres will run just fine on BSD, most Postgres installations are historically Linux-based systems. ZFS on Linux has had much more of a rocky road to integration due to perceived license incompatibilities.
As a consequence, administrators were reluctant or outright refused to run ZFS on their Linux clusters. It wasn't until OpenZFS was introduced in 2013 that this slowly began to change. These days, ZFS and Linux are starting to become more integrated, and Canonical of Ubuntu fame even announced direct support for ZFS in their 16.04 LTS release.
So how can a relatively obscure filesystem designed by a now-defunct hardware and software company help Postgres? Let's find out!
While tweaking postgresql.conf, you might have noticed there's an option called full_page_writes. The comment next to it says something about partial page writes, and people generally leave it set to on - which is a good thing, as I'll explain later in this post. It's however useful to understand what full page writes do, because the impact on performance may be quite significant.
Unlike my previous post on checkpoint tuning, this is not a guide how to tune the server. There's not much you can tweak, really, but I'll show you how some application-level decisions (e.g. choice of data types) may interact with full page writes.
I've published multiple benchmarks comparing different PostgreSQL versions, as for example the performance archaeology talk (evaluating PostgreSQL 7.4 up to 9.4), and all those benchmark assumed fixed environment (hardware, kernel, ...). Which is fine in many cases (e.g. when evaluating performance impact of a patch), but on production those things do change over time - you get hardware upgrades and from time to time you get an update with a new kernel version.
For hardware upgrades (better storage, more RAM, faster CPUs, ...), the impact is usually fairly easy to predict, and moreover people generally realize they need to assess the impact by analyzing the bottlenecks on production and perhaps even testing the new hardware first.
But for what about kernel updates? Sadly we usually don't do much benchmarking in this area. The assumption is mostly that new kernels are better than older ones (faster, more efficient, scale to more CPU cores). But is it really true? And how big is the difference? For example what if you upgrade a kernel from 3.0 to 4.7 - will that affect the performance, and if yes, will the performance improve or not?
Although in the future most database servers (particularly those handling OLTP-like workloads) will use a flash-based storage, we're not there yet - flash storage is still considerably more expensive than traditional hard drives, and so many systems use a mix of SSD and HDD drives. That however means we need to decide how to split the database - what should go to the spinning rust (HDD) and what is a good candidate for the flash storage that is more expensive but much better at handling random I/O.
There are solutions that try to handle this automatically at the storage level by automatically using SSDs as a cache, automatically keeping the active part of the data on SSD. Storage appliances / SANs often do this internally, there are hybrid SATA/SAS drives with large HDD and small SSD in (more…)
A few days ago we released pglogical, a fully open-source logical replication solution for PostgreSQL, that’ll hopefully get included into the PostgreSQL tree in a not-too-distant future. I’m not going to discuss about all the things enabled by logical replication - the pglogical release announcement presents a quite good overview, and Simon also briefly explained the advantages of logical replication in another post a few days ago.
Instead I’d like to talk about one particular aspect mentioned in the announcement - performance comparison with existing solutions. The pglogical page mentions
... preliminary internal testing demonstrating a 5x increase in transaction throughput (OLTP workloads using pgBench ) over other replication methods like slony and londiste3. So let's see where the statement comes from.