Backups are a critical component to a fully covered Postgres database infrastructure. In some ways, it's fair to say a database without a backup is no database at all---sometimes literally. 2ndQuadrant's Barman tool is aptly named as a Backup And Recovery Manager for Postgres, and it exists primarily for encouraging a stable and robust backup process.
Backing up a database and then restoring from that backup often has an army of associated scripts and utilities. Some of these components are probably the Postgres default tools of pg_dump, pg_restore, and pg_basebackup. The rest are often shell scripts, homegrown or otherwise copied from blogs or git repositories. Let's just assume the best case scenario and assume all of these work just fine; I used to think the same thing myself.
ZFS is a filesystem originally created by Sun Microsystems, and has been available for BSD over a decade. While Postgres will run just fine on BSD, most Postgres installations are historically Linux-based systems. ZFS on Linux has had much more of a rocky road to integration due to perceived license incompatibilities.
As a consequence, administrators were reluctant or outright refused to run ZFS on their Linux clusters. It wasn't until OpenZFS was introduced in 2013 that this slowly began to change. These days, ZFS and Linux are starting to become more integrated, and Canonical of Ubuntu fame even announced direct support for ZFS in their 16.04 LTS release.
So how can a relatively obscure filesystem designed by a now-defunct hardware and software company help Postgres? Let's find out!
PgBouncer is a popular proxy and pooling layer for Postgres. It's extremely common to reconfigure PgBouncer with repmgr so it always directs connections to the current primary node. It just so happens our emerging Docker stack could use such a component.In our last article, we combined Postgres with repmgr to build a Docker container that could initialize and maintain a Postgres cluster with automated failover capabilities. Yet there was the lingering issue of connecting to the cluster. It's great that Postgres is always online, but how do we connect to whichever node is the primary?While we could write a layer into our application stack to call repmgr cluster show to find the primary before connecting, that's extremely cumbersome. Besides that, there's a better way. Let's alter our stack
Building an Immortal ClusterBy now we've learned about basic Postgres Docker usage and rudimentary clustering. For the uninitiated, constructing a Postgres cluster can be a daunting task, and we've greatly simplified the process. So why don't we take the next logical step and use Docker to deploy a cluster that is effectively immortal as well? How is that possible? Why, with repmgr of course! 2ndQuadrant has a tool specifically designed to set up and maintain Postgres clusters. One of the components of repmgr is a daemon that can automatically promote replicas whenever the current primary goes down. Let's leverage that to make something that's always online until every node is stopped.My power is in my own handAs before, we're going to need a few scripts to manage the finer points. We can
In our last article, we explored how to run Postgres in some very basic Docker scenarios. Based on our experiments back then, we can obtain images, create containers, and mount to various types of storage.Boring!It's not just boring, it's mundane. It doesn't do anything. Sure we can run Postgres in a container, but that's true about a lot of things. You know what's exciting? Setting up Postgres streaming replication between two docker containers.Let's get started.He say "I know you, you know me"The first thing we need to do is create a container that will be our primary host. Postgres requires a user with REPLICATION permission, as well as a special entry in the pg_hba.conf file for the "replication" pseudo-database. We could start a regular Postgres container, connect to it, and set all
Fans of Rapid Application Development (RAD!) are probably already familiar with Docker, but what does that have to do with Postgres? Database-driven applications are a dime a dozen these days, and a good RAD environment is something of a Holy Grail to coders and QA departments alike. Docker lets us spin up a Postgres instance in seconds, and discard it with a clean conscience.
There have even been some noises within certain circles about using it in a production context. Can we do something like that responsibly? Docker containers are practically guaranteed to be ephemeral, while production data most decidedly isn't. The answer to this is ultimately complex, and something we'll be exploring over the next several weeks.
Let's get started.
Let There Be Light
Since Docker itself is a
During the Postgres Open 2017 conference in San Francisco, someone came to the 2ndQuadrant booth and struck up a conversation with me. During our shameless geeking out over database mechanics, he asked me if pglogical supported the new Postgres 10 partitions. Given my noted expertise in all things Postgres, I answered in the appropriate manner:
"I have no idea. I'll have to look into that."
Well, after a bit of experimentation, I have a more concrete answer, and it's reassuringly positive.
Given a table on a provider node, is it possible to capture only INSERT traffic such that it accumulates on a subscribed system for archival purposes? It's a fairly common tactic, and allows an active OLTP system to regularly purge old data, while a reporting OLAP system keeps it
One of the handy things Oracle does with dates is allow manipulation with standard arithmetic. Want tomorrow's date? Add one. Want a week ago? Subtract seven. Postgres does something close with its INTERVAL syntax, under the explanation that we don't necessarily want to make assumptions about what is being added to a date or timestamp. But Postgres has a big secret in the fact we can arbitrarily create operators to get the behavior we desire.
No Place for Beginners or Sensitive Hearts
So what happens if we happen to know that the developers and applications targeting the database prefer the Oracle approach, and that integers should always increment or decrement a date value? Let's start with something basic like adding a integers to a date. What happens if we try this with a
For a long time, the Postgres query planner has sported a huge blinking neon blind-spot that frustrated and enraged DBAs throughout the universe to a level just shy of murderous frenzy. How is this even possible? What terrible lurking horror could elicit such a visceral response from probably the most boring and straight-laced people ever to draw breath? What else? Correlated statistics.
The Adventure Begins!
The Postgres query planner is a cost-estimation engine. Postgres gathers statistics on table contents such as most and least frequent values, value cardinality, rudimentary histograms, and so on. Along with multiple metrics related to hardware responsiveness, it generates multiple viable execution strategies, and chooses the one that probably costs the least resources. So far, this
One of the coolest things about Postgres functions is that they can return rows as if they were a table. Not only is it possible, but creating a set-returning function in Postgres is almost frighteningly trivial when compared to other database engines. In fact, some Oracle users are probably rolling their eyes and muttering "Just use a cursor!" already. Just hear us out!
Postgres functions can return cursors too, but using them afterwards isn't exactly a friendly experience. Cursors are passed by reference, so the function must either accept a parameter to name the cursor, or it generates something wacky like "<unnamed portal 1>". After that, cursors can only be used with FETCH instead of SELECT, greatly limiting their utility.
What about views, then? Obviously they can return