What is the role of PgBouncer in a Postgres High Availability stack? What even is PgBouncer at the end of the day? Is it a glorified traffic cop, or an integral component critical to the long-term survival of a Postgres deployment?
When we talk about Postgres High Availability, a lot of terms might spring to mind. Replicas, streaming, disaster recovery, fail-over, automation; it's a ceaseless litany of architectural concepts and methodologies. The real question is: how do we get from Here to There?
The Importance of Proxies
It's no secret that the application stack must communicate with the database. Regardless of how many layers of decoupling, queues, and atomicity of our implementation, data must eventually be stored for reference. But where is that endpoint? Presuming that write
Studying Stored Procs in Postgres 11
With Postgres 11 looming on the near horizon, it's only appropriate to check out a recent beta and kick the tires a few times. Whether it's improvements in parallelism, partitions, stored procedures, JIT functionality, or any number of elements in the release page, there's a lot to investigate.
It just so happens that I ran across a fortuitous event on Twitter when deciding on an appropriate topic. Behold!
Wait! No! That's not what stored procedures are for!
I felt so good like anything was possible
When confronted with such a blatant corruption of such a cool new feature, it's only natural to question the wisdom of doing so. It is, after all, not a great idea to programatically consume transaction IDs. I said as much and moved on with life
When working with database capacity planning, there are a lot of variables to consider, and Postgres is no different in this regard. One of the elements which requires management is storage. However, there's an aspect of storage that escapes inspection almost without exception, and it's hiding in the shadows between the columns themselves.
In most low-level computer languages like C, in which the venerable Postgres is written, data types are addressed by their maximum size, regardless of how large they actually are. Thus a standard 32-bit integer which can store a value of just over 2-billion, must be read as a whole unit. This means even the value of 0 requires 4 bytes of storage.
Further, Postgres is designed such that its own internal natural alignment is 8 bytes,
With the addition of logical replication in Postgres 10, it's natural to ask "what's next"? Though not directly supported yet, would it be possible to subscribe two Postgres 10 nodes to each other? What kind of future would that be, and what kind of scenarios would be ideal for such an arrangement?
As it turns out, we already have a kind of answer thanks to the latency inherent to the speed of light: locality. If we can provide a local database node for every physical app location, we also reduce latency by multiple orders of magnitude.
Let's explore the niche BDR was designed to fill.
What is Postgres-BDR?
Postgres-BDR is simply short for Postgres Bi-Directional Replication. Believe it or not, that's all it needs to be. The implications of the name itself are numerous once fully
We all love PgBouncer. It's a great way to multiplex tens, hundreds, or even thousands of client connections to a small handful of Postgres sessions. What isn't necessarily so endearing, is that it can't pass authentication from itself to Postgres, as each Postgres session may exist before the connection to PgBouncer is established. Or can it? Let's explore how things have evolved, but the news never really got out.
Tell you what I got in mind
As a proxy, PgBouncer authenticates on a per user/database basis. Since Postgres authenticates sessions before they connect, PgBouncer used to have no way to re-auth its own connections. The old, and probably most prevalent way to circumvent this, was to build a userlist.txt file that contained every user and password hash that should be allowed to
Backups are a critical component to a fully covered Postgres database infrastructure. In some ways, it's fair to say a database without a backup is no database at all---sometimes literally. 2ndQuadrant's Barman tool is aptly named as a Backup And Recovery Manager for Postgres, and it exists primarily for encouraging a stable and robust backup process.
Backing up a database and then restoring from that backup often has an army of associated scripts and utilities. Some of these components are probably the Postgres default tools of pg_dump, pg_restore, and pg_basebackup. The rest are often shell scripts, homegrown or otherwise copied from blogs or git repositories. Let's just assume the best case scenario and assume all of these work just fine; I used to think the same thing myself.
ZFS is a filesystem originally created by Sun Microsystems, and has been available for BSD over a decade. While Postgres will run just fine on BSD, most Postgres installations are historically Linux-based systems. ZFS on Linux has had much more of a rocky road to integration due to perceived license incompatibilities.
As a consequence, administrators were reluctant or outright refused to run ZFS on their Linux clusters. It wasn't until OpenZFS was introduced in 2013 that this slowly began to change. These days, ZFS and Linux are starting to become more integrated, and Canonical of Ubuntu fame even announced direct support for ZFS in their 16.04 LTS release.
So how can a relatively obscure filesystem designed by a now-defunct hardware and software company help Postgres? Let's find out!
PgBouncer is a popular proxy and pooling layer for Postgres. It's extremely common to reconfigure PgBouncer with repmgr so it always directs connections to the current primary node. It just so happens our emerging Docker stack could use such a component.In our last article, we combined Postgres with repmgr to build a Docker container that could initialize and maintain a Postgres cluster with automated failover capabilities. Yet there was the lingering issue of connecting to the cluster. It's great that Postgres is always online, but how do we connect to whichever node is the primary?While we could write a layer into our application stack to call repmgr cluster show to find the primary before connecting, that's extremely cumbersome. Besides that, there's a better way. Let's alter our stack
Building an Immortal ClusterBy now we've learned about basic Postgres Docker usage and rudimentary clustering. For the uninitiated, constructing a Postgres cluster can be a daunting task, and we've greatly simplified the process. So why don't we take the next logical step and use Docker to deploy a cluster that is effectively immortal as well? How is that possible? Why, with repmgr of course! 2ndQuadrant has a tool specifically designed to set up and maintain Postgres clusters. One of the components of repmgr is a daemon that can automatically promote replicas whenever the current primary goes down. Let's leverage that to make something that's always online until every node is stopped.My power is in my own handAs before, we're going to need a few scripts to manage the finer points. We can
In our last article, we explored how to run Postgres in some very basic Docker scenarios. Based on our experiments back then, we can obtain images, create containers, and mount to various types of storage.Boring!It's not just boring, it's mundane. It doesn't do anything. Sure we can run Postgres in a container, but that's true about a lot of things. You know what's exciting? Setting up Postgres streaming replication between two docker containers.Let's get started.He say "I know you, you know me"The first thing we need to do is create a container that will be our primary host. Postgres requires a user with REPLICATION permission, as well as a special entry in the pg_hba.conf file for the "replication" pseudo-database. We could start a regular Postgres container, connect to it, and set all