What is the role of PgBouncer in a Postgres High Availability stack? What even is PgBouncer at the end of the day? Is it a glorified traffic cop, or an integral component critical to the long-term survival of a Postgres deployment?
When we talk about Postgres High Availability, a lot of terms might spring to mind. Replicas, streaming, disaster recovery, fail-over, automation; it's a ceaseless litany of architectural concepts and methodologies. The real question is: how do we get from Here to There?
The Importance of Proxies
It's no secret that the application stack must communicate with the database. No matter how many layers of decoupling, queuing, and atomicity our implementation has, data must eventually be stored for reference. But where is that endpoint? Presuming that write
We all love PgBouncer. It's a great way to multiplex tens, hundreds, or even thousands of client connections to a small handful of Postgres sessions. What isn't necessarily so endearing is that it can't pass authentication from itself to Postgres, as each Postgres session may exist before the connection to PgBouncer is established. Or can it? Let's explore how things have evolved, even though the news never really got out.
Tell you what I got in mind
As a proxy, PgBouncer authenticates on a per user/database basis. Since Postgres authenticates a session when it is established, and a pooled session may predate the client that ends up using it, PgBouncer used to have no way to re-auth its own connections. The old, and probably most prevalent, way to circumvent this was to build a userlist.txt file that contained every user and password hash that should be allowed to
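To make the userlist.txt approach a bit more concrete, here is a minimal sketch of producing one entry for that file. It assumes the legacy md5 password format, where the stored value is the string md5 followed by the MD5 hex digest of the password concatenated with the username, and it assumes the usual one-pair-per-line layout of a quoted username and a quoted hash. The function names and the sample user are hypothetical, and deployments using SCRAM would store a different kind of secret.

```python
import hashlib


def pg_md5_hash(username: str, password: str) -> str:
    """Return a Postgres-style md5 password hash.

    Assumption: legacy md5 auth, i.e. "md5" + md5(password + username).
    """
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return "md5" + digest


def userlist_line(username: str, password: str) -> str:
    """Format one userlist.txt entry: "username" "hash".

    Assumption: the username contains no double quotes, so no
    escaping is attempted here.
    """
    return f'"{username}" "{pg_md5_hash(username, password)}"'


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(userlist_line("app_user", "not-a-real-password"))
```

Every role that should be able to connect needs its own line, and the file has to be regenerated whenever a password changes, which is part of why this approach gets tedious.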
PgBouncer is a popular proxy and pooling layer for Postgres. It's extremely common to reconfigure PgBouncer with repmgr so it always directs connections to the current primary node. It just so happens our emerging Docker stack could use such a component.
In our last article, we combined Postgres with repmgr to build a Docker container that could initialize and maintain a Postgres cluster with automated failover capabilities. Yet there was the lingering issue of connecting to the cluster. It's great that Postgres is always online, but how do we connect to whichever node is the primary?
While we could write a layer into our application stack to call repmgr cluster show to find the primary before connecting, that's extremely cumbersome. Besides that, there's a better way. Let's alter our stack