Postgres has a lot of built-in information functions that most people don't know about. Some of these are critical components to identifying replication lag, but what if we could use them for something else, like throughput?
This man's one simple trick can track actual database throughput; DBAs hate him!
Let's take a look at a common query we might use to track replication lag between a Postgres 11 Primary and one or more Replica nodes.
) AS sent_lag,
) AS write_lag,
) AS flush_lag
What is the role of PgBouncer in a Postgres High Availability stack? What even is PgBouncer at the end of the day? Is it a glorified traffic cop, or an integral component critical to the long-term survival of a Postgres deployment?
When we talk about Postgres High Availability, a lot of terms might spring to mind. Replicas, streaming, disaster recovery, fail-over, automation; it's a ceaseless litany of architectural concepts and methodologies. The real question is: how do we get from Here to There?
The Importance of Proxies
It's no secret that the application stack must communicate with the database. Regardless of how many layers of decoupling, queues, and atomicity of our implementation, data must eventually be stored for reference. But where is that endpoint? Presuming that write
Studying Stored Procs in Postgres 11
With Postgres 11 looming on the near horizon, it's only appropriate to check out a recent beta and kick the tires a few times. Whether it's improvements in parallelism, partitions, stored procedures, JIT functionality, or any number of elements in the release page, there's a lot to investigate.
It just so happens that I ran across a fortuitous event on Twitter when deciding on an appropriate topic. Behold!
Wait! No! That's not what stored procedures are for!
I felt so good like anything was possible
When confronted with such a blatant corruption of such a cool new feature, it's only natural to question the wisdom of doing so. It is, after all, not a great idea to programatically consume transaction IDs. I said as much and moved on with life
When working with database capacity planning, there are a lot of variables to consider, and Postgres is no different in this regard. One of the elements which requires management is storage. However, there's an aspect of storage that escapes inspection almost without exception, and it's hiding in the shadows between the columns themselves.
In most low-level computer languages like C, in which the venerable Postgres is written, data types are addressed by their maximum size, regardless of how large they actually are. Thus a standard 32-bit integer which can store a value of just over 2-billion, must be read as a whole unit. This means even the value of 0 requires 4 bytes of storage.
Further, Postgres is designed such that its own internal natural alignment is 8 bytes,
Just in case you missed the live broadcast, the video of my presentation below covers various topics around integration of PostgreSQL with other data sources and database technologies.
This presentation covers the following topics:
What is a Foreign Data Wrapper?
How to query MySQL, a Flat file, a Python script, a REST interface and a different Postgres Node
Perform all of the above simultaneously
Take snapshots of this data for fast access
Tweak remote systems for better performance
Package as an API for distribution
Stream to Kafka
Write data to... MongoDB!?
What does all of this have in common?
It's an exciting topic, and I hope more developers and admins begin to see Postgres as the global integration system it really is
With the addition of logical replication in Postgres 10, it's natural to ask "what's next"? Though not directly supported yet, would it be possible to subscribe two Postgres 10 nodes to each other? What kind of future would that be, and what kind of scenarios would be ideal for such an arrangement?
As it turns out, we already have a kind of answer thanks to the latency inherent to the speed of light: locality. If we can provide a local database node for every physical app location, we also reduce latency by multiple orders of magnitude.
Let's explore the niche BDR was designed to fill.
What is Postgres-BDR?
Postgres-BDR is simply short for Postgres Bi-Directional Replication. Believe it or not, that's all it needs to be. The implications of the name itself are numerous once fully
We all love PgBouncer. It's a great way to multiplex tens, hundreds, or even thousands of client connections to a small handful of Postgres sessions. What isn't necessarily so endearing, is that it can't pass authentication from itself to Postgres, as each Postgres session may exist before the connection to PgBouncer is established. Or can it? Let's explore how things have evolved, but the news never really got out.
Tell you what I got in mind
As a proxy, PgBouncer authenticates on a per user/database basis. Since Postgres authenticates sessions before they connect, PgBouncer used to have no way to re-auth its own connections. The old, and probably most prevalent way to circumvent this, was to build a userlist.txt file that contained every user and password hash that should be allowed to
Backups are a critical component to a fully covered Postgres database infrastructure. In some ways, it's fair to say a database without a backup is no database at all---sometimes literally. 2ndQuadrant's Barman tool is aptly named as a Backup And Recovery Manager for Postgres, and it exists primarily for encouraging a stable and robust backup process.
Backing up a database and then restoring from that backup often has an army of associated scripts and utilities. Some of these components are probably the Postgres default tools of pg_dump, pg_restore, and pg_basebackup. The rest are often shell scripts, homegrown or otherwise copied from blogs or git repositories. Let's just assume the best case scenario and assume all of these work just fine; I used to think the same thing myself.
ZFS is a filesystem originally created by Sun Microsystems, and has been available for BSD over a decade. While Postgres will run just fine on BSD, most Postgres installations are historically Linux-based systems. ZFS on Linux has had much more of a rocky road to integration due to perceived license incompatibilities.
As a consequence, administrators were reluctant or outright refused to run ZFS on their Linux clusters. It wasn't until OpenZFS was introduced in 2013 that this slowly began to change. These days, ZFS and Linux are starting to become more integrated, and Canonical of Ubuntu fame even announced direct support for ZFS in their 16.04 LTS release.
So how can a relatively obscure filesystem designed by a now-defunct hardware and software company help Postgres? Let's find out!
PgBouncer is a popular proxy and pooling layer for Postgres. It's extremely common to reconfigure PgBouncer with repmgr so it always directs connections to the current primary node. It just so happens our emerging Docker stack could use such a component.In our last article, we combined Postgres with repmgr to build a Docker container that could initialize and maintain a Postgres cluster with automated failover capabilities. Yet there was the lingering issue of connecting to the cluster. It's great that Postgres is always online, but how do we connect to whichever node is the primary?While we could write a layer into our application stack to call repmgr cluster show to find the primary before connecting, that's extremely cumbersome. Besides that, there's a better way. Let's alter our stack