Building an Immortal ClusterBy now we've learned about basic Postgres Docker usage and rudimentary clustering. For the uninitiated, constructing a Postgres cluster can be a daunting task, and we've greatly simplified the process. So why don't we take the next logical step and use Docker to deploy a cluster that is effectively immortal as well? How is that possible? Why, with repmgr of course! 2ndQuadrant has a tool specifically designed to set up and maintain Postgres clusters. One of the components of repmgr is a daemon that can automatically promote replicas whenever the current primary goes down. Let's leverage that to make something that's always online until every node is stopped.My power is in my own handAs before, we're going to need a few scripts to manage the finer points. We can
In our last article, we explored how to run Postgres in some very basic Docker scenarios. Based on our experiments back then, we can obtain images, create containers, and mount to various types of storage.Boring!It's not just boring, it's mundane. It doesn't do anything. Sure we can run Postgres in a container, but that's true about a lot of things. You know what's exciting? Setting up Postgres streaming replication between two docker containers.Let's get started.He say "I know you, you know me"The first thing we need to do is create a container that will be our primary host. Postgres requires a user with REPLICATION permission, as well as a special entry in the pg_hba.conf file for the "replication" pseudo-database. We could start a regular Postgres container, connect to it, and set all
Fans of Rapid Application Development (RAD!) are probably already familiar with Docker, but what does that have to do with Postgres? Database-driven applications are a dime a dozen these days, and a good RAD environment is something of a Holy Grail to coders and QA departments alike. Docker lets us spin up a Postgres instance in seconds, and discard it with a clean conscience.
There have even been some noises within certain circles about using it in a production context. Can we do something like that responsibly? Docker containers are practically guaranteed to be ephemeral, while production data most decidedly isn't. The answer to this is ultimately complex, and something we'll be exploring over the next several weeks.
Let's get started.
Let There Be Light
Since Docker itself is a
During the Postgres Open 2017 conference in San Francisco, someone came to the 2ndQuadrant booth and struck up a conversation with me. During our shameless geeking out over database mechanics, he asked me if pglogical supported the new Postgres 10 partitions. Given my noted expertise in all things Postgres, I answered in the appropriate manner:
"I have no idea. I'll have to look into that."
Well, after a bit of experimentation, I have a more concrete answer, and it's reassuringly positive.
Given a table on a provider node, is it possible to capture only INSERT traffic such that it accumulates on a subscribed system for archival purposes? It's a fairly common tactic, and allows an active OLTP system to regularly purge old data, while a reporting OLAP system keeps it
One of the handy things Oracle does with dates is allow manipulation with standard arithmetic. Want tomorrow's date? Add one. Want a week ago? Subtract seven. Postgres does something close with its INTERVAL syntax, under the explanation that we don't necessarily want to make assumptions about what is being added to a date or timestamp. But Postgres has a big secret in the fact we can arbitrarily create operators to get the behavior we desire.
No Place for Beginners or Sensitive Hearts
So what happens if we happen to know that the developers and applications targeting the database prefer the Oracle approach, and that integers should always increment or decrement a date value? Let's start with something basic like adding a integers to a date. What happens if we try this with a
For a long time, the Postgres query planner has sported a huge blinking neon blind-spot that frustrated and enraged DBAs throughout the universe to a level just shy of murderous frenzy. How is this even possible? What terrible lurking horror could elicit such a visceral response from probably the most boring and straight-laced people ever to draw breath? What else? Correlated statistics.
The Adventure Begins!
The Postgres query planner is a cost-estimation engine. Postgres gathers statistics on table contents such as most and least frequent values, value cardinality, rudimentary histograms, and so on. Along with multiple metrics related to hardware responsiveness, it generates multiple viable execution strategies, and chooses the one that probably costs the least resources. So far, this
One of the coolest things about Postgres functions is that they can return rows as if they were a table. Not only is it possible, but creating a set-returning function in Postgres is almost frighteningly trivial when compared to other database engines. In fact, some Oracle users are probably rolling their eyes and muttering "Just use a cursor!" already. Just hear us out!
Postgres functions can return cursors too, but using them afterwards isn't exactly a friendly experience. Cursors are passed by reference, so the function must either accept a parameter to name the cursor, or it generates something wacky like "<unnamed portal 1>". After that, cursors can only be used with FETCH instead of SELECT, greatly limiting their utility.
What about views, then? Obviously they can return
Testing within the context of a database can sometimes seem a bit overwhelming. Not only is it necessary to create a reasonable set of test data to bootstrap multiple tables, but we need some reliable invocation system to compare query, view, and function output to expected results. And that is no small feat. For a very long time, the "easiest" way to test end-user database objects involved building a Makefile that included PGXS. Then we had to create the tests, run the tests, and copy over the output into static expected files. Modifying any test always meant updating the static output file as well.
It's great that the tools are there, yet why do we need to be halfway to a Postgres developer in order follow good QA practices with Postgres? Well, maybe we don't. It turns out,
With the christening of PG Phriday at its new location, it's only fitting that we revisit a frequent boon and bane of organized storage: partitioning. When volume and scale reach a certain point that the very laws of physics become a barrier to efficiency, partitions are a crucial tool to escape the otherwise inevitable pile of unmaintainable bits. It's not feasible to reindex a multi-terabyte table with tens or hundreds of billions of rows. It's excruciating to excise portions for long-term archival, and all but impossible to perform anything but the most cursory maintenance in general. Huge tables are a partition advocate's dream.
But until fairly recently, the implementation in Postgres has been... less than savory. While nothing is perfect, Postgres has never actually offered