Logical Replication in PostgreSQL 10

April 28, 2017/37 Comments/in Petr's PlanetPostgreSQL /by Petr Jelinek

PostgreSQL 10 is getting close to its first beta release and it will include the initial support for logical replication, which is was written primarily by me and committed by my colleague Peter Eisentraut, and is internally based on the work 2ndQuadrant did on pglogical (even though the user interface is somewhat different).

I’d like to share some overview of basics in this blog post.

What’s logical replication?

Let me start with briefly mentioning what logical replication is and what’s it good for. I expect that most people know the PostgreSQL streaming master-standby replication that has been part of PostgreSQL for years and is commonly used both for high availability and read scaling.

So why add another replication mechanism and why call it logical? Well, the traditional replication works by shipping the write ahead log (binary transaction log) to the standby server which applies the changes. And the changes described there are very low-level and, in simplified terms, it just says what bytes to add or change in what file. The result of this is that the standby server is bit for bit copy of the master server. That’s fine if you want to just have a copy of all your data (and databases) but not if you need some flexibility of what should be sent and where. It also means that you can’t write anything on the standby since then it would no longer be an exact copy. And as for example using temporary tables requires writes, it’s not possible to use them on a standby.

Logical replication uses the same information in the binary file, but translates it back into logical changes. For example, we know that a row with some data was inserted into a specific table, instead of just knowing we should append bytes to a file. This allows us to do some interesting things. First, since we know what table changed, we can filter based on that, so that we can now replicate just some tables from a database as opposed to data of a whole instance. The other big difference which this implies is that the downstream no longer applies just binary changes to files but actual changes to tables and so the downstream is no longer an exact binary copy of the master. So now we can do writes on the downstream, we can use temporary tables, we can add additional indexes (very handy, for example, when the additional server is used for analytics) and we can even have multiple upstream servers replicate to single downstream and combine the data.

Basic use

Okay, "that’s all cool" I hear you say, "but how do I use it then?".

Logical replication uses a publish/subscribe model and so we create publications on the upstream (or publisher) and subscriptions on downstream (or subscriber). Before we can really do that though we need to change the configuration of PostgreSQL a bit. The PostgreSQL 10 comes with new defaults that allow replication slots, walsenders and replication connections (all prerequisites for replication) out of the box, so the changes aren’t big. For a start it’s enough to change the configuration parameter wal_level to 'logical' so that the write ahead log contains information needed for the translation of binary changes back to logical ones.

Once the basic configuration is done we can start creating publications. The command for that looks something like this:

CREATE PUBLICATION testpub FOR TABLE users, addresses;

The above will create a new publication and adds the tables users and addresses to it. There is also shorthand to publish all user tables in the database:

CREATE PUBLICATION alltables FOR ALL TABLES;

See the CREATE PUBLICATION documentation for more details.

After publications are created, the other servers can subscribe to them. That’s accomplished by creating subscription like this:

CREATE SUBSCRIPTION testsub CONNECTION 'host=upstream-host dbname=users ...'
 PUBLICATION testpub;

This creates a new subscription testsub which will start replicating tables that are part of the testpub publication. And that’s all that’s needed to get basic replication working! By default, the new subscription will also copy any preexisting data in those tables. This can be optionally disabled (using the WITH (NOCOPY DATA) clause). See the CREATE SUBSCRIPTION documentation for more details. Note that the definition of the table is not copied at this time, so we need to create the tables ourselves, as the replication worker will error if it can’t find the table locally.

When new tables are added to the publication, the subscription will not learn about them automatically, and so they will not be replicated. To replicate them we need to run a command which updates the subscription’s idea about what tables are published:

ALTER SUBSCRIPTION testsub REFRESH PUBLICATION;

This again will copy any existing data for new tables and supports the WITH (NOCOPY DATA) clause.

Monitoring

Now that we have logical replication setup, it would also be nice to see what’s happening and if it’s even working. For that there are two monitoring views. One is an already familiar one called pg_stat_replication which shows all the replication connections to the current server. The logical replication subscribers will be shown there along with the standbys, providing same information. The other view is pg_stat_subscription which which shows status information about the subscription on the downstream server. It contains one entry per subscription plus another entry for every table that’s currently being synchronized (the existing data are being copied).

Example

That’s enough for basic overview, now let’s see example captured output from psql.

Let’s setup some table and publication on a publisher:

testdb=# CREATE TABLE customers (
 login text PRIMARY KEY,
 full_name text NOT NULL,
 registration_date timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE

testdb=# INSERT INTO customers(login, full_name) VALUES('john', 'John Doe');
INSERT 0 1

testdb=# CREATE PUBLICATION testpub FOR ALL TABLES;
CREATE PUBLICATION

Now setup the subscription:

testdb=# CREATE TABLE customers (
 login text PRIMARY KEY,
 full_name text NOT NULL,
 registration_date timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE

testdb=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost dbname=testdb' PUBLICATION testpub;
NOTICE:  synchronized table states
NOTICE:  created replication slot "testsub" on publisher
CREATE SUBSCRIPTION

We can see that it also synchronized the table information and creates a replication slot on the publisher.

Let’s check the status of the subscription now.

testdb=# SELECT * FROM pg_stat_subscription;
-[ RECORD 1 ]---------+------------------------------
subid                 | 16403
subname               | testsub
pid                   | 13109
relid                 | [NULL]
received_lsn          | 0/15F1A10
last_msg_send_time    | 2017-04-28 10:38:17.862848+02
last_msg_receipt_time | 2017-04-28 10:38:17.864402+02
latest_end_lsn        | 0/15F1A10
latest_end_time       | 2017-04-28 10:38:17.862848+02

And also verify that the existing row was replicated.

testdb=# SELECT * FROM customers;
-[ RECORD 1 ]-----+------------------------------
login             | john
full_name         | John Doe
registration_date | 2017-04-28 10:36:00.112817+02

Everything works as expected. Back on the provider we can also check the replication status, just like we would for a normal standby.

testdb=# SELECT application_name, backend_start, state, sent_location, write_location, flush_location, sync_state FROM pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
application_name | testsub
backend_start    | 2017-04-28 10:38:16.852043+02
state            | streaming
sent_location    | 0/15F1A10
write_location   | 0/15F1A10
flush_location   | 0/15F1A10
sync_state       | async

Note that the application_name shown here is the same as the name of the subscription we created.

Okay, let’s insert one more row and check if it gets replicated:

testdb=# INSERT INTO customers(login, full_name) VALUES('jane', 'Jane Doe');
INSERT 0 1

And once more on the subscriber we can see that the data is there:

testdb=# SELECT * FROM customers;
 login | full_name |       registration_date
-------+-----------+-------------------------------
 john  | John Doe  | 2017-04-28 10:36:00.112817+02
 jane  | Jane Doe  | 2017-04-28 10:40:53.332686+02
(2 rows)

Summary

Getting logical replication into PostgreSQL was large undertaking and I am happy that we managed to get something into PostgreSQL 10 which is already quite useful. It does not have all features of pglogical, but it’s still an important step to providing the basic version of this powerful functionality out of the box. It also opens other possible uses for the underlying technology for other features inside of PostgreSQL.

We’ll look into some aspects of logical replication more deeply in future blog posts – so be sure to follow our social channels to not miss the update!

37 replies

Mario Guerrero says:
April 29, 2017 at 3:58 pm

Thanks.

Great post. Right Now we are working with pglogical and It is great.

Thanks.
Reply
Denish says:
May 1, 2017 at 6:46 pm

Thanks for detailed post.

Any chance logical replication will help to upgrade from Postgres 9.6 to Postgres 10? I’m sure it will require back patching 9.6 but I wanted to see if there is a plan?
Reply
- Petr Jelinek says:
  May 3, 2017 at 7:30 am
  
  The built-in replication requires at least PG10 on both sides so it will only help with PG10->PG11 upgrades and anything after that. If you want to upgrade from 9.6 to 10 via logical replication, your best bet is pglogical.
  Reply
proteusguy says:
May 3, 2017 at 3:22 pm

How’s this compare to BDR? Does logical replication mean support for multi-master? Also, are materialized views updated independently in each database or does an update of a materialized view on one db cause a replicated materialized view to also be updated?
Reply
- Petr Jelinek says:
  May 3, 2017 at 3:36 pm
  
  No there is no multi-master support in core. Materialized views are independent, in PG10 only normal tables can be published so far.
  Reply
  - mps_surcouf says:
    May 20, 2017 at 3:47 pm
    
    Can we setup bi directional replication by publishing and subscribing at each node
    Reply
    - craig.ringer says:
      October 13, 2017 at 7:52 am
      
      Yes.
      
      But it’ll break horribly if you make any schema changes while there are outstanding commits on the other node that don’t make sense with the new schema. Also, if you make any changes that create conflicting rows you’ll have to fix them manually since there’s no automatic conflict resolution. https://www.postgresql.org/docs/10/static/logical-replication-conflicts.html . There are more pitfalls too.
      
      In short, don’t. We have BDR for a reason, there’s a lot more to multimaster than pointing two DBs at each other.
      Reply
  - Ryan says:
    June 6, 2017 at 8:59 pm
    
    Is there a general idea of how multi-master might work in the future ? Just register a subscription on pointing to the other?
    Reply
    - craig.ringer says:
      October 13, 2017 at 7:43 am
      
      It’s not nearly that simple, due to issues around conflicts when rows change on multiple nodes, issues with schema changes, etc, etc.
      
      BDR is likely to serve as a fundamental model for the direction taken.
      Reply
Aaron Tate says:
May 3, 2017 at 9:49 pm

You said that the subscriber doesn’t pick up new tables added to the publisher, but are table alterations (ie: adding/removing columns, adding/removing/changing indexes) passed through on existing tables?

What happens when a source table is dropped? Does it drop the subscriber table or just stop sending updates?
Reply
- Petr Jelinek says:
  May 8, 2017 at 10:52 am
  
  There is no DDL replication in the built-in logical replication. So table alterations are not automatically propagated.
  
  If you drop source table it will just stop sending updates.
  Reply
Joe says:
May 24, 2017 at 4:11 pm

Is there an external api for this? A lot of services would benefit from replicating a subset of the main db but the example is just for another pg db.
Reply
- craig.ringer says:
  October 13, 2017 at 7:45 am
  
  Not yet. You could use the ‘pgoutput’ plugin, but you’d have to write code to consume the walsender protocol stream and the output plugin’s binary protocol. Or use it from SQL, but you still have to consume the output plugin protocol.
  
  Look into pglogical, which has a json output mode. Or wal2json, or numerous other tools.
  
  There’s active interest in this on the PostgreSQL hackers list. Get involved. Contribute, submit patches, test, etc.
  Reply
Benjamin says:
June 10, 2017 at 1:28 pm

What about sequences ?
Would “CREATE PUBLICATION alltables FOR ALL TABLES;” include them too ?
Reply
- craig.ringer says:
  October 13, 2017 at 7:43 am
  
  Sequence support was added after this post went live. See the docs in PostgreSQL 10 for details.
  Reply
liuqian says:
June 17, 2017 at 9:35 am

psql (PGXL 9.5r1.5, based on PG 9.5.6 (Postgres-XL 9.5r1.5))
Type “help” for help.

postgres=# CREATE EXTENSION pglogical;
ERROR: could not open extension control file “/usr/local/pgsql/share/extension/pglogical.control”: No such file or directory
postgres=# ! ll /usr/local/pgsql/share/extension/pglogical.control
sh: ll: command not found
postgres=# ! ls /usr/local/pgsql/share/extension/pglogical.control
/usr/local/pgsql/share/extension/pglogical.control
Reply
- craig.ringer says:
  October 13, 2017 at 7:41 am
  
  pglogical is not the same thing as the logical replication support in PostgreSQL 10.
  
  Also, Postgres-XL 9.5 is not PostgreSQL 10.
  Reply
jeremiah says:
July 27, 2017 at 3:19 pm

So, how do you feel this compares to Slony? Since it’s based off of WAL logs can we assume it will have less of a lag on high volume systems? Does this prevent writes to the subscriber tables? Is there a simple atomic way to change who the publisher is aka failover?
Reply
- craig.ringer says:
  October 13, 2017 at 7:39 am
  
  It doesn’t have as much of a write-amplification effect, especially with small transactions, and it should be considerably faster.
  
  It does not prevent writes to subscriber tables, but you can do so with regular PostgreSQL permissions. REVOKE rights to them.
  
  There is no failover support yet. Hopefully we’ll be able to address that in pg11 or pg12.
  Reply
Samuel Zarza Fernández says:
August 29, 2017 at 7:50 am

And what about sequences?
Reply
- craig.ringer says:
  October 13, 2017 at 7:28 am
  
  Support for sequences was added after this post was written. See the documentation.
  Reply
Marco says:
August 29, 2017 at 11:55 am

That is really a cool Feature. But I´m still wondering that so many information (we are using Beta 3) about the replication process is shown in the pgsql session. Is there any stop debugging output flag ?

Demo Output (every 2 seconds):
2017-08-29 10:55:24.060 UTC [10739] ERROR: duplicate key value violates unique constraint “test_tab_pkey”
2017-08-29 10:55:24.060 UTC [10739] DETAIL: Key (id)=(1) already exists.
2017-08-29 10:55:24.060 UTC [10739] CONTEXT: COPY test_tab, line 1
2017-08-29 10:55:24.060 UTC [10490] LOG: worker process: logical replication worker for subscription 16407 sync 16385 (PID 10739) exited with exit code 1
Reply
- craig.ringer says:
  October 13, 2017 at 7:28 am
  
  I’m not sure I understand the question.
  Reply
Craig Sabbey says:
January 29, 2018 at 9:42 pm

Can a standby server be a publisher?
Reply
- craig.ringer says:
  February 2, 2018 at 3:01 pm
  
  A physical replication hot standby cannot be a publisher, because logical decoding does not yet support operating from a standby.
  
  A logical replication subscriber may also be a publisher.
  Reply
Aigars says:
February 23, 2018 at 9:13 am

Hello!
I am trying to test logical replication on Windows for PostgreSQL 10.2 but on subscriber I sometimes get “requested WAL segment has already been removed” error.
It is strange, because due to replication slots it shouldn’t be happening. On master there is replication slot for slave and it shows correct wal file name (the one which is missing) in restart_lsn column.
Is there some kind of known bug or there should be some configuration issues?
Reply
- craig.ringer says:
  March 13, 2018 at 9:47 am
  
  Please raise that on the pgsql-bugs list with detailed information on logs, configuration, etc. This should not happen.
  Reply
Aigars says:
March 24, 2018 at 10:39 am

Hello!
I was wondering what is the best way for modifying replicated tables – for example added new column or removed column.
Do I need to pause replication or there is some other best practices for altering replicated tables?
Reply
Pavanteja says:
September 3, 2018 at 1:37 pm

Hi Aigars,

Even I’m wondering the same. For every table structure modification do we need to stop the logical replication?
As of now I’m dropping and recreating the subscription if in case any upgrade running on my application which changes table(s) structures.

The process which I’m following is as shown below:

On the Subscriber:
ALTER SUBSCRIPTION mysub DISABLE;
ALTER SUBSCRIPTION mysub SET(slot_name=none);
DROP SUBSCRIPTION mysub;

On the Publisher:
SELECT pg_drop_replication_slot(‘mysub’);

Then later on the Subscriber I will create the subscription again after the upgrade. Is it the correct approach?
If not please suggest me a better approach for stopping replication and again establishing it.

One more doubt that ruins all my mind resources is, what happens to the data in the subscriber already in sync with the publisher before stopping logical replication?

Will that be checked by logical replication worker again for the second time when we recreate subscription ?

Please advice. Looking forward to hear from you!!!

Regards,
Pavan
Reply
TriTuc says:
May 11, 2019 at 3:53 am

Hi,

Does logical replication support to replicate tables between diffenrent schemas ?

Thanks,
TriTuc
Reply
- craig.ringer says:
  May 16, 2019 at 1:10 am
  
  PostgreSQL’s 10 and 11’s built-in logical replication does not have a table mapping feature. It has been added to pglogical3 though, so it’s available to 2ndQuadrant customers now and to the wider open source community at a later date. I am not aware of anyone making efforts to add the same functionality to core PostgreSQL at this time.
  Reply
Hui Wen says:
August 27, 2019 at 9:53 am

Hi, I would like to ask whether logical replication have delay feature to delay the replication to the slave? If yes, how would I be able to do so? Thank you in advance.
Reply
- craig.ringer says:
  November 4, 2019 at 1:21 pm
  
  pglogical does. I don’t believe that an apply delay made it into PostgreSQL 10’s logical replication.
  Reply
pcpg says:
July 15, 2020 at 1:44 pm

Probably this is a bit of dumb question but I read that logical decoding was introduced in postgres 9.4. And logical replication as a feature is introduced in postgres 10. I was trying to understand the difference between “logical decoding” and “logical replication”. Would it be correct to say that “logical decoding” is a building block to “logical replication”?

Thanks
Reply
- Peter Eisentraut says:
  July 16, 2020 at 9:11 pm
  
  Logical decoding extracts logical change records from the WAL, so you have information like “this row was inserted” or “this row was updated” available via an API. To have a full logical replication you also need at least some kind of receiver process to fetch that data and write it into its local tables, and a protocol for the two sides to communicate, and a way to synchronize the initial table data, and then also some DDL to set all that up. Logical decoding just gives you an API to get the data to replicate, but you still need a whole bunch of infrastructure around it to actually have that data being moved around.
  Reply
Pratik Mehta says:
July 17, 2020 at 1:15 pm

Can someone please assist me. I am unable to locate pglogical file within my binary! 🙁
Also did not find any search results online.

-bash-4.2$ psql
psql.bin (10.6)
Type “help” for help.

postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
———–+———+————+—————————————–
adminpack | 1.1 | pg_catalog | administrative functions for PostgreSQL
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)

postgres=# CREATE EXTENSION pglogical;
ERROR: could not open extension control file “/opt/PostgreSQL/10/share/postgresql/extension/pglogical.control”: No such file or directory
postgres=# \! ls -ltr “/opt/PostgreSQL/10/share/postgresql/extension/pglogical.control”
ls: cannot access /opt/PostgreSQL/10/share/postgresql/extension/pglogical.control: No such file or directory
postgres=# \q
-bash-4.2$ find / -name pglogical.control 2>/dev/null
-bash-4.2$
-bash-4.2$ date
Fri Jul 17 18:43:52 IST 2020

How do I proceed to create the extension ?
Reply
Mahendiran says:
October 21, 2020 at 7:04 am

Hi
I am trying Logical replication to take snapshot of data from Master Server to many Client server.
I need to know when the Intial snapshot is completed from Master Server.

I am using PG logical replication t sync up the intial data . I have tried the pg_state_replication and pg_state_subscriptions views , But I cant find exact point at the Inital snapshot is taken. I want to kill the
Initial snapshot after the backup.

Please help if you can .
Reply