Waiting for Postgres 16: Logical decoding on standbys
In E66 of “5mins of Postgres” we're talking about logical decoding on a standby in Postgres 16, which allows us to keep following the logical replication stream whilst a failover happens. We’ll also touch upon the new pg_log_standby_snapshot function.
Share this episode: Click here to share this episode on twitter, or sign up for our newsletter and subscribe to our YouTube channel.
Let's jump right in!
Using logical decoding on standbys in Postgres 16
This is a blog post by Bertrand from the AWS Postgres Contributors team. In this blog post, he describes a feature that he co-authored in Postgres 16.
Before we jump into this I'm going to point out that this feature has been worked on for a long time. Back in 2018, Andres Freund started a thread about minimal logical decoding on standby, which aimed to achieve what now in Postgres 16 is reality.
Now, shortly before the feature freeze for Postgres 16, this feature was pushed. "Pushed" means this is in Postgres 16 as of right now, but because Postgres 16 still will undergo multiple months of testing, there is of course a small risk that this won't make it, but I think chances are good.
What Bertrand describes in his blog post is how this new feature works and how you can benefit from it.
logical decoding can not be used while in recovery
First of all, before this commit, when you tried to create a logical replication slot on a standby server in Postgres, Postgres would actually give you an error and say
logical decoding can not be used while in recovery.
Logical decoding, just to be clear, is what's used underneath the hood of logical replication and what also often powers other processes that extract data from Postgres, for example to use in the application, or when you're doing transfers to other systems.
Create a logical replication slot in Postgres 16
With the change in Postgres 16, when you're on a standby, you can actually create a logical replication slot. There are a few mechanics around this that make this work slightly differently than if you were creating a logical replication slot on the primary.
Most importantly, you might be waiting a little bit longer for the slot to actually be created. To speed that up, you can run the new
pg_log_standby_snapshot function on the primary. What this does behind the scenes is that it produces a special WAL record that is required on the secondary. There's a whole other story around this if you really want to dive deep.
But the point is, if you're seeing yourself wait a long time on the logical replication slot to be created, call this function to speed it up on the primary.
If you're seeing yourself wait a long time on the logical replication slot to be created, call this function to speed it up on the primary.
What is logical decoding on standby?
Let's take a look at Betrand’s example: We have a table called
bdt that just has a single column. We can see the logical decoding message
INSERT, we can see the data that was being inserted. This is fairly standard, but what is special about this is that you're running all of this whilst you're on a standby.
Following the logical replication stream whilst a failover happens
Now, the really awesome thing about this is that you can use this whilst the standby gets promoted. Let's say you run the
pg_recvlogical command, which outputs the replication stream in a separate process. Now, let's say you're promoting the standby to be the new primary. And what's amazing is that you could actually observe that the
pg_recvlogical process has not been interrupted. It can keep following the logical replication stream whilst the failover happens.
Seamlessly follow failovers and get data
Second, in the newly promoted primary, we can now insert data and you can see that the
pg_recvlogical process that was reading the replication stream seamlessly got that data.
You can imagine that if they're subscribers, they can keep having an open connection to the standby and if the standby gets promoted, they can seamlessly follow without having to worry about reset connections or making any other changes.
This also works on logical replication
There's one more thing to note here: This doesn't just apply to decoding, but also to logical replication! You create a table, and then it's important to note that you would have to create the publication on the primary because publications in Postgres are catalog objects, you couldn't create those on the secondary. Then you create a subscription, but instead of getting an error, you will now simply get
created replication slot – and it will just work!
I think this really enables a lot of new things in Postgres because previously logical decoding could become a bottleneck. You can imagine everything needing to happen on the primary means that actually reading the WAL for logical decoding and all that work had to happen on the primary, puts a limit to how much decoding you can do. Also, just the fact that this can seamlessly follow failovers will enable more resilient architectures that rely on logical replication or logical decoding throughout how they're built.
There are a few things to understand in certain conditions. Most importantly, if you're running
hot_standby_feedback off, this will actually be a situation where there is a higher chance of certain conflict scenarios where ultimately the replication slots on the standby would be invalidated.
Just be aware that in some cases you might see slots being invalidated. It’s important to be able to check for that and to actually respond to that.
All in all, a really important building block for more sophisticated architectures that rely on Postgres, and I look forward to using it in Postgres 16.
Thanks so much for joining us for E66 of 5mins of Postgres. Subscribe to our YouTube channel, sign up for our newsletter and follow us on Twitter to get updates about new episodes!
What we have discussed in this episode of 5mins of Postgres
Postgres 16 highlight: Logical decoding on standby - by Bertrand Drouvot
Minimal logical decoding on standbys - by Andres Freund on postgresql.org
Add info in WAL records in preparation for logical slot conflict handling - on postgresql.org
Waiting for Postgres 16: Running EXPLAIN on any query (even with $1 parameters!)
Waiting for Postgres 16: Buffer cache hit ratio and I/O times in pg_stat_io
Waiting for Postgres 16: Cumulative I/O statistics with pg_stat_io
Vacuum Cost Limit and Parallel Aggregate improvements in Postgres 16
Postgres 16: Surviving without a superuser & reserved_connections
Using pgbench to load data faster, and the random_normal function in PG16
5mins of Postgres E34: Making the most of Parallel VACUUM and why SQL/JSON was pushed out to Postgres 16
5mins of Postgres E27: Waiting for Postgres 16 - Make subquery alias optional in FROM clause