5mins of Postgres E37: New in Postgres 15: Logical replication column filters & database-wide ICU collations
Today, we talk about new features in Postgres 15, including logical replication improvements and database wide ICU locales.
In case you're not aware, Postgres 15 rc1 is around the corner and we can expect the final release of Postgres 15 within the next couple of weeks.
Share this episode: Click here to share this episode on twitter, or sign up for our newsletter and check out the newsletter archive and subscribe to our YouTube channel.
Transcript
Let's jump in!
Using column lists in logical replication to filter data
In this blog post, Vignesh from the Fujitsu team describes how you can use column lists in logical replication to restrict which data gets sent into the logical replication stream.
The essence of the feature is that, when you have a CREATE PUBLICATION
statement on the primary, you now can actually specify a list of columns from the table that should be included in the publication.
One important thing to know here is that you will at least need to include all the REPLICA IDENTITY
columns in the publication. If they weren't included, UPDATE
or DELETE
operations wouldn't have a reference to attach to. This also means if you're using REPLICA IDENTITY FULL
, the new feature doesn't really help you because you can't specify a subset of a column list because all the columns are important, or part of the REPLICA IDENTITY
.
Now, the way this works is, for example, we can say CREATE PUBLICATION
for table “employee”, and we can say "id", "name" and "email" as the columns. In this case, other columns, like “date of birth” or which “department” they're in, would be skipped. On the subscriber, you could actually have these columns existing, but they would need to be nullable.
Another example Vignesh walks us through in his article is the the case of a table called “student”. You can see the different columns being passed and, in this situation, the subscriber actually does not have these columns existing at all, so you now can have a table definition on the subscriber that differs from the publisher's definitions, and that works as expected.
You now can have a table definition on the subscriber that differs from the publisher's definitions, and that works as expected.
Skipping data to reduce network traffic, and data relevancy
This is actually very useful, because previously you always had to keep a lot of unnecessary data around. Additionally, it will help you provide only relevant data to the subscriber. Oftentimes, you're only analyzing a subset of your table's information. This will reduce network traffic, so you can now send less data over the network as part of the replication stream. If you have a very large column that you don't need, you can now skip over that.
Last but not least, sometimes there are security impacts to copying data around. For example, if you had a "users" table, you may not want to copy around your hashed password. But, it can be helpful to have a user's emails in different parts of your environment. Now, with the new logical replication column filters, you can say only user's "id", "email" and skip over all the other user columns.
Overall, I think this will make logical replication a lot more useful for more use cases and avoid manual replication tooling that people use today.
Database wide ICU collations in Postgres 15
The second feature I want to talk about are database wide ICU collations in Postgres 15. Peter Eisentraut wrote a very good article about it here. ICU support was added back in Postgres 10, and if you're not familiar, ICU collations matter, for example for sorting. When you are sorting the rows in a table, you do this according to certain rules, and these can be language specific. In German, for example, you might have the German umlauts and they will influence how certain things are ordered.
When you are sorting the rows in a table, you do this according to certain rules, and these can be language specific.
Traditionally, this was done using glibc, and glibc, or libc, had various issues around different versions causing potential database corruption because they weren't really explicit about when the sorting changed between different library versions.
ICU is much more featureful and it also has more explicit versioning around when it changes. Now with Postgres 15, we can use a certain ICU locale for the whole database, making sure that the ICU versioning that also influences how things are sorted, now applies consistently throughout your database.
There are actually still some cases where you do have to use the libc locale, these are some leftovers, for example, in the text search code. So be aware that today, in Postgres 15, whilst you can default to the ICU locale for essentially most of the database you still have to specify a libc locale at this point.
This is actually something that Peter, who has worked on these features in Postgres, is going to address in future releases.
Overall, I think for a lot of use cases, this will make ICU more widely available and more useful to people.
Thank you for joining today's episode of 5mins of Postgres. Subscribe to our YouTube channel and follow us on Twitter to hear about next week's episode!