5mins of Postgres E26: New in Postgres 15 - NULL handling in UNIQUE constraints & Parallel DISTINCT
Today, we are talking about two new features in Postgres 15. First of all, we're going to talk about the new unique constraint NULLS NOT DISTINCT option, and then we'll talk about the SELECT DISTINCT performance improvements.
Let's jump in.
First of all, in this post by Ryan Lambert on the RustProof Labs blog, he talks about the new feature for unique constraints. If you don't know, Postgres 15 is currently in beta season, beta two was recently released. So it's a great time to test things. In this post here, he details the change that Peter Eisentraut made in Postgres 15 to allow unique constraints and indexes to treat NULL values as not distinct.
Ryan goes into more details of what that means. For this, he has two different tables that he's looking at. There's one table which he called "null_old_style", that's just using a standard unique constraint on two columns.
Separately, you'll also note here that for the "id" column, that's not related to the constraint, he's using the newer syntax here that is GENERATED BY DEFAULT AS IDENTITY, instead of using a sequence in Postgres. If you are on a modern Postgres version, this is actually the preferred style to create an auto incrementing value in Postgres.
The second table that Ryan creates here is a "null_new_style" table. This is the exact same schema, except the unique constraint here uses this new NULLS NOT DISTINCT value. What that does is it treats NULL values in the unique constraint differently. Now the "val1" that's a NOT NULL column, you couldn't actually insert NULL values into that column, but for the "val2" column the difference here is that in this second table, if you insert a NULL value into this nullable column, then it will be treated differently.
In Postgres 14 and older versions unique constraints always treat NULL values as not equal to other NULL values. If you're inserting a NULL value into a table and you have a unique constraint, the NULL value is considered to be distinct on its own. NULL is always different from another NULL. When you're inserting five records into the "old_null_style" table where "val1" is just always the same value "Hello" and then "val2" is always NULL.
Even though you have a unique constraint that actually supports you inserting that five times or as many times as you'd like, because you have that NULL value that makes each row distinct from another and because the unique constraint includes both "val1" and "val2", all the rows are unique.
With this new NULLS NOT DISTINCT option, if we insert into that other table that has that new option enabled, you can see that we can insert once without problems, but then if we insert a second time, so if we try to insert the exact same row with the same NULL a second time, then you'll actually get a unique constraint violation. This will tell you, we already have this value in the table, it rejects it.
If you were to change the "val1", if we were inserting "World" instead of "Hello", then you could actually insert a second row, and you would see that both of these here are NULL but the reason that you are allowed to insert this again then was because the first value here was different.
This is pretty useful in some cases, on the Hacker News discussion about this blog post, Peter Eisentraut, who wrote this feature, gave some more context, and the background to know here is that the SQL standard was ambiguous about how this is supposed to work, the way that Postgres chose to do this historically was one way of choosing it, and as Peter notes here, he still thinks that that's the right way to think about it.
But then the upcoming SQL standard makes this more explicit and says this behavior is implementation defined, and then there is this new NULLS NOT DISTINCT option that allows you to pick the behavior. The great thing here is, going forward, you can expect other databases to also adopt that new SQL standard and then you'll have a consistent way of defining which behavior you want. If you want NULLS DISTINCT or NULLS NOT DISTINCT for your constraints.
Now another change, in Postgres 15 was described by Robert on the Percona blog. This change is about the SELECT DISTINCT feature. In older Postgres versions you'll see that Postgres generally got a little bit faster here, but the major change here in Postgres 15 is that Postgres can now use the parallel query option as well for queries that have DISTINCT. You can see here, there was a pretty nice improvement on that. In a query plan that shows up as a Gather node which is used in parallel query , you can see that it does this grouping on for example, the 10 columns here. The nice thing is that's an automatic change, you don't have to do anything to get that improvement, once you upgrade to Postgres 15 and you have parallel query enabled, this will automatically apply to your database.