Postgres INSERT ON CONFLICT and how it compares to MERGE in Postgres 15

By Lukas Fittl

March 31, 2022

Today, in episode 11 of our series, we are taking a look at the MERGE command and how it differs from Postgres' INSERT ON CONFLICT command.

Share this episode: Click here to share this episode on twitter, or sign up for our newsletter and check out the newsletter archive and subscribe to our YouTube channel.

MERGE in Postgres 15
- How the MERGE command works in Postgres
Postgres' INSERT ON CONFLICT command: Why does it exist?
What we have discussed in this episode of 5mins of Postgres

Transcript

Let's get started.

MERGE in Postgres 15

Earlier this week, the MERGE command was merged into the Postgres 15 branch. MERGE, for those of you who are not familiar with it, is a SQL standard command that allows you to take certain data and merge it with a table and then updating or inserting or deleting values in that table.

Other databases like SQL Server or Oracle have had MERGE for a while, but Postgres chose to not add MERGE a couple of years back. Instead, it added an alternate command called INSERT ON CONFLICT.

Now, the community has spent more time and you'll actually see that in a new patch that got accepted, there were multiple authors and a lot of people have reviewed this as well. There is always a risk that these things don't make it into the final release, but I think this is the best chance that MERGE has had yet to make it into Postgres.

How the MERGE command works in Postgres

Let’s take a closer look at the MERGE documentation that's already available on the development branch to see the way that MERGE works.

Essentially, you say MERGE INTO, and then you specify the table name.
- This is the standard SQL syntax, so other databases would work similarly.
Then you say USING a particular data source.
- This data source could be a different table, it could also be a query. For example, a query here, could be a VALUES clause, where you have a set of values you're trying to insert into the table, and then you're trying to decide what to do if the value already exists.
- For each of those input rows you are specifying the join_condition. The join_condition helps you match the input data, the data source, against the target table.
Then, based on whether there is an existing match on the join_condition or not, you can decide to do certain things.
- You can say WHEN MATCHED, and then some other condition, for example, then, perform an UPDATE, perform a DELETE, or DO NOTHING. And similarly, WHEN NOT MATCHED, you can also choose to INSERT the data, which will often be a use case, or DO NOTHING.

Pretty versatile syntax here, and: these are the standard INSERT and UPDATE commands that you can utilize. I would say there's a lot of use cases I can see for this. The easiest use case, of course, is, if there is existing data, you can just make sure that you INSERT or UPDATE. The other thing you can do with this is you can also use it to, for example, update a certain balance. The example in the above mentioned docs is: There is a customer account table, you have the recent transactions table and then you're updating the account balance, as you process that data. Then you say when it's matched: you’re increasing the balance. When it's not matched: you’re inserting a new row into the customer account table with that transaction as the starting value.

There's different ways of writing these queries, but in general, it's very useful syntax. I really hope that this makes it into the final release.

Postgres' INSERT ON CONFLICT command: Why does it exist?

Now, what I did find interesting is: Why didn't Postgres have that a long time ago? Postgres 9.5 added a different syntax called INSERT ON CONFLICT. In the MERGE documentation you can see where it says:

You may also wish to consider using INSERT ON CONFLICT as an alternative statement which offers the ability to run an UPDATE if a concurrent INSERT occurs. There are differences and restrictions between those two statement types and they are not interchangeable.

I think that's very important to know, that there are trade offs for both of these, and you may want to use INSERT ON CONFLICT when you don't want to use MERGE.

You might wonder what such situations are. Peter Geoghegan, who was the initial author of the INSERT ON CONFLICT feature, actually made a very helpful comment here on Twitter when people were talking about this.

The comment that he essentially made is that the downside of MERGE's handling of concurrency is that when you concurrently INSERT, so at the same time as you're executing the MERGE statement, there is another INSERT going on, then MERGE might not notice that. MERGE would go into its INSERT logic, and then it would get a unique violation.

I recall that, back when he initially designed the INSERT ON CONFLICT feature, that because of this constraint of how MVCC works in Postgres, you cannot implement MERGE with its general syntax and options to work in the same reliable way. If you want the generality of MERGE, you have to accept the fact that you might get unique constraint violations, when there are concurrent inserts, versus with INSERT ON CONFLICT, the way it's designed with its speculative insertions, guarantees that you either get an INSERT or an UPDATE and that is true even if there are concurrent inserts. You might want to choose INSERT ON CONFLICT if you need the guarantee.

Thanks so much for joining us for this episode of Postgres. Subscribe to our YouTube channel, sign up for our newsletter and follow us on Twitter to get updates about new episodes!