How bulk loading with COPY can be 300% times faster in Postgres 16 Beta 1
In episode 70 of “5mins of Postgres” I want to talk about Postgres 16 Beta 1 being released and how one of its improvements can allow you to bulk load about 300% times faster with COPY. The reason for this being an improvement to relation extension locks.
Let's jump in!
Last week, the first beta release of Postgres 16 was officially announced. This means this is a good time to start testing Postgres 16 to see if the new features contain any bugs, or if you feel that there's something, for example, not well written in the documentation that could use clarification.
Note that Postgres 16 will probably come out officially in September or October. So it's still a couple of months until then, but this is really the time now to give feedback to the community and to the patch authors so that they can incorporate your feedback ahead of the release. If you want to test out Postgres 16, you can follow the instructions on the Postgres website, or for some providers like Amazon RDS, they also offer Postgres 16 Beta 1 in their preview environments, so you can try it out that way.
Looking at what is included in Postgres 16, we talked about quite a few features already in previous 5mins of Postgres episodes, I’ll link them at the end of this article. But, if you're curious about a summary you can look at the release announcement here, or you can also look at the release notes in the Postgres documentation.
I wanted to spend today talking a little bit more about one of the features I haven't previously looked at closer, which is mentioned in the release notes as “improving the performance of the concurrent bulk loading of data using COPY up to 300%”. This doesn't always apply, but in some cases you will see your
COPY being much faster in Postgres 16.
Now, what stands behind that sentence? When you look at the release notes this is a bit more hidden. In the release notes, this is related to the item that says "allow more efficient addition of heap and index pages".
This means that this has to do with copying data into a table and the file on disk becoming larger as you're copying data in.
This is a patch that Andres Freund committed a couple of weeks ago, shortly before the feature freeze. The background on this is a longer mailing list thread, but the context here is that Andres and team have been working on asynchronous I/O in Postgres. And asynchronous I/O is not going to make it in Postgres 16, there were a couple of patches that relate to it like this one, but Postgres 17 is going to be the first release where there is actually going to be interesting, I would say, async I/O benefits.
However, this particular patch here tried to resolve one of the particular bottlenecks that show even today and that don't actually need async I/O, they just need a better way of handling what's called relation extension locks.
Today, when you're adding data into a table and you're creating new pages in the table, each time that Postgres is adding a new page it has to hold this relation extension lock. And there's a lot of work that used to be done whilst holding that lock. As Andres describes in his mailing list thread, it’s really way too much work whilst holding that lock. The refactoring work that Andres worked on in Postgres 16 relates to reducing the time that relation extension lock is held.
Now, there are two main aspects of how that is done. First of all, new infrastructure for extending relations in a particular way. "Relation extension" means you're increasing the size of the table on disk. Previously, that was, in a sense, a rather simplistic code path. There are now better code paths to have a bulk extension of multiple pages that need to be created.
The second commit then makes use of that new infrastructure to be able to extend these tables more efficiently. In some cases where you're seeing that bottleneck on that relation extension lock, you will see a significant improvement in the performance with Postgres 16 Beta 1.
If you want to know whether there is a chance that you will benefit from this once Postgres 16 comes out: when you're doing these types of bulk loads with
COPY watch out for the wait event of wait event type "lock" and then the lock wait event "extend". This is what would currently be triggered if you're running into this being a bottleneck. What you would see is in your
pg_stat_activity. You would see the
COPY that's trying to insert data, always being busy with that type of wait event versus other wait events.
That way you would know that you're bottlenecked on that particular relation extend lock and in Postgres 16 you may see up to that 300% improvement.
There's a lot more things that are new in Postgres 16, I would highly recommend you take a look at the beta release, help test, and then make sure that we get a good official release candidate and GA later this year.