How bulk loading with COPY can be 300% times faster in Postgres 16 Beta 1
In episode 70 of “5mins of Postgres” I want to talk about Postgres 16 Beta 1 being released and how one of its improvements can allow you to bulk load about 300% times faster with COPY. The reason for this being an improvement to relation extension locks.
Share this episode: Click here to share this episode on twitter, or sign up for our newsletter and subscribe to our YouTube channel.
Transcript
Let's jump in!
Getting started with Postgres 16 Beta 1
Last week, the first beta release of Postgres 16 was officially announced. This means this is a good time to start testing Postgres 16 to see if the new features contain any bugs, or if you feel that there's something, for example, not well written in the documentation that could use clarification.
Note that Postgres 16 will probably come out officially in September or October. So it's still a couple of months until then, but this is really the time now to give feedback to the community and to the patch authors so that they can incorporate your feedback ahead of the release. If you want to test out Postgres 16, you can follow the instructions on the Postgres website, or for some providers like Amazon RDS, they also offer Postgres 16 Beta 1 in their preview environments, so you can try it out that way.
Looking at what is included in Postgres 16, we talked about quite a few features already in previous 5mins of Postgres episodes, I’ll link them at the end of this article. But, if you're curious about a summary you can look at the release announcement here, or you can also look at the release notes in the Postgres documentation.
Speeding up Postgres performance in PG16 Beta 1
I wanted to spend today talking a little bit more about one of the features I haven't previously looked at closer, which is mentioned in the release notes as “improving the performance of the concurrent bulk loading of data using COPY up to 300%”. This doesn't always apply, but in some cases you will see your COPY
being much faster in Postgres 16.
Now, what stands behind that sentence? When you look at the release notes this is a bit more hidden. In the release notes, this is related to the item that says "allow more efficient addition of heap and index pages".
This means that this has to do with copying data into a table and the file on disk becoming larger as you're copying data in.
This is a patch that Andres Freund committed a couple of weeks ago, shortly before the feature freeze. The background on this is a longer mailing list thread, but the context here is that Andres and team have been working on asynchronous I/O in Postgres. And asynchronous I/O is not going to make it in Postgres 16, there were a couple of patches that relate to it like this one, but Postgres 17 is going to be the first release where there is actually going to be interesting, I would say, async I/O benefits.
Improving relation extension locks in Postgres 16 Beta 1
However, this particular patch here tried to resolve one of the particular bottlenecks that show even today and that don't actually need async I/O, they just need a better way of handling what's called relation extension locks.
Today, when you're adding data into a table and you're creating new pages in the table, each time that Postgres is adding a new page it has to hold this relation extension lock. And there's a lot of work that used to be done whilst holding that lock. As Andres describes in his mailing list thread, it’s really way too much work whilst holding that lock. The refactoring work that Andres worked on in Postgres 16 relates to reducing the time that relation extension lock is held.
New infrastructure for extending relations
Now, there are two main aspects of how that is done. First of all, new infrastructure for extending relations in a particular way. "Relation extension" means you're increasing the size of the table on disk. Previously, that was, in a sense, a rather simplistic code path. There are now better code paths to have a bulk extension of multiple pages that need to be created.
Extending tables more efficiently
The second commit then makes use of that new infrastructure to be able to extend these tables more efficiently. In some cases where you're seeing that bottleneck on that relation extension lock, you will see a significant improvement in the performance with Postgres 16 Beta 1.
How to find out if you can benefit from this
If you want to know whether there is a chance that you will benefit from this once Postgres 16 comes out: when you're doing these types of bulk loads with COPY
watch out for the wait event of wait event type "lock" and then the lock wait event "extend". This is what would currently be triggered if you're running into this being a bottleneck. What you would see is in your pg_stat_activity
. You would see the COPY
that's trying to insert data, always being busy with that type of wait event versus other wait events.
That way you would know that you're bottlenecked on that particular relation extend lock and in Postgres 16 you may see up to that 300% improvement.
There's a lot more things that are new in Postgres 16, I would highly recommend you take a look at the beta release, help test, and then make sure that we get a good official release candidate and GA later this year.
Thanks for joining us for E70 of 5mins of Postgres. Subscribe to our YouTube channel, sign up for our newsletter and follow us on LinkedIn and Twitter to get updates about new episodes!
What we have discussed in this episode of 5mins of Postgres
-
PostgreSQL 16 Beta 1 is now available in Amazon RDS Database Preview Environment
-
refactoring relation extension and BufferAlloc(), faster COPY - mailing list thread
-
bufmgr: Introduce infrastructure for faster relation extension - Commit by Andres Freund
-
hio: Use ExtendBufferedRelBy() to extend tables more efficiently - Commit by Andres Freund
-
Postgres 16: Running EXPLAIN on any query (even with $1 parameters!)
-
Postgres 16: Buffer cache hit ratio and I/O times in pg_stat_io
-
Vacuum Cost Limit and Parallel Aggregate improvements in Postgres 16
-
Postgres 16: Surviving without a superuser & reserved_connections
-
Using pgbench to load data faster, and the random_normal function in PG16