Vacuum Cost Limit and Parallel Aggregate improvements in Postgres 16

In episode 63 of “5mins of Postgres” we look at two new exciting features in Postgres 16: Updating the cost limit on the fly and using aggregate functions in parallel.



Share this episode: Click here to share this episode on twitter, or sign up for our newsletter and subscribe to our YouTube channel.


Transcript

Let's have a look!

Postgres 16: Enhancements for vacuum and parallel aggregate

As of last Saturday (April 8th, 2023), the feature freeze for Postgres 16 has begun. This means that now no new features can be added to Postgres 16 without an explicit exception from the release management team.

This also means that we're on track for a beta release sometime around the end of May and hopefully the final actual Postgres 16 release roughly in the usual September to October timeframe! That, of course, is dependent on testing. If you're eager to test Postgres 16, it's starting to become a good time to do that. Maybe wait a few more weeks for a few things to stabilize, but we're getting close to the point where it's good to give the community feedback on the new functionality!

There are two new features that I want to talk about today.

Updating vacuum cost limit on the fly in Postgres 16

The first feature is updating the cost limit on the fly. This is a patch by Melanie Plageman that got committed shortly before the feature freeze. What's great about this change is that it will help many people that run large production databases when they are in a pinch because of autovacuum problems.

Usually, when you have autovacuum configured, it can happen that your autovacuum cost delay is not aggressive. This was very common on older Postgres releases before Postgres 12, which was when the default cost limit got changed to be essentially 10 times as resource consuming, which makes sense on modern hardware.

Avoid stopping VACUUM and having to start a manual VACUUM

But even now, on really large database servers, you might run into a problem where your VACUUMs can take hours or days. With this change in Postgres 16, the one thing you can do, which makes a huge difference in practice, is that you can change the cost limit setting whilst the VACUUM is running. The change will actually take effect right away whilst it's running! This means you don't have to stop the VACUUM and start a manual VACUUM. You can actually get autovacuum to pick up that change and become faster. Which, when you're close to transaction ID wraparound because of some configuration issues with freezing for example, then this really helps you avoid waiting on another VACUUM run, and instead just speeding up the one that's already running. I'm really excited about having this as a tool when you need it, because it really can make a big difference.

This means you don't have to stop the VACUUM and start a manual VACUUM. You can actually get autovacuum to pick up that change and become faster.

Using two aggregate functions in parallel query in Postgres 16

The other change that will be really useful is the ability to use two very commonly used aggregate functions in parallel query.

Some background first: This is a blog post by Pavlo Golub on the Cybertec blog. In it, he talks about how aggregates in Postgres function. You can think of aggregates such as sum, average, max or min. When you are running such an aggregate in parallel, what that means is that Postgres is actually able to break up the resultset into smaller pieces. Instead of doing a sum across everything, it's actually able to do a sum of a part of the whole query, and then in the end combine it together.

This so-called partial mode is a particular feature that needs to be supported for aggregate functions. If your aggregate functions don't support partial mode, you can't actually make use of this. Most importantly, this was true up until Postgres 16, for array_agg and string_agg. This commit, that was added in Postgres 16 by David Rowley, actually added support for these aggregates to become a parallel aggregate.

Download Free eBook: How To Get 3x Faster Postgres

Partial mode enabling parallel plans

Let's take a look at how this looks like in practice and follow along in Pavlo’s blog post: You can just generate a test table, insert some data, and then in his particular case, he was making it very appealing for the planner to pick a parallel plan. What this ends up doing there is that it’s pulling data from that table and it’s just grouping by one of the columns, and then it’s doing a string aggregate, an array aggregate, on the other columns. What you can see is that because of this new support for partial mode for these aggregates, you now can actually get a parallel plan. Previously, Postgres wouldn't have been able to do a parallel plan like this.

Because of this new support for partial mode for these aggregates, you now can actually get a parallel plan.

Let’s get back to Pavlo’s article. You can see, it’s first doing a sequential scan in each worker, then it's doing a partial hash aggregate in each worker, and then, afterwards, it's merging that data together. It's able to parallelize both the sequential scan and the aggregate function, which obviously can be a big performance difference because you're doing a lot of the work in parallel.

On older Postgres releases that is not the case. You would see the parallel sequential scan being in the parallel query, but the actual aggregate would happen outside of the parallel query, after the results are merged. Obviously in many cases this can end up being slower.

Overall, I'm really excited about Postgres 16. There's a lot of other things that got committed recently, and I'm sure we'll hear more about that in the next coming weeks.

Thanks for joining us for today’s episode 63 of 5mins of Postgres. Subscribe to our YouTube channel, sign up for our newsletter and follow us on LinkedIn and Twitter to get updates about new episodes!

What we have discussed in this episode of 5mins of Postgres


Enjoy blog posts like this?

Get them once a month to your inbox