Tuning random_page_cost and how index correlation affects query plans

By Lukas Fittl

March 08, 2024

In today’s E105 of “5mins of Postgres” we discuss why changing random_page_cost from the default of 4 is usually a good idea, and a specific example of where a high random_page_cost caused a bad plan due to index correlation. We dive into the relevant parts of the Postgres source, and explain how planner costing works.

Share this episode: Click here to share this episode on LinkedIn or on X/Twitter. Feel free to sign up for our newsletter and subscribe to our YouTube channel.

A puzzling index scan with Rows Removed By Filter
- The impact of changing random_page_cost
Diving into the Postgres planner
Index correlation
Best practices for tuning random_page_cost
In conclusion
What we have discussed in this episode of 5mins of Postgres

Transcript

We'll start with this blog post by Shayon Mukherjee on his personal blog. And in this post Shayon describes a particular situation that they encountered on their production system, where they were optimizing queries and they were able to achieve a hundred times faster queries by tuning the random_page_cost setting. Many of you I'm sure have actually tuned this before, because in most environments that I've seen where the default of random_page_cost = 4 is being used, it actually does make sense to lower that to a lower setting.

But today, I want to dive a little bit more into the specific situation that Shayon encountered, because I think it is actually something that's a little bit more puzzling than the usual situation.

And then I also want to take a look at how the Postgres planner uses these cost settings and how we can reason about adjusting them. Let's go look a little bit closer at the situation at hand.

A puzzling index scan with Rows Removed By Filter

Shayon was comparing two different query plans. We can see here in this example, we have Nested Loop join between two tables. The first table just returns 2 rows, and the second table has LIMIT 500 and we're getting 500 rows from that second table, but there might be more matching the condition.

Now the current index that's being used is the primary key index on that second table. The starting point here is that we're seeing 11 seconds of runtime. If you look at this plan, one thing that you would probably be noticing is that there is a lot of "Rows Removed by Filter".

Even though we're doing an index scan, We're seeing that the index scan, doesn't reduce down based on a particular index condition, the WHERE clause isn't really being used here. Instead the index is being used for its ordering, so the index is being used for the fact that it returns the primary key in order, and then Postgres can use that to essentially load the data successfully from the main table.

The other thing I want to mention here is that we can see how Postgres estimated the cost for this. And so Postgres, if we were to retrieve all the data estimated that we'd have cost 20,000 here, but because we had this limit, it only estimated a cost of 35 for each of those two loops. So you can think of the total the cost for that side of the loop would be roughly 70 cost units.

Let's look at what the change to random_page_cost did.

The impact of changing random_page_cost

Shayon changed the random_page_cost from the default of 4.0 cost units, to be 1.1 cost units. And this actually changed the plan quite significantly. Because first of all, on that first table where previously we were getting a sequential scan, now we're getting an Index Only Scan. That doesn't actually make the difference in this example, it's nice to see an Index Only Scan, but that doesn't really benefit us much here.

It's really that second table where we're now seeing that instead of using the primary key index, it's using an index where it matches the WHERE condition and through that it's able to use the index to look up everything.

And so then you can see in the second table, it's able to use the index to filter down on the index entries that are only relevant for the particular IDs we're looking for, and that match that completed IS NULL condition.

Diving into the Postgres planner

Now, what does that actually mean? Why did Postgres change it's query plan here, based on us adjusting this random_page_cost setting?

A recap on Postgres cost units

First of all, the Postgres documentation has a good explanation on the cost units. Before we dive into how the `random_page_cost`` setting works, I want to make sure you understand how these cost units work.

When we have a plan, we can look at the size of this table and we can see this table has 10,000 rows, hence this estimate here returning ten thousand rows. And then it has 358 disk pages. Now, the math here for a Sequential Scan is fairly simple. Postgres will say, "How many disk pages do we have to read?" It will multiply that by the seq_page_cost, which is used for sequential scans. And then it will also add a little bit of a cpu_tuple_cost.

The planner calls cost_index

Now, if we look at the Postgres source, when the planner comes up with different alternate plans it will call the functions in costsize.c in order to estimate how much cost a particular scan will take.

Here we can see in the cost_index function, amongst other things what the planner will do is, it will actually delegate this decision to the particular index access method. Most of the time in Postgres you'll have a B-tree index, but if you have a GIN index, for example, it has a very different cost pattern than a B-tree than a Hash and such. And so it's good to know that just because something is true for B-tree doesn't mean it's true for other index types.

btcostestimate costing for B-tree indexes

Now, if we look at the btcostestimate function, you can see that this will do certain things that are very specific to B-trees, so, for example, with B-trees, only leading equality comparisons impact how much of the index we're scanning?

Now, ultimately what this ends up doing is it ends up calling this genericcostestimate function. And it passes in the number of index tuples, aka index entries that Postgres thinks it will have to read. And so if we look through here, in the simplest case what it actually does for an index scan, is literally just saying number of index entries we'll have to get, times the random_page_cost. This is fairly straightforward. A lower random_page_cost setting here, means that for the same amount of index pages, the index costs will get lower.

Index correlation

But now the real question is: if this random page cost here is essentially the same for each index, why did we get the choice of one index in Shayon's plan with random_page_cost = 4.0, and the choice of another index with page cost 1.1?

I'm guessing a little bit here because I don't have access to the original system, but I think it actually comes down to the fact that Postgres thought that the index result was correlated with the table.

And it will be able to end the scan early because it thought it would match more conditions, through looking at the table for the matching IDs.

And you can see that the cost_index function at the later parts of the function actually tries to estimate how many of the main table pages, not the index pages, how many of the main table pages it will have to fetch. And you can see here, it will actually take into account the sequential page cost, not just the random page cost.

And so if Postgres estimates that you have correlation between the index result and the table, it might think that if the difference between random_page_cost and seq_page_cost is large, it might make more sense to essentially scan the index, and then scan the table, versus relying on the index to get that result.

And so I think really this comes down to Postgres incorrectly thinking that it doesn't have to look at 150,000 rows to be able to get the matching results. It thought it would have to get a lot less rows from the main table to be able to fulfill that LIMIT 500 that was being asked for.

Best practices for tuning random_page_cost

And so I think it's a bit hard to reason about, but what I would probably do is if I was in this situation besides testing "random_page_cost", I would also try to understand, what are the actual cost units.

If you don't change the random_page_cost and use something like pg_hint_plan to get the better plan, which you think it should be using, what is this cost value here that Postgres is estimating, and also how is that impacted by a LIMIT being in place or not?

I would argue that in most systems, including most systems in the cloud, it's pretty safe to set random_page_cost to something that's much lower than 4. It should still be higher than sequential page cost.

And that really comes down to how the planner was written. So the planner is written under the assumption that your random_page_cost will be slightly higher than seq_page_cost, certainly not lower. But you might want to try a low cost like 1.1.

In conclusion

That said, just setting random_page_cost to "1.1" is not going to solve all issues.

Sometimes your bad plans will not be caused by this really, it's just masking the true problem, which might be a row mis-estimate, or other problems in Postgres, like the fact that you have a LIMIT clause and then Postgres thinks it only has to read a certain number of entries when actually it had to read a lot more unexpectedly.

So always good to dive into the details on this, actually look at how these costs functions work, so that you can actually understand why your query plans behave in a certain way.

I hope you learned something new from E105 of 5mins of Postgres. Feel free to subscribe to our YouTube channel, sign up for our newsletter or follow us on LinkedIn and X/Twitter to get updates about new episodes!