Introducing Query Tuning Workbooks to safely tune Postgres queries on production with pganalyze!

5mins of Postgres E35: How to run EXPLAIN on normalized queries from pg_stat_statements using plan_cache_mode

In today’s episode we're going to talk about how to run EXPLAIN on a normalized query from pg_stat_statements and how to get the generic query plan.



Share this episode: Click here to share this episode on twitter, or sign up for our newsletter and check out the newsletter archive and subscribe to our YouTube channel.


Transcript

Let's have a look!

Postgres EXPLAIN plans and pg_stat_statements

This blog post by Franck from Yugabyte talks about how you can get an EXPLAIN plan from a query that you found in pg_stat_statements.

What is pg_stat_statements in Postgres

If you're not familiar with pg_stat_statements, it gives you a view of all the queries that have run on your database. That is very useful to get an understanding of what your query workload looks like on a production system.

One important thing to know about pg_stat_statements is that it does normalization. In his example, Franck does a SELECT on a table and for example, SELECT FROM demo WHERE i=42 AND t='x'. The i=42 for example got turned into i=$1.

This normalization initially didn't exist in Postgres. Initially, pg_stat_statements just recorded the full query text, which led to all kinds of problems. The first thing that happened is that Peter Geoghegan improved this in Postgres 9.3 by adding a normalization logic. But, initially this was using question mark characters and you might not know this, but ? is also a valid operator in Postgres.

Now, I have two fun facts for you today!

Fun fact number one:

I actually contributed a patch to Postgres a couple of years ago that changed this. The idea was that instead of recording these question marks, we would record $1, $2, and such, the same way as prepared statements use them, so that the queries are usable and this ambiguity with the ? operator would be gone. Tom Lane committed this to Postgres about 5 years ago, based on that patch. You can now make sense of the queries cause they look pretty much like a regular prepared statement.

But, when you run EXPLAIN, you will see that it will error out with the error "There is no parameter $1". Postgres tries to interpret this query, it sees that you're referencing a parameter here and it doesn't know the value of that parameter and so it can't run the EXPLAIN for you.

What Franck does here: he actually prepares that statement. He does a PREPARE test and then puts in the query. When you create a prepared statement in Postgres, it looks at the query and it determines the parameter types for the individual parameters. Even though you haven't actually passed in a value, Postgres knows that this $1 parameter here is most likely going to be a bigint because the i column is a bigint, similarly the t column is a text and so it assumes that the $2 parameter is a text. That's actually very useful, we'll come back to that later.

When you create a prepared statement in Postgres, it looks at the query and it determines the parameter types for the individual parameters.

Can we use EXPLAIN by passing null?

Now, I don't know the value, maybe I can just EXPLAIN the statement by passing null. Unfortunately, null is not the same thing as "you don't know". Null does have a distinct meaning to the planner.

For example, the Postgres planner will return you a "Result" node, meaning it doesn't actually do any lookups on tables. You're passing in null as the primary key. That's impossible, there is no row that would fulfill such a requirement and so that's why it says "Well, I'm not even going to look at the table". Clearly, this is not the actual query plan you would see typically.

There are a couple of ways to work around this. First, Franck would try to run the statement five times. The reason for that is that in prepared statements in Postgres, if you run a statement a sixth time, under certain conditions, Postgres will actually switch to the generic plan, which is the plan that doesn't depend on the actual values. There are some subtleties on when that works and it really comes down to how Postgres costing works.

Using plan_cache_mode in Postgres to force the query planner to use a generic plan

In Postgres 12, there is actually a new option called plan_cache_mode which lets you force the query planner to always use a generic plan. If I pass null here, Postgres says, "I don't know the values", I'm going to force the generic plan.

This very useful. It helps you get a good understanding, based on this pg_stat_statements query: is there an index that covers that query?

Before Postgres 12, there are a couple of hacks and what Franck describes here is a hack using the CPU operator cost, but there is actually a better way!

And this is fun fact number two for today, which you might be surprised by:

I found this a couple of years ago in the postgres_fdw source code, and there is a way to trick the Postgres planner into an unknown constant value. You do this very obscure sub-select, which essentially takes a null, casts it to a certain type, and then for certain grammar reasons casts it to a type again.

We can actually see that if I run the query that Franck had in his example, I get this result filter. Now I have to get a little bit creative here, I went into the initial query and I replaced the $1, the $2, with those special sub-selects with null.

I actually get the generic query plan without having to force_generic_plan. Now, this is a new enough Postgres, so I could also do the other option here, and that also works. If you're in a newer Postgres version, I would recommend using the plan_cache_mode setting.

If you're in a newer Postgres version, I would recommend using the plan_cache_mode setting.

In conclusion, this is a way to get from pg_stat_statements to a generic EXPLAIN plan. This may not be the exact same EXPLAIN plan than what you would get if you pass in an actual value, and so on a production system, if you can, I always recommend enabling auto_explain or collecting actual query samples using log_min_duration_statement, because then you know the actual query plans and that's going to be much better quality than a generic EXPLAIN plan.

Thanks so much for listening. This was today’s episode of 5mins of Postgres. Subscribe to our YouTube channel and follow us on Twitter if you want to hear about next week's episode.

What we have discussed in this episode of 5mins of Postgres


Enjoy blog posts like this?

Get them once a month to your inbox