Postgres 16: Running EXPLAIN on any query (even with $1 parameters!)
In episode 65 of “5mins of Postgres” we're going to talk about explaining generic plans in Postgres 16. We're looking at generic plans vs. custom plans, the benefits of generic plans and the limitations of EXPLAIN (GENERIC_PLAN).
Let's have a look!
This is a blog post by Laurenz Albe on the Cybertec blog, where he talks about a feature that he contributed to Postgres 16.
First of all, what do we mean by generic plan?
Generic plans are a variant of a query plan: they are independent of the actual parameter values. For example, when you use a prepared statement with different parameter values, Postgres plans the first five executions with custom plans for the concrete values; on the sixth execution, it decides whether to rely on a generic plan or keep making custom plans for each set of parameters. This can provide performance benefits, but also downsides, depending on the situation.
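To sketch that prepared-statement logic (the users table and id column here are hypothetical):

```sql
-- Hypothetical table/query to illustrate the custom vs. generic plan choice
PREPARE get_user(int) AS SELECT * FROM users WHERE id = $1;

-- The first five executions are planned as custom plans for the
-- concrete parameter value:
EXECUTE get_user(42);

-- From the sixth execution onward, Postgres may switch to a cached
-- generic plan if it doesn't look more expensive than the average
-- custom plan. EXPLAIN EXECUTE shows which plan is being used:
EXPLAIN EXECUTE get_user(42);
```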
The other interesting aspect of generic plans is that they can give you an EXPLAIN plan without requiring the parameter values.
There is a way to do this in earlier Postgres versions. Laurenz previously talked about it in a blog post, and we also had a 5mins of Postgres episode on this topic, explaining that you can use the plan_cache_mode setting in Postgres to force a generic plan. You prepare a statement and make sure the prepared statement always uses the generic plan. That way, you can EXPLAIN statements that contain, for example, a $1 parameter without actually setting a particular value for $1.
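As a sketch, the pre-16 workaround looks roughly like this (using a simple pg_class query as a stand-in):

```sql
-- Force Postgres to always use the generic plan for prepared statements
SET plan_cache_mode = force_generic_plan;

PREPARE stmt(text) AS SELECT * FROM pg_class WHERE relname = $1;

-- The NULL is never used for planning, since the generic plan is forced;
-- the resulting plan shows $1 instead of a concrete value
EXPLAIN EXECUTE stmt(NULL);
```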
In Postgres 16, Tom Lane recently committed Laurenz's patch, which adds a GENERIC_PLAN option for EXPLAIN. The idea is to make this simpler: instead of the plan_cache_mode workaround, generic plans are officially offered as an option of EXPLAIN. There's also a small improvement over the workaround, which could run into errors when execution-time partition pruning kicked in. Now, with this official option, you can actually get a generic plan for your query.
Let's take a closer look at what that means. In Postgres 16, you write EXPLAIN (GENERIC_PLAN), and then you give your query, for example a SELECT statement. This is similar to how a regular EXPLAIN works. The big difference is that if the statement contains a parameter, for example in a WHERE clause like relname = $1, you can actually get an EXPLAIN plan for it without needing to set the value of $1.
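For example, with the new option this works directly:

```sql
-- Postgres 16: generic plan without supplying a parameter value
EXPLAIN (GENERIC_PLAN) SELECT * FROM pg_class WHERE relname = $1;
```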
In previous versions, if you just ran EXPLAIN on such a statement, Postgres would have told you that there is no parameter $1, because it normally expects all parameters in a query to be set.
Now, why is this useful? This seems kind of obscure. In reality, what often happens when you're debugging a particular query is that you'll get the query from one of two locations.
- Either you'll get it from the Postgres logs or
- you'll get it from pg_stat_statements.
In the case of the Postgres logs, for example, let's say a query is canceled by a statement timeout. There are settings in newer Postgres versions to also log the parameter values, but many times you don't actually have them, and so it becomes useful to be able to get a generic query plan just from that statement.
pg_stat_statements always records what it calls normalized statements. Again, these contain $1, $2 and similar replacement values, so you're not actually able to get the actual values from pg_stat_statements; for that you would have to use another extension. What this EXPLAIN (GENERIC_PLAN) feature allows you to do is to just take the query text from pg_stat_statements, put EXPLAIN (GENERIC_PLAN) in front of it, and get an idea of how the statement might perform.
As Laurenz notes, for a full analysis you'll definitely still want to run an EXPLAIN (ANALYZE, BUFFERS), for example, but for that to be able to run, you would actually need to know appropriate parameter values.
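Once you do know a realistic value, you can run the full analysis (note that ANALYZE actually executes the query):

```sql
-- Substitute a concrete value for $1 and measure the real execution
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_class WHERE relname = 'pg_attribute';
```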
Also important to note: sometimes Postgres query plans will differ depending on which parameter values you have. If a value is very common in the table, you may get a different scan than with a very infrequent value. This selectivity calculation that Postgres does can greatly influence query plans.
There are a couple of limitations of this new feature that are important to note. First of all, you can't use these parameters everywhere; they have to be in a place where Postgres currently allows a constant value, such as a string or a number. You couldn't write FROM $1 with $1 standing in for the table name, because Postgres couldn't plan that. Similarly, you can't replace a whole clause. You could say WHERE $1, but that would essentially just be the same as saying WHERE true. It's nonsensical to replace full parts of a query like that.
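To illustrate both limitations (these statements are intentionally broken or pointless):

```sql
-- A table name is not a constant, so this fails with a syntax error:
EXPLAIN (GENERIC_PLAN) SELECT * FROM $1;

-- This is accepted ($1 is resolved as a boolean), but it tells you
-- about as much as WHERE true would:
EXPLAIN (GENERIC_PLAN) SELECT * FROM pg_class WHERE $1;
```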
The other thing to note is that you can sometimes run into data type problems. For example, the generate_series function has multiple definitions for different data types. If you just wrote a generic generate_series with $1, $2, $3, it wouldn't actually work: Postgres would complain that there are multiple matching functions and it wouldn't know which one you meant.
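A minimal sketch of that ambiguity:

```sql
-- generate_series has integer, bigint, numeric and timestamp variants,
-- so with untyped parameters Postgres can't pick one and raises an error
EXPLAIN (GENERIC_PLAN) SELECT generate_series($1, $2, $3);
```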
There is a workaround. Either you fall back to the old style of doing this with an actual PREPARE, which lets you specify the data types as part of the PREPARE. Or, what I personally would recommend: modify the query and add an explicit typecast.
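Both workarounds, sketched for the generate_series example:

```sql
-- Option 1: old-style PREPARE, which lets you declare the parameter types
SET plan_cache_mode = force_generic_plan;
PREPARE series(int, int, int) AS SELECT generate_series($1, $2, $3);
EXPLAIN EXECUTE series(NULL, NULL, NULL);

-- Option 2 (recommended): add explicit typecasts to the query itself
EXPLAIN (GENERIC_PLAN) SELECT generate_series($1::int, $2::int, $3::int);
```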
If you had that typecast, for example in your application code, it would actually get recorded this way in
pg_stat_statements or in the Postgres log as well. That way you can make sure that generic plans can be explained consistently for your queries, which is very useful to get a good sense of how your workload is structured.
All in all, I think this is a very useful feature and I look forward to using it in Postgres 16.