Waiting for Postgres 17: Improved EXPLAIN for SubPlan nodes
In today’s E108 of “5mins of Postgres” we discuss a recently committed change to the Postgres 17 development branch that improves how EXPLAIN represents SubPlan
and InitPlan
nodes. We compare the output with 16, and discuss the background of how a subplan can result from a sub-SELECT, and when it results in a regular JOIN instead.
Share this episode: Click here to share this episode on LinkedIn or on X/Twitter. Feel free to sign up for our newsletter and subscribe to our YouTube channel.
Transcript
This is a change that Tom Lane committed about a week ago, co-authored with Dean Rasheed, and initially reported by Chantal Keller. And in this change, Tom and Dean are refining how EXPLAIN plans are showing what's called a SubPlan.
Improving how SubPlan nodes get represented in EXPLAIN
Now, the important distinction here is let's say you have a sub-SELECT, Postgres may choose to implement your sub-SELECT as a JOIN, or it may choose to implement your sub-SELECT as a subplan. We'll talk about a few examples of those in a moment.
So really this is mainly a display change. This particular change here does not change when subplans are issued, but personally I've definitely run into this being quite confusing, and so I think this is a great improvement.
Now let's take a look at these changes. So in the Postgres documentation, there's now a new section on the Using EXPLAIN page, which talks about query plans that involve subplans.
What are subplans?
As noted a subplan will arise from a sub-SELECT in the original query. Some queries end up just being ordinary JOIN queries. But then sometimes queries don't have that. And so what you will see, and you also saw this in previous Postgres versions, is what's for example, here showing as "SubPlan 1". Which means that, for example, in this case, it's doing a sequential scan on this table and then for each of the rows, it's issuing that sub plan to get the additional data that we're, in a sense joining against, but it's not really a JOIN, because it can't be represented in the regular JOIN dynamic.
What are the "tenk" and "onek" tables in the Postgres docs?
Now, before we jump into these examples, it's maybe worth calling out what these tables here are. So oftentimes when you read the Postgres documentation, you will see this "tenk" or "onek" table. And these are example tables that are used in the Postgres regression tests, that are also sometimes used in the Postgres documentation.
And so if you're interested, how these tables look like. You can look at the regression test source of Postgres and you can see that the "tenk" table, for example, has quite a few different columns, mostly integer, with a few text columns here. Now as the name says, "ten k", this is a table of 10 K rows.
Now, similarly, there's a "one k" dataset, which is 1,000 lines, that's stored in the "onek" table. And so if you see references to these datasets in the Postgres documentation, you can just understand that "onek" is 1,000 rows, "tenk" that's 10,000 rows and sometimes you'll also see "onek2" or "tenk2", this is the same data, just copied to a separate table to represent more complex JOIN behaviors.
Now with that in mind, let's take a look at a few examples and compare them between 16 and 17.
Comparing Postgres 16 and 17 EXPLAIN for subplans
A simple SubPlan node
First of all, this is an example, where the subplan gets executed once for each of the rows in the parent plan node. And so if I execute this on 16, which I have running here, I will see a plan like this:
postgres16=# EXPLAIN VERBOSE SELECT unique1 FROM tenk1 t
WHERE t.ten < ALL (SELECT o.ten FROM onek o WHERE o.four = t.four);
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on public.tenk1 t (cost=0.00..482.50 rows=5000 width=4)
Output: t.unique1
Filter: (SubPlan 1)
SubPlan 1
-> Seq Scan on public.onek o (cost=0.00..0.00 rows=1 width=4)
Output: o.ten
Filter: (o.four = t.four)
(7 rows)
If executed it on 17, plan looks slightly different:
postgres17=# EXPLAIN VERBOSE SELECT unique1 FROM tenk1 t
WHERE t.ten < ALL (SELECT o.ten FROM onek o WHERE o.four = t.four);
QUERY PLAN
------------------------------------------------------------------------
Seq Scan on public.tenk1 t (cost=0.00..241095.00 rows=5000 width=4)
Output: t.unique1
Filter: (ALL (t.ten < (SubPlan 1).col1))
SubPlan 1
-> Seq Scan on public.onek o (cost=0.00..47.50 rows=250 width=4)
Output: o.ten
Filter: (o.four = t.four)
(7 rows)
Now the big difference here is if you look at that Filter statement. Previously on 16, you would just see that the filter said "(SubPlan 1)", but it didn't really tell you what it was filtering on.
And so even though our query actually had, the comparison here with the "<" operator and the ALL, that wasn't really clear from the EXPLAIN plan.
And so in 17, what's really nice here is that we can see the actual expression. This also makes multiple columns easier to understand, and in this case, one of the columns is essentially being pulled out and that's what's used in this comparison. So when you have more complex cases, this is going to make it much easier to understand.
Now, just to illustrate two more things. So, first of all, in this case, we can see that this is actually executing it once per row. And so if I add an ANALYZE here:
postgres17=# EXPLAIN (ANALYZE,VERBOSE) SELECT unique1 FROM tenk1 t
WHERE t.ten < ALL (SELECT o.ten FROM onek o WHERE o.four = t.four);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on public.tenk1 t (cost=0.00..241095.00 rows=5000 width=4) (actual time=24.747..24.748 rows=0 loops=1)
Output: t.unique1
Filter: (ALL (t.ten < (SubPlan 1).col1))
Rows Removed by Filter: 10000
SubPlan 1
-> Seq Scan on public.onek o (cost=0.00..47.50 rows=250 width=4) (actual time=0.001..0.001 rows=2 loops=10000)
Output: o.ten
Filter: (o.four = t.four)
Rows Removed by Filter: 8
Planning Time: 0.193 ms
Execution Time: 24.805 ms
(11 rows)
We can see is that this sub plan here actually does get executed 10,000 times, because for each of the rows in the 10k table, I'm executing the subplan. And so that's something that was there before in 16 as well, but it's important to remember that.
The other thing I can illustrate in this plan, is that if I change the ALL to an ANY, you actually see that now this turns into a JOIN:
postgres17=# EXPLAIN VERBOSE SELECT unique1 FROM tenk1 t
WHERE t.ten < ANY (SELECT o.ten FROM onek o WHERE o.four = t.four);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=57.50..578.33 rows=3333 width=4)
Output: t.unique1
Hash Cond: (t.four = o.four)
Join Filter: (t.ten < o.ten)
-> Seq Scan on public.tenk1 t (cost=0.00..445.00 rows=10000 width=12)
Output: t.unique1, t.unique2, t.two, t.four, t.ten, t.twenty, t.hundred, t.thousand, t.twothousand, t.fivethous, t.tenthous, t.odd, t.even, t.stringu1, t.stringu2, t.string4
-> Hash (cost=45.00..45.00 rows=1000 width=8)
Output: o.four, o.ten
-> Seq Scan on public.onek o (cost=0.00..45.00 rows=1000 width=8)
Output: o.four, o.ten
(10 rows)
Postgres has certain logic where it'll do what's called sub query pull up. And so instead of the sub-SELECT being executed as a sub plan, it actually now becomes part of the top plan.
Generally speaking that will give you better plans, so Postgres tries to do that whenever possible, but sometimes that's not possible, and then you will see a sub plan in your query.
Hashed SubPlan nodes
Now the next example illustrates what's called a hashed subplan. This is something that I personally always found very confusing when it happens. And so on 16 it looks quite confusing, which is literally just says, okay, the filter is when NOT (hashed SubPlan 1)
, and it doesn't really tell you much:
postgres16=# EXPLAIN SELECT * FROM tenk1 t WHERE t.unique1 NOT IN (SELECT o.unique1 FROM onek o);
QUERY PLAN
--------------------------------------------------------------
Seq Scan on tenk1 t (cost=0.00..470.00 rows=5000 width=244)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Seq Scan on onek o (cost=0.00..0.00 rows=1 width=4)
(4 rows)
Now in 17, this is much clearer to understand:
postgres17=# EXPLAIN SELECT * FROM tenk1 t WHERE t.unique1 NOT IN (SELECT o.unique1 FROM onek o);
QUERY PLAN
------------------------------------------------------------------
Seq Scan on tenk1 t (cost=47.50..517.50 rows=5000 width=244)
Filter: (NOT (ANY (unique1 = (hashed SubPlan 1).col1)))
SubPlan 1
-> Seq Scan on onek o (cost=0.00..45.00 rows=1000 width=4)
(4 rows)
Because now it's saying: Well, there is an ANY clause here, Postgres turns this NOT IN into a NOT ANY, and then it compares "unique1", which comes from the outer "onek" table to the first column of the sub plan, which in this case is also "unique1", cause that's the first column that's being returned here. And so this is now much clearer, right? If you look at 16 very confusing, 17 much clearer to understand.
InitPlan nodes
Now, the last example here is of what's called an InitPlan. And if you see "InitPlan", what that means is that Postgres will execute that subplan once at the start of the query, and then will be able to reference the result multiple times potentially, but it doesn't have to rerun that subplan.
And so in this example here, we're just selecting a random number, and so that doesn't change depending on your input. Now in Postgres 16, I get a little bit verbose output here:
postgres16=# EXPLAIN VERBOSE SELECT unique1
FROM tenk1 t1 WHERE t1.ten = (SELECT (random() * 10)::integer);
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on public.tenk1 t1 (cost=0.02..470.02 rows=1000 width=4)
Output: t1.unique1
Filter: (t1.ten = $0)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.02 rows=1 width=4)
Output: ((random() * '10'::double precision))::integer
(6 rows)
So it says this "Init Plan 1" returns "$0" and then here, it essentially compares with "$0". Now, in Postgres 17, this is slightly refined:
postgres17=# EXPLAIN VERBOSE SELECT unique1
FROM tenk1 t1 WHERE t1.ten = (SELECT (random() * 10)::integer);
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on public.tenk1 t1 (cost=0.02..470.02 rows=1000 width=4)
Output: t1.unique1
Filter: (t1.ten = (InitPlan 1).col1)
InitPlan 1
-> Result (cost=0.00..0.02 rows=1 width=4)
Output: ((random() * '10'::double precision))::integer
(6 rows)
And so we're losing this "$0" definition here, which can be a bit confusing, and instead it's just referencing the InitPlan, and it's saying here is this "col1".
In conclusion
All in all this should make it easier to understand how you subplans and InitPlans behave in your EXPLAIN plans. Again, this doesn't change the behavior, but it just makes it easier to understand what Postgres is doing.
Thank you to the community and to Tom Lane for committing this patch. I think this is a great refinement in how EXPLAIN plans get represented.
I hope you learned something new from E108 of 5mins of Postgres. Feel free to subscribe to our YouTube channel, sign up for our newsletter or follow us on LinkedIn and X/Twitter to get updates about new episodes!
What we have discussed in this episode of 5mins of Postgres
- Improve EXPLAIN's display of SubPlan nodes and output parameters - Postgres commit by Tom Lane
- Using EXPLAIN - Postgres documentation for Postgres 17 development branch
- Regression test setup - Postgres source code
- Regression test "onek" table data - Postgres source code
- Regression test "tenk" table data - Postgres source code