Postgres Planner Quirks: JOIN Equivalence Classes and IN/ANY filters

In E117 of “5mins of Postgres” we continue our series on Postgres planner quirks. Today, we discuss JOIN column equivalence, and the issues that arise when IN/ANY filters are not considered part of the equivalence class in Postgres.





Transcript

We'll start with a blog post by Deepak Mahto on his personal blog. In this post, Deepak describes a situation he recently encountered on the Postgres Community Slack, where there was an interesting performance issue involving a SQL query that joins two tables with an ANY filter applied to one of them. I happened to be part of that same conversation.

The start of the puzzle: An inefficient Nested Loop Join

In the Slack conversation, somebody had brought a query, or rather a subset of a query, where they were seeing quite a slowdown in this particular JOIN:

Hi I have a query that joins two tables using an inner join. here's an example of the query and query plan it generates.

SELECT
    papa.six
FROM
    tango_zulu papa
    INNER JOIN hotel alpha ON papa.six = alpha.quebec
WHERE
    alpha.quebec = ANY (:documents)
Nested Loop  (cost=1.14..30886.05 rows=1313 width=8) (actual time=0.833..289.092 rows=1965 loops=1)
  Buffers: shared hit=10487 read=758 dirtied=34
  I/O Timings: read=280.949
  ->  Index Only Scan using yankee on hotel alpha  (cost=0.57..4587.50 rows=1000 width=8) (actual time=0.101..26.818 rows=1000 loops=1)
        Index Cond: (alpha.quebec = ANY ('tango_foxtrot'::integer[]))
        Heap Fetches: 428
        Buffers: shared hit=5078 read=245
        I/O Timings: read=23.646
  ->  Index Only Scan using seven on tango_zulu papa  (cost=0.57..26.11 rows=19 width=8) (actual time=0.249..0.262 rows=2 loops=1000)
        Index Cond: (papa.six = alpha.quebec)
        Heap Fetches: 1679
        Buffers: shared hit=5409 read=513 dirtied=34
        I/O Timings: read=257.303
Planning:
  Buffers: shared hit=23
Execution time: 289.255 ms

They were essentially concerned that it takes up to 4 seconds to execute this part of the query plan, and they were seeing a Nested Loop with between 3,000 and 9,000 loops on the inner side.

The question here was: is there a way to make Postgres pick a more efficient type of JOIN?

When they turned Nested Loop Joins off (SET enable_nestloop = off), they got a query that didn't finish executing, because it resulted in a sequential scan.
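
If you want to reproduce that kind of experiment safely, a minimal sketch looks like this, using the anonymized table names from the plan above and a literal array in place of the :documents parameter. Plain EXPLAIN (without ANALYZE) shows which plan Postgres would pick, without actually running a query that may never finish:

-- Scope the planner override to a single transaction with SET LOCAL,
-- so it doesn't leak into the rest of the session:
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN
SELECT papa.six
FROM tango_zulu papa
INNER JOIN hotel alpha ON papa.six = alpha.quebec
WHERE alpha.quebec = ANY ('{101, 102, 103}'::integer[]);
ROLLBACK;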

The role of parameterized index scans

Now, my initial realization when I looked at the EXPLAIN plan was that they were seeing an index condition that required the use of a parameterized index scan: Index Cond: (papa.six = alpha.quebec).

You can see here that they have two anonymized table aliases, "papa" and "alpha", and you can see in the index condition that querying the first table required a value from the second table. So when it was scanning the "papa" table, Postgres was essentially substituting individual result values coming from the "alpha" table.

I've written about Parameterized Index Scans before - they are essentially only possible when you have nested loops. This would explain why Postgres kept choosing Nested Loop.
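
Conceptually, you can think of the parameterized inner side as the following lookup being repeated once per row coming out of the "alpha" scan, which matches the loops=1000 shown in the plan ($outer_value is just a placeholder standing in for the current alpha.quebec value):

-- Sketch of what the inner index scan effectively does for each outer row:
SELECT papa.six
FROM tango_zulu papa
WHERE papa.six = $outer_value;  -- $outer_value: current alpha.quebec row value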

However, I had missed something in the first query: the alpha.quebec column they're filtering on here is also the equality column in the JOIN condition:

JOIN alpha ON (papa.six = alpha.quebec)
WHERE alpha.quebec = ANY (:documents)

Technically, Postgres should be able to know that you're looking for this particular set of document IDs in both the papa and the alpha table. It's the exact same set of IDs.

In a sense, Postgres should be smart enough to apply that to both. Now it turns out Postgres isn't smart enough. And so today I want to explain to you a little bit why that is the case.

The fix: Adding an additional WHERE condition

But first off, the way that Deepak suggested fixing this was to introduce an additional WHERE condition that explicitly queries for the same value in both of these tables, despite there being a JOIN condition:

SELECT
    papa.six
FROM
    tango_zulu papa
    INNER JOIN hotel alpha ON papa.six = alpha.quebec
WHERE
    alpha.quebec = ANY (:documents)
    AND papa.six = ANY (:documents)

This actually ended up working, and Postgres successfully picked the Merge Join in this case, as confirmed by the original requestor:

Deepak's suggestion actually worked! By default it picked merge_join. I didn't know that something like a parameterized index scan existed. Thanks for pointing it out. I've been breaking my head on this query for a few days now

Now, Deepak did a bit more testing here: he originally tested this on Postgres 16.3, and later also reproduced the same behavior on 17 beta1, so this is something that you will still encounter in Postgres today.

The role of Equivalence Classes in the Postgres Planner

Equality comparisons are correctly applied to all involved tables

In Deepak's follow-up blog post he dug a bit deeper into this, and also described that if you have an equality comparison, Postgres is actually able to apply it to both of the involved tables:

explain (analyze, buffers)
SELECT
    tbl1.col1
FROM
    tbl1
    INNER JOIN tbl2 ON tbl1.col1 = tbl2.col1
WHERE
    tbl2.col1 = 1;
QUERY PLAN
----------------------------------------------
 Nested Loop  (cost=0.00..3.38 rows=1 width=4) (actual time=0.144..0.168 rows=1 loops=1)
   Buffers: shared hit=2
   ->  Seq Scan on tbl1  (cost=0.00..2.25 rows=1 width=4) (actual time=0.115..0.134 rows=1 loops=1)
         Filter: (col1 = 1)
         Rows Removed by Filter: 99
         Buffers: shared hit=1
   ->  Seq Scan on tbl2  (cost=0.00..1.12 rows=1 width=4) (actual time=0.024..0.028 rows=1 loops=1)
         Filter: (col1 = 1)
         Rows Removed by Filter: 9
         Buffers: shared hit=1
 Planning Time: 1.343 ms
 Execution Time: 0.282 ms
(12 rows)

Now this puzzled me a little bit: if Postgres can do this for an equality comparison, like "col1 = 1" in this case, why can't it do the same for an "= ANY"?

What are Equivalence Classes?

And this has to do with what Postgres calls equivalence classes.

An equivalence class is something that is determined as part of the Postgres query planning process. There's a detailed description of this in the optimizer README file (src/backend/optimizer/README in the Postgres source tree):

During the deconstruct_jointree() scan of the query's qual clauses, we look for mergejoinable equality clauses A = B. When we find one, we create an EquivalenceClass containing the expressions A and B to record that they are equal. If we later find another equivalence clause B = C, we add C to the existing EquivalenceClass for {A B}; this may require merging two existing EquivalenceClasses. At the end of the scan, we have sets of values that are known all transitively equal to each other.
...
An EquivalenceClass also contains a list of btree opfamily OIDs, which determines what the equalities it represents actually "mean". All the equivalence clauses that contribute to an EquivalenceClass must have equality operators that belong to the same set of opfamilies.

But I'll be honest, that README is a bit hard to follow.
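
To make the README's description a bit more concrete, here's a minimal sketch of the transitive case it talks about (t1, t2 and t3 are placeholder tables, each with an integer column a):

-- t1.a = t2.a and t2.a = t3.a get merged into one equivalence class
-- {t1.a, t2.a, t3.a}, so the constant restriction on t1.a is applied
-- to the scans of all three tables:
EXPLAIN
SELECT *
FROM t1
JOIN t2 ON t1.a = t2.a
JOIN t3 ON t2.a = t3.a
WHERE t1.a = 42;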

In a related mailing list thread from 2021 ("Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?"), Tom Lane also described them like this, in simpler terms:

The planner uses equivalence classes to deduce implied conditions. That is, we have the join condition a.adate = b.bdate and then you've added the where condition a.adate = '2021-05-12'. Transitivity implies that b.bdate = '2021-05-12', so we deduce that condition and are able to apply it at the relation scan of b.

Furthermore, having restricted both a.adate and b.bdate to the same constant value at the scan level, we no longer need to apply the join condition a.adate = b.bdate at all. This is important not only to avoid the (probably minor) inefficiency of rechecking the join condition, but because if we believed that all three conditions were independently applicable, we'd come out with a serious underestimate of the size of the join result.
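
Written out as a query, Tom's example looks something like this (a and b being placeholder tables with date columns adate and bdate):

SELECT *
FROM a
JOIN b ON a.adate = b.bdate
WHERE a.adate = '2021-05-12';
-- the planner deduces b.bdate = '2021-05-12' and applies it at the scan of b,
-- and then no longer applies a.adate = b.bdate as a separate join condition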

For context, that mailing list thread started with somebody reporting behavior similar to what Deepak described with IN lists, except they reported it with a ">=" condition.

For the sake of this discussion you can consider a ">=" the same as an "ANY" or an IN list: they're not simple equality comparisons, and so Postgres today does not apply the equivalence class optimization to them.

Could IN/ANY be considered like equality comparisons?

However, I was more interested in what is being done about this. Has the community talked about potential improvements here?

In that same thread, in the context of ">=", Tom continued:

None of the argument sketched above works for non-equality conditions. There are some situations where you could probably figure out how to use transitivity to deduce some implied condition, but cleaning things up so that you don't have redundant conditions fouling up the join size estimates seems like a hard problem.

Another issue is that we could easily expend a lot of cycles on deductions that lead nowhere, because once you try to open up the mechanism to consider operators other than equality, there will be a lot of things that it looks at and then fails to do anything with. The equivalence class mechanism is tied into the same logic that considers merge and hash joins, so we are expending lots of cycles anytime we see an equality operator, and not so much for other operators.

In my understanding, a lot of the worry here is that if you were to include non-equality conditions, such as IN lists (which are effectively lists of ORs), in this mechanism, it would break an assumption that equivalence classes rely on today: that the JOIN condition can be omitted entirely, because the restriction is applied independently to both of the scans.

So really, the conclusion of this discussion is that there isn't an easy fix for this in the planner.

In a different conversation, back in 2017, Tom did note that IN lists are a bit of a simpler case that might be acceptable to treat as part of equivalence classes after all:

I'd think that being able to deduce "b IN somelist" from "a = b and a IN somelist" is more valuable, simply because the IN would typically be a much stronger constraint than an inequality. So that idea suggests that it's more worth expending planner cycles to chase the possibility.

I do vaguely recall discussions specifically around IN, though I didn't have any luck finding one in the archives. There's also the recent thread which suggests being able to simplify "a IN somelist AND a IN someotherlist". If we wanted to do that, making the "lists" be treatable as eclass members would be a good place to start, because that would naturally result in intersect-able lists ending up in the same eclass.
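
To spell out the IN-list simplification Tom mentions at the end (t being a placeholder table with an integer column a):

-- two IN lists on the same column...
SELECT * FROM t WHERE a IN (1, 2, 3) AND a IN (2, 3, 4);
-- ...could in principle be reduced to their intersection,
-- though Postgres does not perform this simplification today:
SELECT * FROM t WHERE a IN (2, 3);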

However, to my knowledge this work has so far not been approached in that way.

A couple of folks have worked on these kinds of problems over the years, among them David Rowley and Andy Fan, but unfortunately the latest activity I could find on this discussion was in late 2022.

Another example to illustrate the performance impact

I want to show you one more example from that mailing list thread, which shows the impact this problem can have.

So, this is an example that Tomas Vondra found, based on a report on the "pgsql-general" mailing list where somebody was encountering a performance issue very similar to the one discussed in this thread, and to what was reported in that Slack conversation.

create table t1 (a int);
create table t2 (a int);

insert into t1
select i from generate_series(1,100000) s(i);

insert into t2
select mod(i,100000) from generate_series(1,10000000) s(i);

create index on t1(a);
create index on t2(a);

vacuum analyze t1, t2;

-- we need to force mergejoin
set enable_nestloop = off;

I'm re-running this locally just to show you how this works. The query provided here uses a ">" comparison:

EXPLAIN ANALYZE
  SELECT t1.a, t2.a
    FROM t1
    JOIN t2 USING (a)
   WHERE (t1.a > 99000)
   ORDER BY t1.a
   LIMIT 100;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.65..245.13 rows=100 width=8) (actual time=553.563..553.574 rows=100 loops=1)
   ->  Merge Join  (cost=8.65..209379.65 rows=88537 width=8) (actual time=553.562..553.569 rows=100 loops=1)
         Merge Cond: (t1.a = t2.a)
         ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..28.01 rows=898 width=4) (actual time=0.084..0.085 rows=1 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
         ->  Index Only Scan using t2_a_idx on t2  (cost=0.43..183464.09 rows=9999977 width=4) (actual time=0.014..333.909 rows=9900200 loops=1)
               Heap Fetches: 0
 Planning Time: 0.562 ms
 Execution Time: 553.618 ms
(10 rows)

We can see in this case that the query runs for about 553 milliseconds.

You can see that the index condition only shows up on "t1", not on "t2", right? That's essentially our problem case here: despite us having this USING clause, the condition doesn't automatically get applied to both tables.

And so if we rewrite this query to include an explicit condition on "t2", the query goes from about 550 milliseconds to less than a millisecond:

EXPLAIN ANALYZE
  SELECT t1.a, t2.a
    FROM t1
    JOIN t2 USING (a)
   WHERE (t1.a > 99000) AND (t2.a > 99000) --- This was modified
   ORDER BY t1.a
   LIMIT 100;
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.82..268.90 rows=100 width=8) (actual time=0.064..0.122 rows=100 loops=1)
   ->  Merge Join  (cost=0.82..2515.37 rows=938 width=8) (actual time=0.062..0.108 rows=100 loops=1)
         Merge Cond: (t1.a = t2.a)
         ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..28.01 rows=898 width=4) (actual time=0.026..0.027 rows=1 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
         ->  Index Only Scan using t2_a_idx on t2  (cost=0.43..2210.82 rows=105965 width=4) (actual time=0.029..0.049 rows=100 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
 Planning Time: 0.741 ms
 Execution Time: 0.184 ms
(11 rows)

So, a tremendous speedup here. Of course this is going to be situational, but in this case, just the fact that the index condition is now applied to both tables before the JOIN makes this a lot faster. Note that the merge condition still exists: Merge Cond: (t1.a = t2.a).

Just to compare this, if we did an equality comparison:

EXPLAIN ANALYZE
  SELECT t1.a, t2.a
    FROM t1
    JOIN t2 USING (a)
   WHERE (t1.a = 99000)
   ORDER BY t1.a
   LIMIT 100;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10000000000.73..10000000011.47 rows=99 width=8) (actual time=0.087..0.114 rows=100 loops=1)
   ->  Nested Loop  (cost=10000000000.73..10000000011.47 rows=99 width=8) (actual time=0.085..0.106 rows=100 loops=1)
         ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..4.31 rows=1 width=4) (actual time=0.038..0.039 rows=1 loops=1)
               Index Cond: (a = 99000)
               Heap Fetches: 0
         ->  Index Only Scan using t2_a_idx on t2  (cost=0.43..6.17 rows=99 width=4) (actual time=0.043..0.055 rows=100 loops=1)
               Index Cond: (a = 99000)
               Heap Fetches: 0
 Planning Time: 0.264 ms
 Execution Time: 0.173 ms
(10 rows)

Ignoring the broader shape of the plan (which is different since we're querying much less data), you can see that the Nested Loop does not have any JOIN condition because of the equivalence class mechanism, as described earlier.

Now, if we change this to an IN list (again looking at a different set of data), we can see the same behavior: the query takes about 200 milliseconds:

EXPLAIN ANALYZE
  SELECT t1.a, t2.a
    FROM t1
    JOIN t2 USING (a)
   WHERE (t1.a IN (99000, 99001))
   ORDER BY t1.a
   LIMIT 100;
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1005.89..69825.31 rows=100 width=8) (actual time=201.856..201.918 rows=100 loops=1)
   ->  Gather Merge  (cost=1005.89..136580.15 rows=197 width=8) (actual time=201.854..201.912 rows=100 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Merge Join  (cost=5.87..135557.38 rows=82 width=8) (actual time=195.469..195.474 rows=33 loops=3)
               Merge Cond: (t2.a = t1.a)
               ->  Parallel Index Only Scan using t2_a_idx on t2  (cost=0.43..125130.89 rows=4166657 width=4) (actual time=0.059..114.826 rows=3300034 loops=3)
                     Heap Fetches: 0
               ->  Materialize  (cost=0.29..8.63 rows=2 width=4) (actual time=0.068..0.069 rows=35 loops=3)
                     ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..8.62 rows=2 width=4) (actual time=0.062..0.063 rows=2 loops=3)
                           Index Cond: (a = ANY ('{99000,99001}'::integer[]))
                           Heap Fetches: 0
 Planning Time: 0.314 ms
 Execution Time: 201.961 ms
(14 rows)

And then if I add the condition on the other table, suddenly it takes about 0.1 milliseconds:

EXPLAIN ANALYZE
  SELECT t1.a, t2.a
    FROM t1
    JOIN t2 USING (a)
   WHERE (t1.a IN (99000, 99001)) AND (t2.a IN (99000, 99001)) -- this was modified
   ORDER BY t1.a
   LIMIT 100;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.73..21.45 rows=1 width=8) (actual time=0.047..0.084 rows=100 loops=1)
   ->  Merge Join  (cost=0.73..21.45 rows=1 width=8) (actual time=0.045..0.075 rows=100 loops=1)
         Merge Cond: (t1.a = t2.a)
         ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..8.62 rows=2 width=4) (actual time=0.022..0.023 rows=1 loops=1)
               Index Cond: (a = ANY ('{99000,99001}'::integer[]))
               Heap Fetches: 0
         ->  Index Only Scan using t2_a_idx on t2  (cost=0.43..12.32 rows=197 width=4) (actual time=0.015..0.027 rows=100 loops=1)
               Index Cond: (a = ANY ('{99000,99001}'::integer[]))
               Heap Fetches: 0
 Planning Time: 0.473 ms
 Execution Time: 0.128 ms
(11 rows)

In this situation, we can see the query got over 1,500 times faster (from about 202 ms down to 0.13 ms) by adding that extra WHERE condition.
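
As a takeaway, if your IN/ANY list comes in as a bind parameter, like :documents did in the Slack example, the workaround amounts to passing the same parameter twice. Here is a minimal sketch as a prepared statement, using the anonymized table names from earlier and an assumed integer[] parameter type:

PREPARE doc_lookup(integer[]) AS
  SELECT papa.six
    FROM tango_zulu papa
    INNER JOIN hotel alpha ON papa.six = alpha.quebec
   WHERE alpha.quebec = ANY ($1)
     AND papa.six = ANY ($1);  -- redundant by transitivity, but lets both scans use the list

EXECUTE doc_lookup('{101, 102, 103}');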

I hope you learned something new from E117 of 5mins of Postgres. Feel free to subscribe to our YouTube channel, sign up for our newsletter or follow us on LinkedIn and X/Twitter to get updates about new episodes!
