Postgres Planner Quirks: The impact of ORDER BY + LIMIT on index usage

Today we're going to start a new series on 5mins of Postgres called "Postgres Planner Quirks". When we say "quirks", we mean odd behavior that might make sense to you if you're a Postgres hacker, but is certainly confusing when you're the end user, like a DBA, data platform engineer, or application developer.

In today’s E113 of “5mins of Postgres” we discuss a commonly encountered Postgres planner quirk, which is how Postgres behaves when you have a LIMIT and an ORDER BY clause, and it picks the wrong index.





Transcript

So, this is something that I've actually seen many times in production, and we'll walk through a few examples of how this can happen and what you can do to resolve issues like this.

A bad query plan

So here, just to show you what we mean, this is a question from many years ago on DBA StackExchange that keeps being asked in different variations over the years.

People are asking: when they execute a query like "SELECT * FROM items WHERE object_id = something LIMIT 1", it works as expected:

SELECT *
FROM items
WHERE object_id = 123
LIMIT 1;

But then once I add an "ORDER BY id", a different column that is not the one we're filtering by, suddenly it's slow:

SELECT *
FROM items
WHERE object_id = 123
ORDER BY id DESC  -- I added the ORDER BY
LIMIT 1;

And the fast query in this case is correctly using the index on our filter clause:

                                                                          QUERY PLAN                                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.34 rows=1 width=63) (actual time=0.014..0.014 rows=0 loops=1)
   ->  Index Scan using items_object_id_operation_idx on items  (cost=0.56..2579.16 rows=929 width=63) (actual time=0.013..0.013 rows=0 loops=1)
         Index Cond: (object_id = 123::bigint)
 Total runtime: 0.029 ms

The slow case is surprisingly using the primary key index, and it has a very high "Rows Removed by Filter":

                                                                            QUERY PLAN                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..1269.14 rows=1 width=63) (actual time=873796.061..873796.061 rows=0 loops=1)
   ->  Index Scan Backward using items_pkey on items  (cost=0.44..1164670.11 rows=918 width=63) (actual time=873796.059..873796.059 rows=0 loops=1)
         Filter: (object_id = 123::bigint)
         Rows Removed by Filter: 27942522
 Total runtime: 873796.113 ms

And here in this example, you can see the difference: the fast version takes 0.029 milliseconds, really, really fast, while the slow version takes 873,000 milliseconds, which is longer than 10 minutes.
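
As a side note, one common fix for this particular query shape, if you are able to add an index, is a multicolumn index that covers both the filter column and the sort column, so Postgres can satisfy the WHERE clause and the ORDER BY ... LIMIT from a single index. A sketch for the StackExchange example (the index name and exact column choice are illustrative):

CREATE INDEX items_object_id_id_idx ON items (object_id, id);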

A test case of a slow ORDER BY + LIMIT

Now let's take a look at a particular example that was posted to the "pgsql-performance" mailing list back in 2022. Andre asked here, why is my query behaving oddly?

The nice thing is that Andre actually did provide us with a full working test case, which is always very helpful to have, and which does surface this primary key incorrectly being used. So I'll reproduce this locally just to show you what this looks like.

We're first going to create this "orders_test" table:

CREATE TABLE orders_test(
    order_id int not null,
    shipping_date date not null,
    PRIMARY KEY (order_id)
);

And then we're inserting 2 million records into it:

INSERT INTO orders_test
SELECT
    generate_series(1, 2000000),
    '2018-01-01'::timestamp + random() * ('2022-05-01'::timestamp - '2018-01-01'::timestamp);

Then we're running ANALYZE:

ANALYZE orders_test;

ANALYZE will capture the column statistics, which are relevant for making planner decisions.
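
If you're curious what ANALYZE actually collected, the per-column statistics are exposed in the pg_stats view. A minimal sketch for our test table (the column selection here is just an example):

SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'orders_test';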

Now we're also going to create an index. This is the index that we want our query to use:

CREATE INDEX ON orders_test(shipping_date, order_id);

And we're going to run our query now, and you'll see that this is essentially the behavior we want, which is it's picking the index that we have, and it runs in less than 1 millisecond:

EXPLAIN ANALYZE SELECT
FROM orders_test
WHERE TRUE
AND shipping_date >= '2022-05-01'
AND shipping_date <= '2022-05-01'
ORDER BY order_id
LIMIT 50;
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4.46..4.46 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=1)
   ->  Sort  (cost=4.46..4.46 rows=1 width=4) (actual time=0.010..0.011 rows=0 loops=1)
         Sort Key: order_id
         Sort Method: quicksort  Memory: 25kB
         ->  Index Only Scan using orders_test_shipping_date_order_id_idx on orders_test  (cost=0.43..4.45 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=1)
               Index Cond: ((shipping_date >= '2022-05-01'::date) AND (shipping_date <= '2022-05-01'::date))
               Heap Fetches: 0
 Planning Time: 0.697 ms
 Execution Time: 0.035 ms
(9 rows)

Now, Andre's case here is kind of extreme, but I think this oftentimes does show up in slightly less obvious cases.

To illustrate the issue he encountered, he intentionally created skewed data, where now there are 100,000 records with just one particular shipping date, and then he ANALYZEs the table again:

INSERT INTO orders_test SELECT generate_series(2000001, 2100000), '2022-05-01';
ANALYZE orders_test;

Now there is a second thing that's relevant in this particular example, which is the fact that we are looking for a range.

If you have a cluster of values that behaves differently from the others, and you are doing a range comparison, then this is one of the cases where this will hit you quite badly.
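
To verify how skewed the data now is, you can simply count rows per shipping date. A quick check like the following (purely illustrative) would show the 100,000-row spike on 2022-05-01:

SELECT shipping_date, count(*)
FROM orders_test
GROUP BY shipping_date
ORDER BY count(*) DESC
LIMIT 5;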

Surprising use of a primary key index

The same query that previously finished in 0.035 milliseconds now takes 155 milliseconds:

                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..35.83 rows=50 width=4) (actual time=155.016..155.024 rows=50 loops=1)
   ->  Index Scan using orders_test_pkey on orders_test  (cost=0.43..74336.43 rows=104997 width=4) (actual time=155.014..155.018 rows=50 loops=1)
         Filter: ((shipping_date >= '2022-05-01'::date) AND (shipping_date <= '2022-05-01'::date))
         Rows Removed by Filter: 2000000
 Planning Time: 0.278 ms
 Execution Time: 155.048 ms
(6 rows)

Now, of course, both the physical structure of the table and the statistics have changed. So one could argue this is actually the right index, right?

Because now what Postgres is saying is: I will actually use the primary key index to fulfill the ORDER BY and the LIMIT. And the assumption is that if it just walks through that index, it will find matching rows quickly.

Mis-estimates due to wrong correlation

Essentially the summary here is that the Postgres planner estimates that it will hit 50 rows satisfying the filter condition before it has gone very far in the primary key index. In this example, however, the shipping date column we're filtering by and the primary key correlate in the wrong direction, and so Postgres makes a very over-optimistic guess.

And back in 2022, Tom Lane noted that he doesn't think Postgres has adequate stats yet to detect this sort of problem.

Debugging plans with pg_hint_plan

And so just to illustrate to you, if I debug a problem like this, what I like to do is I like to use pg_hint_plan. Not to fix it permanently, but just to understand what Postgres is mis-estimating.
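
If you want to follow along, note that pg_hint_plan has to be loaded before hints take effect. A minimal sketch for a single session, assuming the extension is already installed on the server:

LOAD 'pg_hint_plan';
SET pg_hint_plan.enable_hint = on;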

And so we'll take a look at the table here, and we're going to essentially write the query hint on our EXPLAIN ANALYZE. And we're just going to force it to use the index that we think it should be using:

/*+ IndexScan(orders_test orders_test_shipping_date_order_id_idx) */ EXPLAIN ANALYZE
SELECT
FROM orders_test
WHERE TRUE
AND shipping_date >= '2022-05-01'
AND shipping_date <= '2022-05-01'
ORDER BY order_id
LIMIT 50;

And so here now you can see that if it were to use the index that we want it to use, it would only take 44 milliseconds, instead of the earlier 155 milliseconds:

                                                                                  QUERY PLAN                                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=43472.96..43473.08 rows=50 width=4) (actual time=44.685..44.697 rows=50 loops=1)
   ->  Sort  (cost=43472.96..43735.45 rows=104997 width=4) (actual time=44.683..44.688 rows=50 loops=1)
         Sort Key: order_id
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Scan using orders_test_shipping_date_order_id_idx on orders_test  (cost=0.43..39985.03 rows=104997 width=4) (actual time=0.019..29.521 rows=100000 loops=1)
               Index Cond: ((shipping_date >= '2022-05-01'::date) AND (shipping_date <= '2022-05-01'::date))
 Planning Time: 0.224 ms
 Execution Time: 44.733 ms
(8 rows)

Now why is Postgres using the wrong index here? Really it comes down to cost. You can see in the bad plan above that the LIMIT node has a total cost of about 36, versus roughly 43,000 on the good plan that we want it to use.

Postgres is really optimistic and thinks that essentially, if it just looks for 50 values, it will quickly find matching rows, and thus will be able to terminate early.
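
You can roughly reconstruct that low cost from the bad plan: for a LIMIT, the planner interpolates between the startup and total cost of the underlying node, based on the fraction of the estimated rows it expects to actually need. A back-of-the-envelope calculation with the numbers from the plan above:

SELECT 0.43 + (74336.43 - 0.43) * (50 / 104997.0);
-- ≈ 35.83, which matches the "cost=0.43..35.83" on the Limit node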

A possible solution: Increasing default_statistics_target

There were a few other things that people have written about this over the years. For example, also back in 2022, Lawrence Jones, back then on the GoCardless team, wrote about how they ran into this problem with the Postgres query planner. What they determined was that it came down to the "n_distinct" measurement in the "pg_stats" data that the Postgres planner maintains.

And so what they were able to do back then, and I think this is one way to resolve this issue, is to increase the statistics target for the column:

ALTER TABLE payment_transitions
ALTER COLUMN payout_id SET STATISTICS 1666;

Or alternatively raise the server's default_statistics_target, though I would personally stay clear of anything above 1,000 as a server-wide setting, due to the extra CPU cost during ANALYZE and the added planning time overhead.

Increasing the statistics target means ANALYZE will sample a larger portion of the table, which helps Postgres arrive at the right estimate and as such discourages the bad plan.
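
Applied to our test case, that could look something like the following. As Tom Lane's comment above suggests, better statistics won't catch every case like this, but it's a low-risk thing to try (the target of 1000 is just an example value):

ALTER TABLE orders_test
ALTER COLUMN shipping_date SET STATISTICS 1000;
ANALYZE orders_test;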

Another thing I'll mention here: if you do not have a particular need for an ORDER BY condition like this, it can actually make sense to drop it. Oftentimes ORMs just generate it by default; old Ruby on Rails versions, for example, loved to add it. If you don't actually need the guarantee that results are ordered by the ID, then simply removing the ORDER BY also resolves this problem.

A workaround: ORDER BY + 0

I want to mention one more thing, Cassidy posted "one weird trick for speeding up ORDER BY that you probably shouldn't use". And so the idea here is we're adding a "+0" to the ORDER BY.

Let's see if we can do this with the example we had earlier:

EXPLAIN ANALYZE SELECT
FROM orders_test
WHERE TRUE
AND shipping_date >= '2022-05-01'
AND shipping_date <= '2022-05-01'
ORDER BY order_id+0
LIMIT 50;
                                                                                    QUERY PLAN                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8761.79..8761.92 rows=50 width=4) (actual time=39.892..39.902 rows=50 loops=1)
   ->  Sort  (cost=8761.79..9024.29 rows=104997 width=4) (actual time=39.890..39.894 rows=50 loops=1)
         Sort Key: ((order_id + 0))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Only Scan using orders_test_shipping_date_order_id_idx on orders_test  (cost=0.43..5273.87 rows=104997 width=4) (actual time=0.021..30.938 rows=100000 loops=1)
               Index Cond: ((shipping_date >= '2022-05-01'::date) AND (shipping_date <= '2022-05-01'::date))
               Heap Fetches: 100000
 Planning Time: 0.608 ms
 Execution Time: 39.940 ms
(9 rows)

And so now we can see that in this case, it is picking the right index. Really what Postgres is doing here is: because you now have an expression in the ORDER BY, it no longer thinks that the index can fulfill the ordering, despite the plus zero not really having any effect.

And so this lets you fix the query by essentially adding this nonsensical +0 to it. Thank you Cassidy for sharing this tip. I would probably prefer this over pg_hint_plan if I couldn't figure out why the statistics were bad.

The community, by the way, is aware of people using this trick, so despite there being a lot of pushback around planner hints, I would expect it to remain fairly stable across releases, just because it's known to be one of the ways people resolve this quirk in the Postgres query planner.

I hope you learned something new from E113 of 5mins of Postgres. Feel free to subscribe to our YouTube channel, sign up for our newsletter or follow us on LinkedIn and X/Twitter to get updates about new episodes!
