Using Memoize to speed up joins between two Postgres tables
This is episode 51 of 5mins of Postgres, and today we're going to talk about how to use Memoize to speed up joins between two tables. We look at presentations from Bruce Momjian and Lukas Eder, and quickly explain a bug with Memoize in Postgres versions earlier than 14.4.
Share this episode: Click here to share this episode on twitter, or sign up for our newsletter and subscribe to our YouTube channel.
Transcript
Let's get started!
What is the Memoize plan node in Postgres?
This episode is inspired by a new presentation by Bruce Momjian that he gave last week at the Postgres Washington DC user group. Bruce has been giving a presentation called "Explaining the Postgres Query Optimizer" for the last 10 years, which is a pretty good overview of all the different index and join types. In this new presentation, "Beyond Joins and Indexes", Bruce shows you 42 different plan nodes that are not just JOINs or indexes, which the Postgres planner may choose to use depending on your query. Download the presentation PDF here if you’d like to follow along!
The one plan node that we're going to talk about today is Memoize. The Memoize plan node gets chosen in a very particular case: you have two tables being joined with a Nested Loop Join, and the inner plan node is a parameterized index scan. That inner scan has to use values from the outer plan node to find the matching rows in the second table. The example Bruce shows us uses an outer table that has duplicate values, which is the case where historically Nested Loop Joins were doing extra work, but now, with Memoize in the picture, performance improves!
In the example, you can see that the query plan inserts a Memoize node on top of the Index Only Scan. Memoize essentially acts as a local cache that sits between the index scan reading the data and the outer plan node requesting it.
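As a rough sketch of this plan shape (the table and column names here are illustrative, not from Bruce's slides, and whether the planner actually picks this plan depends on table sizes and statistics):

```sql
-- Hypothetical schema: the outer table contains many repeated
-- customer_id values; the inner table has a unique index on id.
CREATE TABLE customers (id int PRIMARY KEY, name text);
CREATE TABLE orders (customer_id int REFERENCES customers (id), amount numeric);

EXPLAIN SELECT *
FROM orders o
JOIN customers c ON c.id = o.customer_id;

-- On Postgres 14+, a plan of roughly this shape may appear:
--   Nested Loop
--     ->  Seq Scan on orders o
--     ->  Memoize
--           Cache Key: o.customer_id
--           ->  Index Scan using customers_pkey on customers c
--                 Index Cond: (id = o.customer_id)
```

The Cache Key line shows which outer value Memoize caches lookups by: repeated `customer_id` values are served from the cache instead of re-running the inner index scan.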
Effectively, what Memoize does is speed up the loading of the data, because the inner scan doesn't have to be re-executed for values the outer Sequential Scan has already produced.
This is a new Postgres 14 feature, and at the time, Lukas Eder wrote a blog post about this where he compared the performance of Memoize. He starts out by referencing a tweet that talks about Memoize in Postgres 14 making queries using JOIN LATERAL a thousand times faster. We'll get back to LATERAL in a moment.
Benchmarking Postgres Memoize
The most basic example Lukas shows is a join between two tables, where one of them has an index and we're using a parameterized index scan. We need to make sure that we are on Postgres 14 or newer, and that the enable_memoize setting, which defaults to on, has not been changed to off. When we do that, we should get a Memoize plan node, just as in Bruce's presentation.
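To sanity-check this on your own system, you can verify the version and the setting, and toggle it per session for comparison (a quick sketch, run in psql):

```sql
SHOW server_version;   -- needs to report 14 or newer
SHOW enable_memoize;   -- "on" by default

-- Disable it for the current session only, to compare plans:
SET enable_memoize = off;
-- ... run EXPLAIN ANALYZE on your query here ...
RESET enable_memoize;
```

Using SET rather than editing postgresql.conf keeps the change local to your session, which is the safe way to experiment with planner settings.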
Now, Lukas compares the same query when enable_memoize is on, and when enable_memoize is off. One interesting thing he realizes is that when he turns Memoize off, he doesn't get a Nested Loop Join in this example query, but either a Hash Join or a Merge Join. The important thing to understand is that Postgres costing doesn't just work on a per-plan-node basis. Sometimes the choice (in this case, for example, an Index Scan) is dependent on:

- there being a Nested Loop
- the overall costing
- it making sense that the Nested Loop over the Index Scan is cheaper than doing a Sequential Scan and then joining the data together
So there's a very important difference between the plans, and when you are testing with Memoize yourself, be aware that it would only apply to cases where a Nested Loop Join makes sense in general and could be expected to perform better.
The simplest benchmark shows a little bit of performance improvement. The slower case, with Memoize off, runs for 3.7 seconds; the faster case runs in 3.4 seconds. That's roughly a 10% speed-up. If we got this consistently, it would be a pretty useful thing!
Postgres Memoize and LATERAL joins
There is another case where this makes a huge difference. It will be very workload-dependent, but it applies when you have a LATERAL join that does more work than just joining the data together. In Lukas Eder’s example, he’s computing an aggregate, and effectively this aggregate would be executed five times, once for each "t" input value. We're doing the same calculation over and over again.
Memoize in this case can actually save us a bunch of time. Instead of running the aggregate five times, it's able to run it just once per distinct value. The result is pretty clear in this case: the version without Memoize takes 3.4 seconds, the version with Memoize takes just one second. That's a huge performance improvement you get from Memoize.
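The shape of such a query, sketched here with illustrative table names (not Lukas's exact schema), looks roughly like this:

```sql
-- For each row of t, compute an aggregate over u via a LATERAL subquery.
-- When t.v contains repeated values, Memoize can cache the aggregate
-- result per distinct t.v instead of recomputing it for every outer row.
EXPLAIN ANALYZE
SELECT t.v, agg.total
FROM t
CROSS JOIN LATERAL (
  SELECT sum(u.x) AS total
  FROM u
  WHERE u.v = t.v
) AS agg;
```

In the resulting plan, the Memoize node sits between the Nested Loop and the subquery's plan, so the expensive aggregate is only evaluated once per distinct `t.v`.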
Be aware: this only works when you use LATERAL explicitly. If you rewrite the LATERAL subquery to be in the target list, Memoize is not used.
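For comparison, here is what that target-list rewrite looks like (again with illustrative names). It is logically equivalent, but the correlated subquery becomes a SubPlan rather than part of a join, so the planner has no Nested Loop to attach a Memoize node to:

```sql
-- Equivalent result, but the subquery now lives in the target list.
-- This form does not get a Memoize node.
SELECT t.v,
       (SELECT sum(u.x) FROM u WHERE u.v = t.v) AS total
FROM t;
```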
Postgres Memoize in versions earlier than Postgres 14.4
Now, there's one thing I want to call out. If you see Memoize in your plan nodes and you're like "huh, maybe this Memoize is actually not faster". I did have an issue like this earlier last year, and it turns out that it was a bug in Postgres 14, before the patch release 14.4.
Before 14.4, there was a bug where Memoize would be chosen even though it wasn't actually the best plan. The problem occurred in situations with a Nested Loop inside a Nested Loop: the Memoize plan node used the wrong reference data point when estimating its cost, so caching effects weren't accounted for correctly. This is fixed in 14.4, thanks to David Rowley, who also worked on the initial Memoize feature. If you're, for some reason, not on the latest patch release of Postgres, make sure that you're at least on 14.4 to avoid these incorrect Memoize plans.
Thanks so much for joining us for episode 51 of 5mins of Postgres. Subscribe to our YouTube channel, sign up for our newsletter and follow us on LinkedIn and Twitter to get updates about new episodes!