5mins of Postgres E45: IN lists vs ANY operator: Bind parameters and performance
Today, we're going to talk about the difference between the ANY and the IN operator in Postgres, how they differ in performance, and why you might use one or the other.
Share this episode: Click here to share this episode on twitter, or sign up for our newsletter and check out the newsletter archive and subscribe to our YouTube channel.
Let's jump in!
Postgres security: IN lists vs. ANY operator
We will start with this blog post by Matt Hudson on the Crunchy Data blog. Matt gives a good introduction of how
IN lists are different from
ANY lists. Here’s the scenario: He has a query which does a look up on the
id column and he's passing in six different values. He wants to use the SQL query in his application and he’s using a Java application with the Vert.x Postgres library.
He first tries to get this query into the application by using this
IN list, as he had initially in the query, and by using a
$1 bind parameter. The goal here would be that you have one fixed query text and then as your user supplied data is changing, you're passing this as a bind parameter, avoiding any SQL injection problems. This is generally a good idea.
In Matt’s scenario he tries this and this fails. He gets an exception about the wrong number of parameters, because he's passing in six values, but there's only a single bind parameter in this list.
Next, instead of sending in a list of values, he’s sending in an array. This also fails, but with a different error. In this case, Postgres is complaining that the
id column is an integer, but the
$1 is an array - which is mismatching.
IN list is essentially a list of values, which means that you could write something like
$2 as separate bind parameters, and then it would actually work and it would match up with your bind parameters. But there's a better way.
Using the ANY operator and bind parameters
Matt comes to the ANY operator, which looks very similar to what Matt had going on thus far. Again, you're passing in
$1, but instead of
IN we're saying
= ANY. What that means is if any of the
$1 values in the
$1 array are matched, then this is true. Otherwise, it's not true. This works and the bind parameter can be used successfully.
The main argument that Matt makes is that if you want to pass parameters securely, then it's a good idea to use
= ANY, because if you use
IN, you would have to generate the query text and that could cause a problem with potential SQL injections.
ANY also has some other features. For example, you can do comparisons with
NOT. So you don't just have
= as an operator, you can be much more flexible.
There is a question though: Should I always be using one or the other?
Postgres performance: IN lists vs. ANY operator
Back in 2017, Lukas Eder, who maintains the JOOQ ORM, described how somebody reached out to him and asked why his ORM is using IN lists and not ANY. They essentially said "This is slower, you should be using the faster version".
Lukas did a benchmark and the comparison he did was essentially the same thing that we just looked at in Matt’s scenario - but rather looking at it from a performance perspective instead of from a security perspective.
Lukas has two statements, both pretty simple, and in one case he's using an
IN list, in the other case he's using an
ANY with an array. He runs it in a PL/pgSQL function, which is a great way to make sure that we're testing the performance on the server side, not any kind of client interactions.
The results showed that when he was using a very short IN list, that in his particular benchmark, it was actually faster to have an
IN list, rather than an
ANY with an array. But once you get to a lot of values, in his case 128, the array actually performs measurably better than the
Once you get to a lot of values, in his case 128, the array actually performs measurably better than the IN list.
Allow hash lookup for IN clauses with many constants
When I was looking at this, I remembered that there was actually a change in Postgres 14 that improved this. In the release notes, if you read them very carefully, there's a change that says "allow hash look up for IN clauses with many constants". This is a patch by James Coleman that was committed by David Rowley. When you have a very long IN list it improves how efficient it is for Postgres to look up data in that
When James contributed this patch, two years ago, what he was showing was that when he's passing 1,000 random integers in an
IN list, then, with this patch in place, it's about *10x faster because there was something really inefficient with how Postgres was looking things up.
I think an important conclusion here is: if you're not yet on Postgres 14 and you're using
IN lists, either out of choice or because your ORM generates them, then Postgres 14 will actually give you a nice performance boost for these long
If you're not yet on Postgres 14 and you're using
INlists, either out of choice or because your ORM generates them, then Postgres 14 will actually give you a nice performance boost for these long
Postgres index not used with = ANY() but is used with IN
There is one other case that I want to mention. Because I usually like to use
= ANY ARRAY, and there's one hazard that you have to watch out for, which is if you're using row type comparisons.
In this example, on the DBA stack exchange, somebody complained that their index was not being used when they were using
= ANY, but it was being used with the
IN operator. They were comparing
b and were trying to match rows that had both of these equal to one of the reference values. In the case of
= ANY, Postgres was doing a sequential scan. But in the case of
IN, it was doing an index only scan!
Long story short, the reason that this happens is mostly for internal reasons, but you can fix it by adding an explicit cast to the particular type in the index.
Thanks so much for joining us for this episode of 5mins of Postgres. Subscribe to our YouTube channel, sign up for our monthly newsletter and follow us on Twitter to get updates about new episodes!
What we have discussed in this episode of 5mins of Postgres
Postgres Query Boost: Using ANY Instead of IN - by Matt Hudson
SQL IN Predicate: With IN List or With Array? Which is Faster? - by Lukas Eder
Postgres 14 patch - “allow hash look up for IN clauses with many constants”
Index not used with = ANY() but used with IN - on StackExchange