PostgreSQL ANY vs IN lists: Bind parameters and performance

By Lukas Fittl

November 24, 2022

Today, we look at the difference between PostgreSQL's ANY and IN operators, how they differ in performance, and why you might use one or the other.

Share this episode: Click here to share this episode on twitter, or sign up for our newsletter and check out the newsletter archive and subscribe to our YouTube channel.

PostgreSQL security: ANY operator vs. IN lists
- PostgreSQL's ANY operator and bind parameters
PostgreSQL performance: ANY operator vs. IN lists
- Allow hash lookup for IN clauses with many constants
Postgres index not used with = ANY() but is used with IN
What we have discussed in this episode of 5mins of Postgres

Transcript

Let's jump in!

PostgreSQL security: ANY operator vs. IN lists

We will start with this blog post by Matt Hudson on the Crunchy Data blog. Matt gives a good introduction of how IN lists are different from ANY lists. Here’s the scenario: He has a query which does a look up on the id column and he's passing in six different values. He wants to use the SQL query in his application and he’s using a Java application with the Vert.x Postgres library.

He first tries to get this query into the application by using this IN list, as he had initially in the query, and by using a $1 bind parameter. The goal here would be that you have one fixed query text and then as your user supplied data is changing, you're passing this as a bind parameter, avoiding any SQL injection problems. This is generally a good idea.

In Matt’s scenario he tries this and this fails. He gets an exception about the wrong number of parameters, because he's passing in six values, but there's only a single bind parameter in this list.

Next, instead of sending in a list of values, he’s sending in an array. This also fails, but with a different error. In this case, Postgres is complaining that the id column is an integer, but the $1 is an array - which is mismatching.

The IN list is essentially a list of values, which means that you could write something like $1, $2 as separate bind parameters, and then it would actually work and it would match up with your bind parameters. But there's a better way.

PostgreSQL's ANY operator and bind parameters

Matt comes to the ANY operator, which looks very similar to what Matt had going on thus far. Again, you're passing in $1, but instead of IN we're saying = ANY. What that means is if any of the $1 values in the $1 array are matched, then this is true. Otherwise, it's not true. This works and the bind parameter can be used successfully.

The main argument that Matt makes is that if you want to pass parameters securely, then it's a good idea to use = ANY, because if you use IN, you would have to generate the query text and that could cause a problem with potential SQL injections.

Additionally, ANY also has some other features. For example, you can do comparisons with LIKE or ILIKE, or NOT. So you don't just have = as an operator, you can be much more flexible.

There is a question though: Should I always be using one or the other?

PostgreSQL performance: ANY operator vs. IN lists

Back in 2017, Lukas Eder, who maintains the JOOQ ORM, described how somebody reached out to him and asked why his ORM is using IN lists and not ANY. They essentially said "This is slower, you should be using the faster version".

Lukas did a benchmark and the comparison he did was essentially the same thing that we just looked at in Matt’s scenario - but rather looking at it from a performance perspective instead of from a security perspective.

Lukas has two statements, both pretty simple, and in one case he's using an IN list, in the other case he's using an ANY with an array. He runs it in a PL/pgSQL function, which is a great way to make sure that we're testing the performance on the server side, not any kind of client interactions.

The results showed that when he was using a very short IN list, that in his particular benchmark, it was actually faster to have an IN list, rather than an ANY with an array. But once you get to a lot of values, in his case 128, the array actually performs measurably better than the IN list.

Once you get to a lot of values, in his case 128, the array actually performs measurably better than the IN list.

Allow hash lookup for IN clauses with many constants

When I was looking at this, I remembered that there was actually a change in Postgres 14 that improved this. In the release notes, if you read them very carefully, there's a change that says "allow hash look up for IN clauses with many constants". This is a patch by James Coleman that was committed by David Rowley. When you have a very long IN list it improves how efficient it is for Postgres to look up data in that IN list.

When James contributed this patch, two years ago, what he was showing was that when he's passing 1,000 random integers in an IN list, then, with this patch in place, it's about *10x faster because there was something really inefficient with how Postgres was looking things up.

I think an important conclusion here is: if you're not yet on Postgres 14 and you're using IN lists, either out of choice or because your ORM generates them, then Postgres 14 will actually give you a nice performance boost for these long IN lists.

If you're not yet on Postgres 14 and you're using IN lists, either out of choice or because your ORM generates them, then Postgres 14 will actually give you a nice performance boost for these long IN lists.

Postgres index not used with = ANY() but is used with IN

There is one other case that I want to mention. Because I usually like to use = ANY ARRAY, and there's one hazard that you have to watch out for, which is if you're using row type comparisons.

In this example, on the DBA stack exchange, somebody complained that their index was not being used when they were using = ANY, but it was being used with the IN operator. They were comparing a and b and were trying to match rows that had both of these equal to one of the reference values. In the case of = ANY, Postgres was doing a sequential scan. But in the case of IN, it was doing an index only scan!

Long story short, the reason that this happens is mostly for internal reasons, but you can fix it by adding an explicit cast to the particular type in the index.

Thanks so much for joining us for this episode of 5mins of Postgres. Subscribe to our YouTube channel, sign up for our newsletter and follow us on LinkedIn and Twitter to get updates about new episodes!