5mins of Postgres E39: Postgres 15 release, and 6 tips for managing indexes
Today, we're going to talk about the Postgres 15 release, and six findings to improve how you manage your Postgres indexes.
- Each individual index has storage and write costs
- Postgres can choose different ways to query indexed data
- Combined indexes versus multi-column indexes in Postgres
- Column placement in multi-column indexes
- Use sorted indexes to improve Postgres query performance
- Understand the query planner with Postgres EXPLAIN
Let's jump in!
So, first of all, in the words of Postgres committer Tom Lane: "Drat, forgot this …".
Postgres 15 has been tagged and is released today, go take a look and try it out! This has been through a couple of betas and two RC releases. But, of course it’s always a good idea to test your application in a staging environment first before migrating production.
Besides the Postgres 15 release, today I want to talk about six findings that the team at Knock app relies on when they manage their Postgres indexes.
I like this blog post by Billy from the Knock team because it gives a good experience of what the typical application team would run into when they manage their Postgres indexes. Some parts of this article might seem like basic knowledge, but I do like a good overview and I think they found a good balance here.
You can consider an index in Postgres as a cache, or a secondary index, which means that Postgres will actually always create a separate data structure. That means each index you create ends up taking more space on disk, every table write requires companion writes to the indexes, and the more indexes you have, the more expensive it is to maintain them. So be careful not to over-index your tables.
Each index has storage and write costs. [...] The more indexes you have, the more expensive it is to maintain them.
Postgres can choose many ways to query indexed data. Usually, an indexed read will be faster, but if you have a very small table Postgres will use a sequential scan instead of an index scan. As Billy notes in his blog post: when Postgres looks at data, it works in fixed size pages, so when you see the notion of a page or buffer that refers to the eight kilobytes page size in Postgres.
Now, Postgres will have different ways of pulling this data from your tables:
Sequential scan: doesn't use any indexes, just goes directly to the table data.
Index-Only Scan: will almost always look at the index itself and get the data. There are some cases where it can still go back to the table, but usually it just uses the index.
Index scans: uses both the index and the table to get some data, but the index defines which rows are matching.
Bitmap index scan: this is the case where Postgres builds a bitmap from the index and using that bitmap it locates which pages in the table to look at. This can also be used when Postgres combines multiple indexes.
Certain index types will always have Bitmap Index Scans. The best example of this would be BRIN, which we talked about last week. BRIN always causes a bitmap index scan because BRIN is a fuzzy index, it doesn't contain exact information about which rows to look at, so the bitmap will be very broad in terms of which parts of the table Postgres has to scan to find the actual rows.
With combined indexes you can think about having an index that's on "column_a" and a separate index that's on "column_b" and Postgres chooses to do a bitmap operation to combine these two indexes. Postgres often chooses to not do that, which is why it does make sense to combine indexes into a multi column index, not just for avoiding multiple indexes, but really to optimize the index cost.
If you do a multi column index it's important to place the column that you're most frequently querying by early on. Let's say you have a query that is on "A", "B" and "C", the query would be happily using an index that's defined as "C", "B", "A".
But, if you have a query that only references "C" and your index starts with "A" and "B", then there's a lot of unrelated data and the sorting is incorrect. Postgres will not be able to efficiently use that index, usually it wouldn't use the index at all, based on how the planner makes the decisions.
If you do a multi column index it's important to place the column that you're most frequently querying by early on.
It's important to think about sorting when you create indexes, especially if you have a lot of
ORDER BY statements. In Postgres, when you do
CREATE INDEX, you can specify the order for each column. For example, we can specify if a column should be stored in descending order. That's usually not as important because Postgres can query an index backwards.
However, if you do use
NULLS FIRST or if you have multiple column orders, for example one of the columns should be ascending and the other column should be descending, then you should always specify this explicitly matching the queries.
You should work in harmony with the query planner. Postgres will only use your indexes if the query planner chooses to do so. Postgres gives you a very powerful tool with EXPLAIN to understand the query planner better. It is always important to verify with EXPLAIN whether indexes are actually used.
For example, when we at pganalyze worked on our Postgres index recommendation feature, we worked closely with the planner source code, because for us, that was the source of truth. If the planner didn't end up choosing an index it didn't make sense. That's what I personally always come back to when I tune an index: I always run an EXPLAIN command to verify.