A better way to index your Postgres database: pganalyze Index Advisor

By Lukas Fittl

September 23, 2021

Screenshot of the new pganalyze Index Advisor

When you run an application with a relational database attached, you will no doubt have encountered this question: Which indexes should I create?

For some of us, indexing comes naturally, and B-tree, GIN and GIST are words of everyday use. And for some of us it’s more challenging to find out which index to create, taking a lot of time to get right. But what unites us is that creating and tweaking indexes is part of our job when we use a relational database such as Postgres in production. We need to get indexes right, in order to make sure our application performs well.

There are multiple ways to determine which indexes get used in your Postgres database. For example, you may choose to query the pg_stat_user_indexes table. There are Postgres extensions like HypoPG to try out hypothetical indexes on your database server. And some of us may decide to go ahead and simply index every column on every table.

But the reality nowadays is that modern apps are complex, and applications built on Postgres grow at an incredible pace. This makes indexing more important, but also more challenging than ever. As developers we want to focus on what matters, and not spend hours investigating which Postgres index to create.

At the beginning of this year we set out to improve the status quo for indexing with Postgres. And today, after many months of effort and having published an eBook about Index in Postgres, we’re excited to announce the new pganalyze Index Advisor for Postgres.

Before we dive into all the details, let’s take a step back and ask ourselves “How could we determine which index to create?”

Postgres Indexing: Is machine learning the answer?
How Postgres determines when to use an index
Creating the best Postgres index for your query
Review existing indexes with the Index Advisor
Try out the Index Advisor for free with the standalone tool
Automatic index advisor for your production queries in pganalyze
pganalyze Index Advisor and new pricing plans
Conclusion

Postgres Indexing: Is machine learning the answer?

It’s 2021, and of course we had to ask ourselves - is this a problem that requires ML and AI? Couldn’t we just train a model to create the right indexes for us?

We turned to GitHub CoPilot, the most sophisticated AI-based helper that exists today for developers, and asked it to create an index for a real world query in our own Postgres database:

Screenshot of GitHub CoPilot trying to recommend an index

Suffice to say that indexing like this is not effective. You will end up with significant overhead due to indexing almost everything, including columns that are not even referenced in the query.

Whilst this ML model will certainly improve, and there is research on more purpose-built solutions for databases, the point is: ML is not the magic solution we are looking for. We need more than just machine learning to know which indexes to create.

In fact, from our own experience, knowing which index to create does not require an ML model at all. Knowing how to create the best index can be done with a deterministic approach, that takes into account production database queries and schema statistics, and has a detailed understanding of how Postgres works.

And who knows best how Postgres works? Postgres itself!

How Postgres determines when to use an index

We started out by asking ourselves the question: How does Postgres decide which index to use? We can find this logic in the Postgres planner, which takes a parsed query and turns it into an execution plan.

Specifically, we decided to look at the function create_index_paths(..), where you can see that Postgres loops over all indexes on a particular table, and decides which indexes can be used:

void
create_index_paths(PlannerInfo *root, RelOptInfo *rel)
{
	...

	/* Skip the whole mess if no indexes */
	if (rel->indexlist == NIL)
		return;

	/* Bitmap paths are collected and then dealt with at the end */
	bitindexpaths = bitjoinpaths = joinorclauses = NIL;

	/* Examine each index in turn */
	foreach(lc, rel->indexlist)
	{
		IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);

		/*
		 * Ignore partial indexes that do not match the query.
		 * (generate_bitmap_or_paths() might be able to do something with
		 * them, but that's of no concern here.)
		 */
		if (index->indpred != NIL && !index->predOK)
			continue;
   ...

Going into all this logic would likely fill multiple books, and it is based on decades of academic research. Cleary, Postgres is very sophisticated about determining which indexes can be used for a given query. Amongst the core decisions it makes are:

Does the index match the columns used in the query?
Does the query’s operator match the operator class of the index?
Does the index have a sort order that can be used by the query to avoid an explicit Sort step?
Does the query condition match a partial index condition?

And many other requirements and heuristics that need extensive knowledge of Postgres’ inner workings.

At pganalyze, we looked at this, and other functions, and we asked ourselves: What if we used the Postgres planner to tell us which index it would like to see, based on a given query?

That is, instead of asking “does this index match this query?”, we are asking “what’s the perfect index for this query?”. Perfect as in: ticks all the boxes in terms of operators/operator classes, columns and data types, and can be used to fulfill the query filter and join clauses of the query, if possible.

This logic based on the Postgres planner is the centerpiece of the new pganalyze Index Advisor. Our index advisor is available in the pganalyze app, but we also decided to provide a free, standalone version available to anyone.

Simply paste your query and schema data and get insights on whether existing indexes are useful, or learn why indexes you thought might help are ignored. Note that data uploaded to the standalone pganalyze Index Advisor stays local within your browser, unless you explicitly use the share functionality.

Going forward in this article, when you see examples and screenshots of the index advisor for Postgres, we are showing the public, standalone tool.

Creating the best Postgres index for your query

Let’s go back to our earlier example, and run it through the pganalyze Index Advisor:

pganalyze Index Advisor Example showing better recommendation than GitHub CoPilot

As you can see, we get a recommendation for a single multi column index that covers all columns that are in the WHERE clause, except for the column that’s inside the OR condition. This is the best index that we can create to ensure the query runs fast.

At launch the index advisor is focused on recommending B-tree indexes, with support for other index types coming soon.

Note that the index advisor also understands common query patterns like filtering out records based on deleted_at column, and recommends partial indexes for these queries:

Example of a partial index in pganalyze Index Advisor

Review existing indexes with the Index Advisor

The pganalyze Index Advisor is also able to determine how different existing indexes perform, to help you understand which index Postgres will most likely use.

For example, imagine a schema and index definition like this:

CREATE TABLE events(
  id bigserial PRIMARY KEY,
  created_at timestamptz,
  severity smallint,
  organization_id bigint,
  description text,
  details jsonb
);
CREATE INDEX ON events(organization_id, severity);

We want to understand how effective this index is for queries that only query the “severity” column, without looking up a particular organization.

With the index advisor, we can see the cost difference between the indexes, and that Postgres prefers using the single-column index in most situations:

Example of an existing index comparison in pganalyze Index Advisor

This can be explained by the fact that single-column indexes are usually smaller, and it’s more efficient, especially in older Postgres releases, to find index records when the queried column is listed first in the column list. You may still choose to use a multi-column index, but this helps you understand the trade-off.

Try out the Index Advisor for free with the standalone tool

Want to try out the index advisor yourself? As mentioned above, we developed a standalone version of the index advisor that runs fully in your web browser, powered by our self-contained Postgres planner compiled to WebAssembly.

You can simply go to https://pganalyze.com/index-advisor, paste your query and schema, and get your recommendations. If you don’t have a query and schema ready, for example because you are reading this on your mobile phone, you can take a look at how it works with a set of examples we added for your convenience.

Standalone pganalyze Index Advisor tool

We’ve also ensured that the standalone tool is ready for collaboration. If you want to share index recommendations with your team, simply click the [Share] button. After you confirm, this uploads the result of the index advisor to the pganalyze servers for sharing, and gives you a unique URL to share. Note that unless you share, all data stays local within your web browser.

Share function of standalone pganalyze Index Advisor tool

Of course, copying query texts can be tedious and a lot of work. But, if you are a pganalyze customer, we already have your query information in our app. The second part of today’s launch is about the new in-app pganalyze Index Advisor.

Automatic index advisor for your production queries in pganalyze

With the new Index Advisor in pganalyze, you can now see at a glance what index recommendations exist for each of your queries. You can simply go to the query details page for your queries, and see what the Index Advisor recommends:

In-app screenshot of pganalyze Index Advisor

This is really nice, but we already have work underway to help you get an even better assessment of index usage summarized across your whole database. But more on that soon (sign up for the newsletter if you want to get updates about this).

pganalyze Index Advisor and new pricing plans

The pganalyze Index Advisor represents a significant improvement to the core functionality of pganalyze, and introduces additional sophisticated processing for each query received by pganalyze. We are therefore taking this moment to introduce both a new Production and a new Scale plan. In addition to the Index Advisor, the new Scale plan also features SAML-based Single Sign On in early access, to integrate with identity providers such as Okta.

If you are an existing pganalyze customer on (what is now) a legacy plan you can try out the Index Advisor until the end of October 2021. Trying out the Index Advisor requires no changes to your existing pganalyze integration.

If you do not have an account with us at the moment but sign up for a new trial the pganalyze Index Advisor will be activated for your 14-day trial. Try it out today in the pganalyze app, or start a new trial.

Conclusion

All of us at pganalyze are excited to share the new pganalyze Index Advisor with you. Try out the standalone tool or explore the new in-app functionality today. We hope the standalone tool is a service you will come back to time and again and get value out of it. Feel free to bookmark it!

You can provide feedback through our dedicated discussion board on GitHub, or send us a support request for in-app functionality. We look forward to hearing from you.

If you want to share this article with your peers, feel free to tweet it.