Introducing pganalyze Index Advisor 3.0 - A workload-aware system for finding missing indexes in Postgres
Indexing your database across a diverse set of queries across an entire table can be a challenge. You need to determine what indexes might be missing, how new indexes will interact with existing ones, and the overall impact on the specific workload. Continuously adapting indexes is difficult and time consuming; you might deal with a large number of tables, queries, and existing indexes, each with its own characteristics. For example, balancing the need for indexes against the potential downsides, such as Index Write Overhead and preventing Postgres HOT updates.
Did you know that depending on the workload there could be hundreds of possible index combinations on a given table? How would you go about testing and reasoning about such a large number of options manually?
Almost 3 years ago we first set out to build an automated system for analyzing a Postgres workload, and finding missing indexes. We started with insights on individual queries, later extended it to multiple queries, and now, a set of index recommendations for a whole table.
Today, we’re excited to introduce version 3.0 of the pganalyze Index Advisor. The new Index Advisor features our new constraint programming model for Index Selection, workload-specific configuration depending on the number of reads and writes on a given table, and a new user interface that emphasizes the ability to understand and debug why Index Advisor recommends creating a particular index.
We’re also introducing additional Postgres-specific functionality, such as avoiding indexes on columns with UPDATEs to enable Heap-Only Tuples (HOT), and the ability for Index Advisor to detect functional dependencies, utilizing Postgres’ extended statistics information.
At a high level: How pganalyze Index Advisor 3.0 works
pganalyze Index Advisor continuously monitors the database workload and collects information about query execution, including which queries are being run most frequently and which queries are experiencing performance issues. It then analyzes this data to identify opportunities for creating or modifying indexes that could improve query performance.
In order to determine which indexes could be utilized for a given query, Index Advisor follows a 3 step process:
- Breaking down queries into scans (WHERE and JOIN conditions)
- Calculating indexes that could be useful for a given set of WHERE/JOIN conditions, and determining the scan cost for each of them (using the Postgres planner inside the pganalyze app)
- Selecting indexes based on a configuration specific to the table’s workload, using the new constraint programming model
Amongst other changes, in this release we've made major revisions to that third step, going from a simple greedy set covering algorithm, to a customizable constraint programming model that uses the CP-SAT solver for finding the optimal set of indexes.
Let's take a closer look at what's new!
Tailored recommendations specific to each table's workload
With the new per-table recommendations, Index Advisor provides a clear understanding of potential impacts on the overall workload, making it much easier to optimize your database effectively.
Holistic table insights:
We've completed the shift from individual index insights to recommendations that consider your entire workload for a particular table. Rather than having to evaluate insights one by one, you now get a new view that tells the complete story. Browse through all the details relating to indexes and scans, so you can evaluate our recommendations more easily.
We do not stop here, but also show you what indexes were considered for any given scan to allow you to challenge our insights or test your own assumptions.
Tailored to your workloads:
Whether your tables are leaning towards read-heavy or write-heavy workload patterns, the updated Index Advisor can fine-tune its recommendations to suit different constraints. It automatically detects the nature of your workloads and adjusts recommendations accordingly, ensuring the best compromise between performance and resource utilization.
Previously, the Index Advisor model was leaning towards giving more missing index insights, because it was more tuned towards read-optimized workloads.
Fine-tuned insights using the new Constraint Programming Model
This new experience is driven by the pganalyze Indexing Engine. We evolved the initial "Good Enough" index selection algorithm to a customizable Constraint Programming model. Indexes are now suggested based on user-defined goals and rules. You may want to cover as many scans as possible with the available indexes, or to make the costs of the scans as small as possible, or to stay within a given index budget, and so on.
These goals and rules can also be combined—for example, covering as many scans as possible whilst staying within a budget of a specific number of indexes. Once the model is given a set of goals and rules, it then suggests the indexes that best fit with these requirements based on the available data.
To simplify the Index Advisor experience, we have introduced three configuration presets:
- Read-optimized (more indexes, lower scan costs)
- Write-optimized (fewer indexes, higher scan costs)
- Balanced (balance between the other two)
For example, by default when a table has more than 60 writes per minute we consider the table to be write-heavy, and optimize for write performance in Index Advisor. The thresholds can be customized in pganalyze if you want to do so.
For those of you who like to dig in deeper, the underlying model is fully documented and customizable. You have the freedom to experiment and adjust the recommendations to perfectly align with your unique requirements.
We will follow up with a more detailed blog post about the new model, in the meantime you can watch a recording of our webinar where we explain it as well.
Drill down into the model's decision making
As we were developing the new model, we discovered that one of the most important aspects is the ability to debug a particular set of index recommendations. Whilst we had previously used an internal format to see this when needed, it quickly became apparent that we needed a better way to explore the underlying data that drives the choice of a given index.
With the new Index Advisor, pganalyze now gives you direct access to the internal information that is used for assessing different indexes. We can clearly see the estimated cost for a given scan/index combination, which indexes were considered, and which indexes were deemed not a good fit due to high Index Write Overhead.
This new experience also allows assessing which queries could potentially use an existing index, to help you assess how removing an index might impact your query workload.
Index Consolidation, or: When to Remove an Index
With the updated model, we now also have the ability to suggest how to consolidate existing indexes, so you can achieve similar read performance with a smaller number of indexes. This can also help you answer questions like "What if I wanted to start from a clean slate on this table?"
This part of Index Advisor is currently still in early access, as we refine the user experience. If you’re interested in trying this out, you can contact us and we’d be happy to discuss it further with you.
More in-depth analysis with Postgres-specific funtionality
As we’ve been working closely with customers on this new release, we’ve discovered two important functionalities that are now in the final release as well:
Postgres HOT updates
When you add an index, besides considering the direct write overhead Postgres has with adding the index, you also have to consider the Heap-Only Tuple (HOT) optimization in Postgres. For workloads that issue a lot of UPDATE statements, it is important to avoid indexing columns that are frequently updated. If even a single index exists on the column being updated, the UPDATE incurs significantly higher write overhead and additional autovacuum effort. The new Index Advisor now considers this overhead explicitly, and will avoid recommending indexes on updated columns if possible.
Functional dependencies and CREATE STATISTICS
For example, imagine you have an "zipcode" and a "state" column. If you query for both of these columns in the same WHERE clause, Index Advisor might have previously suggested creating a multi-column index, to help narrow down the search space. But the truth is that in such a situation an index on "zipcode" is sufficient, since a given US ZIP code correlates with a given US state. This is called a functional dependency.
Postgres by default does not collect enough information about these dependencies, and this can cause bad query plans to be selected, since Postgres will mis-estimate rows. You can have Postgres detect this using the CREATE STATISTICS
command, which collects additional "extended" statistics. If extended statistics are created, pganalyze will now pick them up and use them as part of determining costs in Index Advisor.
In summary
Our overall goal with pganalyze Index Advisor 3.0 is to help you craft a holistic index strategy that overcomes one of the biggest database optimization challenges, ensuring your database runs smoothly and efficiently at scale. By continuously analyzing your queries and workload patterns for all your tables, we will look for the best indexing strategy for you, so you can focus on other performance improvement challenges and work on customer facing features.
Thanks to the new Indexing Engine model, you can now play with different settings to understand why certain choices have been made, and reason about them, before applying the insights to your database. We aim to make the recommendation process more transparent, allowing you to see which indexes were considered for each query.
We are excited about the new capabilities, and can’t wait to see what you think and how you will use it to inform your indexing decisions.
You can start using pganalyze Index Advisor 3.0 today on all our current plans, with an Enterprise Server release coming in a few weeks. If you are new to pganalyze, you can try out the full feature set, including pganalyze Index Advisor 3.0, for 14 days free of charge.