5mins of Postgres E14: HOT Updates vs Bottom-Up Index Deletion in Postgres 14
Today, we're going to talk about bottom-up index deletion for B-tree indexes, Postgres pages, Postgres page splits in B-Tree indexes, the bt_page_items function, and how Bottom-Up Index Deletions compare to HOT updates.
Let's have a look!
First of all, let's take a look at this blog post by Hamid Akthar from the Percona team. In this blog post, Hamid compared Postgres 13 with Postgres 14. Before we jump into that, let's do a quick primer on Postgres pages.
The Postgres page size is 8kb. That means that each time you're reading something from a table or an index, you're doing that in 8kb chunks. These optimizations that we're talking about all work on a per page level, and that makes them essentially free from a write overhead or locking perspective because the pages are already locked and they're already being written. That's why it's easy to add in new optimizations.
The other important thing to understand is that the motivation for this is to avoid page splits. A page split in a B-tree index occurs when you're adding a new index entry, but there's not enough space in the index page you're trying to add it to, so you need to create a new index page.
Then, because you're not always adding things at the end of the page, you may have to move multiple values to that new index page, causing a lot of potential updates to parent index pages. Those can be pretty expensive because you're not just updating 8 kilobytes anymore, you might be updating a multiple of that.
Bottom-up index deletion helps us avoid such page splits and ultimately reduces the index size as well. In a test Hamid did, he created a table called "foo" and then created two indexes on that table. It's actually really important that to test this, we create two indexes. The reason for that is that this optimization works on the basis that you're updating a column that's indexed by one index, but not by the other.
We have a thousand rows and then we're updating all rows on the value column four times. If you just compare the index size, you can see that on Postgres 13 the index size is a little bit over a hundred kilobytes versus on Postgres 14 the index size is 88 kilobytes. You can see already, we are avoiding a little bit of bloat here after all these updates!
When we use the
bt_page_items function to look at the actual index contents, we see that on Postgres 13, what happens is you have the initial index entry, and then you have four more index entries that get created on top of that. Now in Postgres 14, we can see that there are only two index entries here because the bottom-up index deletion cleaned things up for us, as we did those updates.
If you want to understand better how this works, we can take a look at the Postgres documentation. As you read through this B-tree indexes chapter it really helps to read every word twice, and question your understanding.
I certainly had to do a lot of re-reading here to understand what was meant. On the bottom-up index deletion, it first talks about this concept of version churn. Version churn occurs in the case where you have an
UPDATE-heavy workload and where the updates cannot apply the
HOT optimization. We're going to talk more about
HOT in a moment.
HOT cannot occur, then it means that you're changing the value of a column, that column is covered by one index, and then because we're changing an indexed column, we also have to update all the other indexes even if those indexes do not include that column. We have to create a new entry in each and every index on that table. That can be pretty expensive because you're suddenly creating all these extra index entries in all the indexes!
The way that the bottom-up index deletion works is that, as you are doing
UPDATEs, when an
UPDATE is triggered, it tries to predict a version churn page split, a page split that occurs because of these unnecessary index entries, then it triggers this deletion pass, and this deletion pass is like an early vacuum, you could say, that happens on just that page. The goal here is to remove the index entries that are no longer visible,
VACUUM would have also removed them when it ran, but you're doing that ahead of time, and because you're doing it ahead of time, you're making space for new
UPDATEs to occur.
Contrasting that to
HOT updates or heap only tuple updates in Postgres are a mechanism to also avoid this overhead of having an
UPDATE-heavy workload. In this blog post by Laurenz Albe from the Cybertec team, he describes in more detail how that works.
Generally, important to understand: an
UPDATE in Postgres is not much different from a
DELETE followed by an
INSERT. You're creating a new row version by having an
The problem that happens is that when you do an
UPDATE, you add new index entries and that has to happen because you need to make sure that the new table row version is correctly being pointed to by the index entries.
The way that
HOT works is it utilizes an implementation detail, where each page in Postgres keeps a list of line pointers. That's essentially saying: here's all the data I have in this page and it's not the actual values, it's just where in the page the data is. This approach helps Postgres be able to rearrange the contents of the page.
What the heap only tuple does is it creates what's called a
HOT chain. When an index scan has to look at a
HOT chain item it can follow a chain to get to the actual data. It does that in a way where Postgres still retains the ability to rearrange the page.
That only works if a particular value is not referenced by any index. Once an index actually references the specific values, then you have an outside reference into the page and so because of that,
HOT is no longer applicable.
That's really the important difference between when
HOT applies versus when bottom-up index deletion applies.
HOT only applies when there is no index defined on any column whose value is being modified as part of the update, versus bottom-up index deletion also works in cases where you're updating one of the indexes, but not the other index.