How Table Access Methods in Postgres may offer a future without Bloat
In episode 77 of “5mins of Postgres”, we are going to take closer look at VACUUM and a potential future without bloat in your database thanks to table access methods. In detail, we are looking at Alexander Korotkov's OrioleDB.
- VACUUM and table bloat in Postgres
- A new way of “VACUUMing” your Postgres database
- No bloat with OrioleDB, undo logs as a design decision
- What are table access methods in Postgres?
- OrioleDB extension in Postgres 18?
- Similarities between table access methods and index access methods
- pganalyze VACUUM Advisor
- What we have discussed in this episode of 5mins of Postgres
Let's get to it!
I think it's worth talking about how the extensibility of Postgres allows such projects to function and maybe even see this feature in Postgres itself at some point in the future. Now, when we talk about VACUUM and bloat, what we really care about is this maintenance operation that you have to do on your Postgres database to get the best performance.
Postgres will already do this with autovacuum in many cases, but Alexander states that oftentimes VACUUM has been terrifying. I would argue that VACUUM has improved a lot over the last couple of releases. For example, the work that Peter Geoghegan did regarding both autovacuum and VACUUM behavior, as well as the B-Tree index improvements that oftentimes also then reduce the impact of bad vacuuming schedules.
However, I think it is true that VACUUM is still a problem today in Postgres. What OrioleDB tries to do is fundamentally change how Postgres does this. The design is very similar to what some other systems are doing with “undo logs”. The idea is: Instead of keeping dead rows around and having autovacuum clean them up later, you have a both per row, as well as per block, undo log. This allows for efficient handling of data changes. The idea is that this happens automatically behind the scenes. The main impact you're seeing is that the cleanup happens automatically whilst you're writing to the database instead of periodically through a process like autovacuum.
Alexander ran a benchmark of OrioleDB's storage engine compared to the regular Postgres heap storage. Firstly, I would say: let's always be careful with somebody who's trying to make a point and runs a benchmark. I'm sure there are cases where you'll find that OrioleDB will be slower. But, OrioleDB’s results are quite impressive.
He was testing an
INSERT into the table and then, on a conflict, did an update. The test aims to show the advantages of OrioleDB's design and in this particular benchmark, you can see that OrioleDB has about 4-5x transactions per second in terms of throughput, and It’s able to use the whole server CPU consistently.
The main point about bloat right at the very end of the blog post is that is shows the space used in this particular workload. If your autovacuum isn't tuned well, you will see Postgres' table size grow over time as the table gets more bloated. The argument here is that with OrioleDB that gets faster. You have no bloat and you have much less I/O as well for each transaction.
It's important to recognize that there is something interesting here. Postgres' MVCC design is tested, it works, but it does have its drawbacks. You can see that the undo logs actually do have a beneficial performance characteristic.
There's a couple of interesting aspects to this design, but before we dive into that, let's start with just seeing how Postgres enables you to do something as fundamental as changing the storage engine, without doing a lot of changes to the core database.
Generally speaking, Postgres has - since Postgres 12 - table access methods.
Table access methods are kind of like index access methods in the way that they allow you to customize a quite fundamental portion of the database. Unlike indexes, there isn't really much variety in terms of table access methods today. The main Postgres access method heap is one that I would say 99% of people use.
A popular table access method that I am familiar with is the Citus columnar extension. This uses the table access methods API to power columnar storage in Postgres. Instead of using row based storage, this uses columnar storage, but you can still query your Postgres and write to Postgres just as if it was a row based storage with a heap engine.
OrioleDB works in a similar way. OrioleDB is also aiming to be a table access method that's similar to the Citus Columnar extension. It aims to just be plugged into Postgres as an extension. I think this is really interesting because it allows you to choose one or the other in terms of how it fits your workload or your production requirements, but you don't necessarily have to switch your database.
The downside here is that OrioleDB is not yet there in terms of being a pure extension. They've published all of their work in open source, which I really appreciate. There are about 20 patches or so that Postgres and OrioleDB's version of Postgres are different in.
A lot of these are improvements to the table AM API. Some of them are additional hooks that are not there today. These are things that would need to be upstreamed in order for us to just go and to a simple
apt-get install postgresql and then
apt-get install orioledb, and then just "CREATE EXTENSION orioledb". Sadly, we're not there yet.
Alexander Korotkov has a very ambitious goal, which is to have the most important patches mentioned in Postgres 17, which is the version that's currently in development that will be released in about a year's time. He aims to get OrioleDB to be a pure extension in Postgres 18.
Alexander also elsewhere stated that his ultimate goal is that the heap storage engine in Postgres which powers your databases today will eventually be replaced by something like Oriole. That, too, is a very ambitious goal and I'm personally skeptical. But it's good to set your goals high. But, getting to a point where OrioleDB is an extension and you can just use it and install it, that's a very interesting premise and I think if we can get to that, that would be amazing.
One thing that I want to mention here about OrioleDB's design that I wasn't aware of before is that it is actually very intricately involved with how indexes are structured and how indexes maintain their versions of rows.
Because in Postgres today, even though we're talking about the table access method, the index access method also has to clean up dead tuples that are no longer existing. What somebody on Hacker News commented was that Oriole's design seems to require transaction-aware indexes. There is a risk here that individual writes are slower because of that transaction-aware index design. We'll have to see a lot more benchmarks to understand the specific trade-offs and to understand what we are losing by using a design like this.
On the note of VACUUM, the one last thing I want to mention is that in pganalyze we are also spending a lot of time thinking about VACUUM. We have a new feature called the pganalyze VACUUM Advisor that we've now realeased. You can read our launch blog post here. We've also recorded a webinar about advanced autovacuum tuning and the pganalyze VACUUM Advisor. Feel free to watch it here.