Partitioning in Postgres and the risk of high partition counts
In E78 of “5mins of Postgres”, we're going to talk about partitioning in Postgres and how it can either kill performance or make it a lot better.
Let's have a look!
We'll start with this blog post by Hans-Jürgen Schönig on the Cybertech blog. In this article, Hans-Jürgen shows how you can get performance slowdowns by using partitioning.
Correct, partitioning isn't always a performance improvement. The test situation in Hans-Jürgens case is that we have a simple table called
t_simple. This table gets 10 million rows inserted into it. We also have an index on the integer column on that table and run a
Then, we do a very simple lookup. We're saying SELECT * from the
t_simple table WHERE value is such and such. If you have a look at the blog post, we can see that we're getting a very straightforward query plan that uses that index in a Bitmap index scan fashion to get the result. Now, what if we change this to use partitioning instead?
Following along with Hans-Jürgens article, we then use hash partitioning, which means that the value you're partitioning on, in this case the
ID column, gets hashed and you then have a certain number of positions that things get routed to. In this particular case we have eight partitions.
Now, we do the same data load. Luckily, you can insert it into the parent table, so you don't really need to think about the fact that there are partitions underneath it. Then we can create an index again, also in the parent table, that gets applied to all the sub partition tables. Now, we also run
EXPLAIN ANALYZE again, you can see that this query plan gets a lot more complicated. You can see on the top there's an append node, which means that Postgres will take the results of multiple plan nodes and merge them together into one. Then it will return that.
You’ll still get the same behavior underneath each of those. You can see that, ultimately, even though they are all really fast queries, the second query is actually slightly slower. If you have a look at the original blog post, what I would like to point out is that you should have a look at the execution time. You can see 0.235 milliseconds execution time, but 0.7 milliseconds planning time! This means you spend more time planning the query than executing it. You can also see that during planning Postgres had a hundred pages that it worked with in shared buffers, which are essentially the metadata about the structure of the tables, like which columns they have and such.
This is something to really be aware of: if you're using partitioning heavily on a database you will be introducing more overhead because you have more planning to do. In some extreme cases, either because of partitionings or because of having hundreds of indexes on a table, I've seen the planning time really be a problem.
In short: It's something to watch out for and to be conscious of when you run
Now I want to talk about another case of somebody using partitioning to their benefit. This article is by George Petropoulos from the engineering team at ChartMogul. What he is describing in it is how they changed the table partitioning scheme from lists to hash.
To start with, they already had partitioned tables. But they were using lists before and they switched to hash, the same way that Hans-Jürgen's post was referencing. Have a look at George’s blog post to see ChartMogul's architecture. You can see that they have multiple databases. They have an ingestion database, they have multiple data platform shards, and then they have an analytics database. What you can read about is that they've had a few issues with their current setup, specifically on their ingestion database. They had some IO problems, had some out of memory issues, and, you know, some overhead regarding how they manage these partitions.
On their old partitioning scheme they did one partition per customer or per account. This means they had a lot of partitions in that list partitioning scheme. What they had decided is that they were going to change from list to hash partitioning, and that they'll have 30 partitions to split up the data - with roughly 2.5 gigabytes per each of these hash partitions.
You can see in the blog post that this resulted in a clear improvement. We can imagine they probably have at least hundreds, if not thousands, of customers. So the more partitions you have, the more problematic the list partitioning scheme becomes. In this case, changing to hash partitioning essentially means a huge reduction in the number of tables, essentially sidestepping the exact overhead that Hans-Jürgen talked about.
For ChartMogul, they've used AWS Database Migration Service for the migration. You could probably implement something similar with logical replication in the new Postgres versions. But George’s team essentially used something that you could do today if you're on AWS: just migrate your own database in a way that is logical and is able to rewrite the partitioning.
The good news is they've essentially changed the architecture where they have split up some of the data into different shards, but then combined databases using the hash partitioning scheme to make it most efficient.
Their outcome of this is that they've seen a 5x improvement in simple
SELECT queries and a 3x improvement in
SELECT queries of
JOINs. They achieved these big performance improvements by improving their partitioning scheme! You can see, partitioning is always worth thinking about, especially if you have a lot of them. It's good to reduce them to keep them manageable in Postgres.