Introducing Query Tuning Workbooks to safely tune Postgres queries on production with pganalyze!

Reducing table size with optimal column ordering and array-based storage

In today’s E100 of “5mins of Postgres” we discuss how to optimize the row size in Postgres to reduce the overall table size. Specifically, we take a look at how column ordering affects storage size, how you can play column tetris to optimize it, and the big benefits that array-based storage can yield for some workloads.

🎉 This is episode 100. And over the last, almost two years, it's been great every week to look at what's new with Postgres or what are good best practices that are still relevant today.
Thank you for watching 5mins of Postgres!


Share this episode: Click here to share this episode on LinkedIn or on X/Twitter. Feel free to sign up for our newsletter and subscribe to our YouTube channel.


Transcript

The optimal column order reduces size

We'll start with this post by Shaun Thomas on the 2ndquadrant blog. In this blog post Shaun describes how you can use column ordering in Postgres to optimize the amount of space a table takes on disk.

After that we'll also talk a bit about different storage techniques, such as using arrays to use TOAST that can help you make up to a 5x improvement of how much data your tables take on disk.

Now, let's start with using the pg_column_size function and manually constructing rows. The smallest possible row in Postgres is 24 bytes. That's just the tuple header without any additional information.

When I add a data point such a small int, you'll see that small int takes 2 bytes, so I have 26 bytes total. When I add a bigint, that's 8 bytes, so that's 32 bytes.

Now what's surprising, and this is really what Shaun's article here is about, is if I have a row where first I have a smallint, and then I have a bigint. You would expect it to be 10 bytes, right? We would expect it to be ultimately 34 bytes of total storage for this row. But we are actually getting 40 bytes.

What happens is that smallint takes 2 bytes, then there is 6 bytes of padding, and then there's another 8 bytes for that bigint, and this is how we end up with a 40 byte row.

Now, this of course gets worse if you have more complex, real life workloads, where your tables grow over time, people add new columns, and so you will actually end up having a mix of data types in your columns in a particular table.

Using pg_attribute and pg_type to determine size and alignment

Here, for example when we look at this internally, how Postgres stores this is that for example, here we have a boolean, then we have an int8 we've this mix of different types. For each of these types, Postgres stores information in its internal pg_attribute table with the typalign column and the typlen column.

typalign is essentially which alignment boundary it aligns on, and then the typlen is how long that type is in bytes. Or if it's a variable length field, then it would be -1. A text, for example, might take just a few bytes, might take a 100 bytes, it really depends on how much data is being stored.

Now, what Shaun illustrates here is that the ordering of your columns can have pretty drastic impact on the size of the table.

Going from 135 MB to 112 MB (20% savings!)

With this test case, we generate a bunch of data. In this case, just using the current ordering of the columns, we get a 135 MB table. That's the baseline, that's what we want to improve on.

Really, what we're trying to do here is we're trying to group things together so we don't have that padding that Postgres would add otherwise. I'll just pick one example here. Let's imagine you have two integers. Those are each 4 byte size and then a bigint is 8 byte in size. And so it makes more sense to do integer integer bigint instead of integer bigint integer.

Usually you want to have your smaller data types clustered together, and then your larger data types, following that. And so we can play column tetris to make that all fit together in an optimal form. That's what Shaun does here by ordering all the data types by their length.

With the optimized order we get down from 135 MB to 112 MB and so we already saved 20% of the space just by reordering columns, that's a really nice benefit.

Column Tetris and variable length types

I want to point out that the term column tetris was not coined by Shaun initially. Erwin Brandstetter had used this term many years ago on Stack Overflow, describing the exact same technique. And I'm sure other people have also invented that.

Erwin also had another good point where he responded to somebody reading Shaun's post, and clarifying something about how variable length fields are treated.

This is a bit of a complex story, but just to say: If you have a lot of variable length fields, like a text field, the best practice usually is to put them at the end. You put your fixed size types first, and then you have your text follow after that. But Erwin actually described that there are edge cases where it actually makes sense to do an ordering like integer text smallint.

He also references a good utility here, which is called pg_column_byte_packer.

The pg_column_byte_packer utility

If you're really trying to squeeze things and you don't want to do this work manually, James Coleman and the team over at Braintree, maintain pg_column_byte_packer, which is intended for use with Ruby on Rails in particular. And so you can use this to automatically optimize your columns in Active Record migrations. This is mostly useful when creating new migrations or also optionally it can re-sort an existing structure.

Sometimes there are larger improvements to be had by adjusting your data structure.

Using arrays for avoiding tuple header overhead

I want to briefly talk about a post by Joe Conway from a couple of years back.

In this blog post, Joe talks about a use case where there is two different tables. One table is storing the parents and then there's multiple details for each of those parent records.

What Joe explores in this post is an alternate storage of that structure. He starts out with 1.1 GB, and then he tries different variants. And I think the most important thing that he tries here, which I found really interesting over the years and we've recently had great success with this at pganalyze, is array based storage.

Instead of having multiple records, each of them with a data point. **What if we can combine them into an array and we store an array in Postgres. And so in this particular example, Joe had a 13% reduction in storage by moving from the one row per value approach to storing this as an array.

Sometimes you could actually get even bigger benefits.

An example on our own database at pganalyze

This array-based storage approach, we've recently used for time series data, and there are other ways to store time series data. But I want to show you an example where using an array to store data is really great. We'll talk more about this in another 5mins episode or a blog post. But just a quick example here from our own database.

So we have this information we store in our system for query stats, and we recently made an effort to use arrays for this. Historically the way we stored this, is we said, here's a particular query fingerprint, and here's how many calls that fingerprint had at a particular point in time:

CREATE TABLE public.query_stats_35d (
    ...
    fingerprint bytea NOT NULL,
    collected_at timestamp without time zone NOT NULL,
    collected_interval_secs smallint NOT NULL,
    calls bigint,
    total_time double precision,
    ...
)
PARTITION BY RANGE (collected_at);

This, for one day worth of data in the current system is about 112 gigabytes of data. Now there is a lot more efficient ways to do this, and so what we ended up doing is using arrays for this. For a particular point in time, let's put multiple query statistics together in one. Instead of just storing one row being one value, we actually say each row is many values.

CREATE TABLE public.query_stats_packed_35d (
    ...
    fingerprint bigint[] NOT NULL,
    collected_at timestamp without time zone[] NOT NULL,
    collected_interval_secs smallint[] NOT NULL,
    calls bigint[] NOT NULL,
    total_time double precision[] NOT NULL,
    ...
)
PARTITION BY RANGE (min_collected_at);

And so the idea here is that the information becomes much more packed. This packed array style format actually had a drastic reduction in size.

Arrays + TOAST Compression = 5x improvement!

And so for us, we went from 112 GB of data, plus TOAST to 20 GB of data and TOAST for the exact same data. A 5x improvement. And really the difference here is that when you use arrays in Postgres and there are long arrays and there's duplicate data in this, Postgres will move them to TOAST and compress them.

And so if you have this type of data, where you have very similar values that are being stored a lot. Sometimes it's worth exploring this array based storage technique. It lets you pack data very efficiently without requiring any extensions in standard Postgres. So, great opportunity in some workloads.

For all the workloads you could definitely benefit from thinking about column alignment. And then for some workloads using array based storage will get you up to a 5x improvement.

I hope you learned something new from E100 of 5mins of Postgres. Feel free to subscribe to our YouTube channel, sign up for our newsletter or follow us on LinkedIn and X/Twitter to get updates about new episodes!

What we have discussed in this episode of 5mins of Postgres


Enjoy blog posts like this?

Get them once a month to your inbox