Postgres in 2021: An Observer's Year In Review

Every January, the pganalyze team takes time to sit down and reflect on the year gone by. Of course, we are thinking about pganalyze, our customers, and how we can improve our product. But, more importantly, we always take a bird's-eye view of what has happened in our industry, and specifically in the Postgres community. As you can imagine: A lot!

So we thought: Instead of trying to summarize everything, let's review what happened with the Postgres project, and what is most exciting from our personal perspective. Coincidentally, a new Postgres Commitfest has just started, so it's the perfect time to read about new functionality that is being proposed by the PostgreSQL community.

The following are my own thoughts on the past year of Postgres, and a few of the things that I'm excited about looking ahead. Let's take a look:

Postgres: 2021 Year In Review

Postgres Performance: Sometimes it's the small things

To start with, I wanted to look at one very specific change that I actually hadn't noticed until recently.

Specifically: The performance of IN clauses, and the work done to improve performance for long IN lists in Postgres 14.

First, let's set up a test table with some data that we can query:

CREATE TABLE tbl (
    id int
);

INSERT INTO tbl SELECT i FROM generate_series(1,100000) n(i);

Now, we run a very simple query with a long IN list on Postgres 13:

postgres=# SELECT count(*) FROM tbl WHERE id IN ([... 1000 integer values ...]);
 count 
-------
  1000
(1 row)

Time: 360.520 ms

This is noticeably slow. With Postgres 14, however:

postgres=# SELECT count(*) FROM tbl WHERE id IN ([... 1000 integer values ...]);
 count 
-------
  1000
(1 row)

Time: 12.246 ms

An amazing 30x improvement! Note that the improvement is most pronounced with Sequential Scans (as in this example, since tbl has no index on id), or in other situations where the executor evaluates the expression many times, i.e. when it shows up as a Filter clause.
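
If you'd like to reproduce this yourself, one way is to have psql generate and run the query for you with \gexec. This is just a sketch based on the tbl table from above, and exact timings will of course vary with your hardware and configuration:

\timing on

-- Build the query text with 1,000 literal values, then execute it:
SELECT format(
    'SELECT count(*) FROM tbl WHERE id IN (%s)',
    string_agg(i::text, ', ')
)
FROM generate_series(1, 1000) s(i) \gexec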

The reason I like this change is that it demonstrates what the Postgres community does well: Refine the existing system, and optimize clear inefficiencies, without requiring users to change their queries.

Of course, there are many other exciting performance efforts underway in the community as well.

Does autovacuum dream of 64-bit Transaction IDs?

Now, on to a much bigger topic. If you've scaled Postgres, you've likely met the archenemy of a large Postgres installation: VACUUM, or rather its cousin autovacuum, which cleans up dead tuples from your tables and advances the transaction ID horizon in Postgres.

Much has been said (1, 2, 3) about what happens when you hit Transaction ID (TXID) Wraparound, a situation in which Postgres is unable to start a new transaction. A recent blog post illustrating Notion's motivation to shard their Postgres deployment puts it well:

More worrying was the prospect of transaction ID (TXID) wraparound, a safety mechanism in which Postgres would stop processing all writes to avoid clobbering existing data. Realizing that TXID wraparound would pose an existential threat to the product, our infrastructure team doubled down and got to work.

- Garrett Fidalgo - Herding elephants: Lessons learned from sharding Postgres at Notion

The root cause here is actually very simple: Transaction IDs are stored as 32-bit integers in Postgres. For example, every row in a table carries the 32-bit ID of the transaction that created it, which is used to determine when the row first became visible to other transactions.
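
You can see these 32-bit IDs for yourself: every row carries hidden system columns such as xmin, and the age() function shows how far a table's frozen transaction ID horizon has fallen behind. A quick sketch, reusing the tbl table from earlier:

-- xmin is the 32-bit ID of the transaction that created each row:
SELECT xmin, id FROM tbl LIMIT 3;

-- age() counts how many transactions have passed since the table's
-- relfrozenxid - this is the number autovacuum works to keep bounded:
SELECT relname, age(relfrozenxid) FROM pg_class WHERE relname = 'tbl';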

Most people would agree that moving from 32-bit to 64-bit Transaction IDs is a good idea. There have been multiple attempts over the years, but in recent weeks a new patch by Maxim Orlov has kickstarted the discussion once more.

Whilst the community's motivation to fix this is certainly there, the early reviews give a glimpse of what needs to be considered when moving to 64-bit TXIDs:

  • 32-bit systems will have issues with atomic read/write of shared transaction ID variables
  • Extremely long-running transactions could fail if they exceed the new "short transaction ID" boundary (which remains at 32-bit in this patch)
  • On-disk format - keeping compatibility with the old format vs rewriting all data when an old cluster is upgraded (this patch tries to avoid changing the on-disk format)
  • Multixact freeze still needs to happen at a somewhat regular frequency (one of the activities that VACUUM takes care of today)
  • Memory overhead of larger 64-bit IDs in hot code paths (e.g. those optimized by recent connection scalability improvements)

And Peter Geoghegan puts it succinctly in reviewing the patch:

I believe that a good solution to the problem that this patch tries to solve needs to be more ambitious. I think that we need to return to first principles, rather than extending what we have already.

Despite the email thread being titled "Add 64-bit XIDs into PostgreSQL 15", given these concerns, it's extremely unlikely that a change like this would make it into Postgres 15 at this point - but one can dream, and look ahead to Postgres 16.

Looking for something you can use today?

Postgres 14 brought two great improvements in the area of VACUUM and bloat reduction: (1) the new bottom-up index deletion for B-tree indexes, and (2) the new VACUUM "emergency mode" that provides better protection against impending TXID Wraparound.
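
The emergency mode in particular is easy to reason about: once a table's oldest unfrozen transaction ID exceeds the new vacuum_failsafe_age setting, VACUUM skips index vacuuming and cost-based delays in order to finish freezing as quickly as possible:

-- The failsafe threshold introduced in Postgres 14
-- (defaults to 1.6 billion transactions):
SHOW vacuum_failsafe_age;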

EXPLAIN: Nested Loops can be deceiving

Commitfests are about encouraging code reviews, first and foremost. Whilst looking through patches, I noticed a small one that adds extra information about Nested Loops to EXPLAIN.

The patch was initially proposed back in 2020 and saw some minor refactorings in 2021, but no one had reviewed it yet in this Commitfest. So I took the opportunity to review it.

First, to understand what the patch aims to do, let's look at a common EXPLAIN output for a Nested Loop.
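
In case you want to follow along, a schema along these lines can produce a similar plan. Note this is my own sketch with made-up partition bounds and data, not the exact test case from the patch thread, and your row counts will differ (small tables may also need some nudging before the planner picks Index Scans):

CREATE TABLE tbl1 (col1 int);
CREATE TABLE tprt (col1 int) PARTITION BY RANGE (col1);
CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (10);
CREATE TABLE tprt_2 PARTITION OF tprt FOR VALUES FROM (10) TO (20);
CREATE TABLE tprt_3 PARTITION OF tprt FOR VALUES FROM (20) TO (30);
CREATE INDEX tprt1_idx ON tprt_1 (col1);
CREATE INDEX tprt2_idx ON tprt_2 (col1);
CREATE INDEX tprt3_idx ON tprt_3 (col1);

INSERT INTO tprt SELECT i FROM generate_series(1, 29) s(i);
INSERT INTO tbl1 VALUES (5), (12), (15), (21), (25);
ANALYZE tbl1, tprt;

-- An inequality join condition forces a Nested Loop:
EXPLAIN (ANALYZE, VERBOSE, COSTS OFF, TIMING OFF, SUMMARY OFF)
    SELECT tbl1.col1, tprt.col1 FROM tbl1
    JOIN tprt ON tprt.col1 < tbl1.col1;

With that in place, the plan (slightly abbreviated) looks like this: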

                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Nested Loop (actual rows=23 loops=1)
   Output: tbl1.col1, tprt.col1
   ->  Seq Scan on public.tbl1 (actual rows=5 loops=1)
         Output: tbl1.col1
   ->  Append (actual rows=5 loops=5)
         ->  Index Scan using tprt1_idx on public.tprt_1 (actual rows=2 loops=5)
               Output: tprt_1.col1
               Index Cond: (tprt_1.col1 < tbl1.col1)
         ->  Index Scan using tprt2_idx on public.tprt_2 (actual rows=3 loops=4)
               Output: tprt_2.col1
               Index Cond: (tprt_2.col1 < tbl1.col1)
         ->  Index Scan using tprt3_idx on public.tprt_3 (actual rows=1 loops=2)
               Output: tprt_3.col1
               Index Cond: (tprt_3.col1 < tbl1.col1)
...

Based on this, we might assume that each loop produces 5 rows, as the existing "actual rows" statistic shows the average across all loops.

But this example shows where the math already doesn't add up: because "actual rows" is a rounded per-loop average, the parent Append node appears to return 5 rows, whilst the averages of its child nodes add up to 6 (2+3+1). And the top Nested Loop node returns 23 rows in total, but we can't clearly see which index these rows were found in.

With the patch in place, we get an extra line of Loop information for each node:

                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Nested Loop (actual rows=23 loops=1)
   Output: tbl1.col1, tprt.col1
   ->  Seq Scan on public.tbl1 (actual rows=5 loops=1)
         Output: tbl1.col1
   ->  Append (actual rows=5 loops=5)
         Loop Min Rows: 2  Max Rows: 6  Total Rows: 23
         ->  Index Scan using tprt1_idx on public.tprt_1 (actual rows=2 loops=5)
               Loop Min Rows: 2  Max Rows: 2  Total Rows: 10
               Output: tprt_1.col1
               Index Cond: (tprt_1.col1 < tbl1.col1)
         ->  Index Scan using tprt2_idx on public.tprt_2 (actual rows=3 loops=4)
               Loop Min Rows: 2  Max Rows: 3  Total Rows: 11
               Output: tprt_2.col1
               Index Cond: (tprt_2.col1 < tbl1.col1)
         ->  Index Scan using tprt3_idx on public.tprt_3 (actual rows=1 loops=2)
               Loop Min Rows: 1  Max Rows: 1  Total Rows: 2
               Output: tprt_3.col1
               Index Cond: (tprt_3.col1 < tbl1.col1)
...

You can see how much clearer the picture is with this. We can tell that tprt1_idx and tprt2_idx contributed about equally to the result (10 and 11 rows in total), and that some loop iterations return as few as 2 rows whilst others return up to 6. When TIMING is turned on, you also get information on the min/max time of the loop iterations.

Given the prevalence of slow query plans that contain a Nested Loop, this appears to be a very useful patch. The main open item appears to be the slight overhead caused by collecting the additional statistics - something to be discussed further on the mailing list.

Interested in other EXPLAIN improvements? The recent Commitfests have several more proposals worth a look.

Extended Statistics: Help the Postgres planner do its job better

Going back to what you can use today: Extended Statistics on Expressions, released in Postgres 14.

Let's back up for a moment. If you are not familiar with them: extended statistics let you collect additional statistics about table contents, so the Postgres planner can produce better query plans.

The general syntax is like this:

CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
    [ ( statistics_kind [, ... ] ) ]
    ON { column_name | ( expression ) }, { column_name | ( expression ) } [, ...]
    FROM table_name

Before Postgres 14, you could already create extended statistics that help the planner understand the correlation between two columns, which is often necessary to avoid selectivity mis-estimates.
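
For example, with two correlated columns, the dependencies statistics kind teaches the planner that the columns move together rather than independently. A sketch with hypothetical table and column names:

-- Without this, the planner multiplies the selectivities of city and zip,
-- underestimating how many rows match both conditions:
CREATE STATISTICS city_zip_stats (dependencies) ON city, zip FROM addresses;
ANALYZE addresses;  -- the statistics are gathered on the next ANALYZE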

With the new extended statistics for expressions, you can inform the planner how selective a particular expression is, which in turn leads to better query plans. Here is an example of how to use this:

CREATE TABLE tbl (
    a timestamptz
);
CREATE STATISTICS st ON date_trunc('month', a) FROM tbl;

This causes Postgres to not only collect statistics about column a itself (which it does by default), but also about the results of the date_trunc expression. You can find a complete example in the Postgres docs.
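
To see the effect, run ANALYZE and compare the row estimates for a query filtering on that expression. With the statistics object in place, the planner no longer has to fall back to a generic selectivity estimate (exact estimates will vary with your data):

ANALYZE tbl;

-- The row estimate now comes from the expression statistics:
EXPLAIN SELECT * FROM tbl WHERE date_trunc('month', a) = '2021-01-01';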

In addition to this, there are many more changes to extended statistics in flight and being discussed on the hackers mailing list.

Crustaceous Postgres: Using Rust For Extensions & more

A side topic that isn't actually about Postgres development itself, but still pretty exciting on a larger scale: Postgres and Rust. As you probably know, Postgres itself is written in C, and that is unlikely to change.

However, there are two great examples of Rust being used to augment the Postgres ecosystem.

First, you can write Postgres extensions in Rust using pgx, and by now this approach has matured to the point that even established extension authors such as the TimescaleDB team have started adopting Rust for some of their projects, such as the TimescaleDB toolkit.

Second, there are new systems being developed that build on Postgres and utilize Rust as their language of choice, e.g. for networked services. The most interesting development in 2021 in this regard is the work of the team at ZenithDB, which is working on an Apache 2.0-licensed variant of a shared-disk scale-out architecture (similar to Amazon Aurora), built on Postgres, with services written in Rust.

Other highlights from Postgres development in 2021

A single post could never do everything justice; plenty more happened in 2021 that caught my attention.

You may also enjoy looking at the current Commitfest, to make up your own mind.

Conclusion

The above might feel quite extensive, but it's by no means everything that happened with Postgres in 2021. I'm excited to be part of such a vibrant community that keeps making Postgres better, and I'm eager to see what's to come for Postgres in 2022!

At pganalyze we're committed to providing the best Postgres monitoring and observability to help you uncover deep insights about Postgres performance. Whether your Postgres runs in the cloud, your on-premises data center, or a Raspberry Pi: You can give pganalyze a try.
