Postgres in 2021: An Observer's Year In Review
Every January, the pganalyze team takes time to sit down and reflect on the year gone by. Of course, we think about pganalyze, our customers, and how we can improve our product. But, more importantly, we always take a bird's-eye view of what has happened in our industry, and specifically in the Postgres community. As you can imagine: A lot!
So we thought: Instead of trying to summarize everything, let's review what happened with the Postgres project, and what is most exciting from our personal perspective. Coincidentally, a new Postgres Commitfest has just started, so it's the perfect time to read about new functionality that is being proposed by the PostgreSQL community.
The following are my own thoughts on the past year of Postgres, and a few of the things that I'm excited about looking ahead. Let's take a look:
- Postgres Performance: Sometimes it's the small things
- Does autovacuum dream of 64-bit Transaction IDs?
- EXPLAIN: Nested Loops can be deceiving
- Extended Statistics: Help the Postgres planner do its job better
- Crustaceous Postgres: Using Rust For Extensions & more
- Other highlights from Postgres development in 2021
Postgres Performance: Sometimes it's the small things
To start with, I wanted to look at one very specific change that I actually hadn't noticed until recently.
Specifically: The performance of IN clauses, and the work done to improve performance for long IN lists in Postgres 14.
First, let's set up a test table with some data that we can query:
```sql
CREATE TABLE tbl ( id int );
INSERT INTO tbl SELECT i FROM generate_series(1, 100000) n(i);
```
Now, we run a very simple query with a long IN list on Postgres 13:
```
postgres=# SELECT count(*) FROM tbl WHERE id IN ([... 1000 integer values ...]);
 count
-------
  1000
(1 row)

Time: 360.520 ms
```
This is noticeably slow. With Postgres 14 however:
```
postgres=# SELECT count(*) FROM tbl WHERE id IN ([... 1000 integer values ...]);
 count
-------
  1000
(1 row)

Time: 12.246 ms
```
An amazing 30x improvement! Note that this is most pronounced with Sequential Scans, or other situations where the executor makes a lot of comparisons, i.e. when the expression shows up as a Filter condition in the query plan.
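If you are still on an older release, a common workaround is to rewrite the long IN list as a join against a VALUES list, which the planner can turn into a hash join instead of a linear scan over the list for every row. A minimal sketch, using the test table from above (the short VALUES list stands in for a long one):

```sql
-- On Postgres 13 and older, each row of tbl is compared against the IN list
-- one value at a time. Joining against VALUES lets the planner hash the list:
SELECT count(*)
FROM tbl
JOIN ( VALUES (1), (2), (3) ) AS v(id) ON v.id = tbl.id;
```

On Postgres 14 and newer this rewrite is usually unnecessary, since the executor hashes large IN lists automatically.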
The reason I like this change is that it demonstrates what the Postgres community does well: Refine the existing system, and optimize clear inefficiencies, without requiring users to change their queries.
Of course, there are many other exciting performance efforts, here are a few:
- Postgres 14: Connection scaling improvements
- Postgres 14: Memoization of Nested Loops
- Postgres 14: libpq pipelining
- In Development: libpq compression
- In Development: Asynchronous I/O and direct I/O (see also this presentation by Andres Freund)
Does autovacuum dream of 64-bit Transaction IDs?
Now, on to a much bigger topic. If you've scaled Postgres, you've likely come to meet the archenemy of a large Postgres installation: VACUUM, or rather its cousin, autovacuum, which cleans up dead tuples from your tables and advances the transaction ID horizon in Postgres.
Much has been said (1, 2, 3) about what happens when you hit Transaction ID (TXID) Wraparound, a situation in which Postgres is unable to start a new transaction. A recent blog post illustrating Notion's motivation to shard their Postgres deployment puts it well:
More worrying was the prospect of transaction ID (TXID) wraparound, a safety mechanism in which Postgres would stop processing all writes to avoid clobbering existing data. Realizing that TXID wraparound would pose an existential threat to the product, our infrastructure team doubled down and got to work.
- Garrett Fidalgo - Herding elephants: Lessons learned from sharding Postgres at Notion
The root cause here is actually very simple: Transaction IDs are stored as 32-bit integers in Postgres, for example on individual rows in a table, to identify when a row first became visible to other transactions.
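You can see these per-row transaction IDs yourself through the hidden xmin system column, and check how close each database is to wraparound with the age() function. A quick sketch of both, using the test table from earlier:

```sql
-- xmin records the 32-bit transaction ID that created each row version:
SELECT xmin, id FROM tbl LIMIT 3;

-- age() shows how many transactions have passed since each database's oldest
-- unfrozen XID; autovacuum must freeze rows well before this nears ~2 billion:
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```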
Most people would agree that moving from 32-bit to 64-bit Transaction IDs is a good idea. There have been multiple attempts over the years, but in recent weeks a new patch by Maxim Orlov has kickstarted a new discussion.
Whilst the community's motivation to fix this is certainly there, the early reviews give a glimpse of what needs to be considered when moving to 64-bit TXIDs:
- 32-bit systems will have issues with atomic read/write of shared transaction ID variables
- Extremely long-running transactions could fail if they exceed the new "short transaction ID" boundary (which remains at 32-bit in this patch)
- On-disk format - keeping compatibility with the old format vs rewriting all data when an old cluster is upgraded (this patch tries to avoid changing the on-disk format)
- Multixact freeze still needs to happen at a somewhat regular frequency (one of the activities that VACUUM takes care of today)
- Memory overhead of larger 64-bit IDs in hot code paths (e.g. those optimized by recent connection scalability improvements)
And Peter Geoghegan puts it succinctly in reviewing the patch:
I believe that a good solution to the problem that this patch tries to solve needs to be more ambitious. I think that we need to return to first principles, rather than extending what we have already.
Despite the email thread being titled "Add 64-bit XIDs into PostgreSQL 15", given these concerns, it's extremely unlikely that a change like this would make it into Postgres 15 at this point - but one can dream, and look ahead to Postgres 16.
Looking for something you can use today?
Postgres 14 brought two great improvements in the area of VACUUM and bloat reduction: (1) The new bottom-up index deletion for B-tree indexes, (2) The new VACUUM "emergency mode" that provides better protection against impending TXID Wraparound.
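The emergency mode is governed by the new vacuum_failsafe_age setting, which defaults to 1.6 billion transactions. A read-only sketch for checking it, and for finding the tables closest to triggering it, on Postgres 14:

```sql
-- Once a table's oldest unfrozen XID exceeds this age, VACUUM skips
-- non-essential work (such as index vacuuming) to advance relfrozenxid faster:
SHOW vacuum_failsafe_age;

-- Per-table view of how old the oldest unfrozen XID is:
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;
```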
EXPLAIN: Nested Loops can be deceiving
Commitfests are about encouraging code reviews, first and foremost. Whilst looking through patches, I noticed a small one, which adds additional information about Nested Loops to EXPLAIN.
The patch was initially proposed back in 2020, and saw some minor refactorings in 2021, but no-one had reviewed it yet in this Commitfest. So I took the opportunity to review it.
First, to understand what the patch aims to do, let's look at a common EXPLAIN output for a Nested Loop:
```
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop (actual rows=23 loops=1)
   Output: tbl1.col1, tprt.col1
   ->  Seq Scan on public.tbl1 (actual rows=5 loops=1)
         Output: tbl1.col1
   ->  Append (actual rows=5 loops=5)
         ->  Index Scan using tprt1_idx on public.tprt_1 (actual rows=2 loops=5)
               Output: tprt_1.col1
               Index Cond: (tprt_1.col1 < tbl1.col1)
         ->  Index Scan using tprt2_idx on public.tprt_2 (actual rows=3 loops=4)
               Output: tprt_2.col1
               Index Cond: (tprt_2.col1 < tbl1.col1)
         ->  Index Scan using tprt3_idx on public.tprt_3 (actual rows=1 loops=2)
               Output: tprt_3.col1
               Index Cond: (tprt_3.col1 < tbl1.col1)
 ...
```
Based on this we might assume that each loop produces 5 rows, as the existing "actual rows" statistic shows the average across all loops.
But this example shows well where the math already doesn't add up: The parent Append node returns 5 rows on average, but the "actual rows" of its child nodes add up to 6. And the top Nested Loop node returns 23 rows, but we can't see clearly which index scans those rows actually came from.
With the patch in place, we get an extra line of per-loop statistics for each node:
```
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop (actual rows=23 loops=1)
   Output: tbl1.col1, tprt.col1
   ->  Seq Scan on public.tbl1 (actual rows=5 loops=1)
         Output: tbl1.col1
   ->  Append (actual rows=5 loops=5)
         Loop Min Rows: 2  Max Rows: 6  Total Rows: 23
         ->  Index Scan using tprt1_idx on public.tprt_1 (actual rows=2 loops=5)
               Loop Min Rows: 2  Max Rows: 2  Total Rows: 10
               Output: tprt_1.col1
               Index Cond: (tprt_1.col1 < tbl1.col1)
         ->  Index Scan using tprt2_idx on public.tprt_2 (actual rows=3 loops=4)
               Loop Min Rows: 2  Max Rows: 3  Total Rows: 11
               Output: tprt_2.col1
               Index Cond: (tprt_2.col1 < tbl1.col1)
         ->  Index Scan using tprt3_idx on public.tprt_3 (actual rows=1 loops=2)
               Loop Min Rows: 1  Max Rows: 1  Total Rows: 2
               Output: tprt_3.col1
               Index Cond: (tprt_3.col1 < tbl1.col1)
 ...
```
You can see how much clearer the picture is with this. We can understand that both tprt1_idx and tprt2_idx contributed about equally to the result. We can also see that some loop iterations have smaller row counts (2), whilst other iterations have higher counts (6). When TIMING is turned on, you also get information on the min/max time of the loop iterations.
Given the prevalence of slow query plans that contain a Nested Loop, this appears to be a very useful patch. The main open item with this patch appears to be the slight overhead caused by collecting additional statistics - something to be discussed further on the mailing list.
Interested in other EXPLAIN improvements? Here's what happened recently:
- Postgres 14: pg_stat_statements queryid is now built into core, and shows in EXPLAIN output
- In Development: Showing applied extended statistics in EXPLAIN (to quote my colleague Maciek: "Oh neat, that's pretty cool.")
- In Development: Showing I/O timings spent reading/writing temp buffers in EXPLAIN
Extended Statistics: Help the Postgres planner do its job better
Going back to what you can use today: Extended Statistics on Expressions, released in Postgres 14.
Let's back up there for a moment. If you are not familiar, extended statistics allow you to collect additional statistics about table contents, so the Postgres planner can provide better query plans.
The general syntax is like this:
```sql
CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
  [ ( statistics_kind [, ... ] ) ]
  ON column_name, column_name [, ...]
  FROM table_name
```
Before Postgres 14, you could already create extended statistics that help the planner understand the correlation between two columns, which oftentimes is necessary to avoid selectivity mis-estimates.
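For example, functional dependencies between two columns can be captured like this (the table and column names here are illustrative, modeled on the example in the Postgres documentation):

```sql
-- In an address table, the zip code almost always determines the city, so
-- treating the two WHERE conditions as independent would underestimate rows:
CREATE TABLE addresses (city text, zip text);

CREATE STATISTICS city_zip_stats (dependencies) ON city, zip FROM addresses;
ANALYZE addresses;

-- The planner can now use the dependency for queries like:
--   SELECT * FROM addresses WHERE city = 'Berlin' AND zip = '10115';
```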
With the new extended statistics for expressions, you can inform the planner how selective a particular expression is, which in turn leads to better query plans. Here is an example of how to use this:
```sql
CREATE TABLE tbl ( a timestamptz );
CREATE STATISTICS st ON date_trunc('month', a) FROM tbl;
```
This will cause Postgres to not only collect statistics about the column a itself (which it does by default), but also about the results of the expression that uses the date_trunc function. You can find a complete example in the Postgres docs.
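Once ANALYZE has run, you can inspect the gathered expression statistics through the pg_stats_ext_exprs view that was added in Postgres 14. A quick sketch against the table defined above:

```sql
ANALYZE tbl;

-- Shows n_distinct, most common values, etc. for the date_trunc expression:
SELECT statistics_name, expr, n_distinct, most_common_vals
FROM pg_stats_ext_exprs
WHERE statistics_name = 'st';
```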
In addition to this, there are many changes in-flight that are being discussed:
- In Development: Improve selectivity estimates when extended statistics are present
- In Development: Extended statistics for Var op Var clauses / Expr op Expr
- In Development: Estimating JOINs using extended statistics
Crustaceous Postgres: Using Rust For Extensions & more
A side topic that isn't actually about Postgres development itself, but still pretty exciting on a larger scale: Postgres and Rust. As you probably know, Postgres itself is written in C, and that is unlikely to change.
However there are two great examples of Rust being used to augment the Postgres ecosystem.
First, you can write Postgres extensions in Rust using pgx, and by now this approach has matured to the point that even established extension authors such as the TimescaleDB team have started adopting Rust for some of their projects, such as the TimescaleDB toolkit.
Second, there are new systems being developed that build on Postgres and utilize Rust as their language of choice, e.g. for networked services. The most interesting development in 2021 in this regard is the work of the team at ZenithDB, which is working on an Apache 2.0-licensed variant of a shared disk-type scale-out architecture (similar to Amazon Aurora), built on Postgres, with services written in Rust.
Other highlights from Postgres development in 2021
A single post could never do everything justice, but here are a few more things that caught my attention:
- Better Security: No more (unexpected) creation of objects in the public schema
- What's in your JSONB, really? Find out by Collecting statistics about JSONB columns
- ANALYZE + Partitioning: Did you know partitioned parent tables may need manual ANALYZE? With Postgres 14 it's easier to keep track of it
- Filtering Logical Replication: Want to filter your data by columns, or by rows? Postgres 15 may allow you to do both!
And even more things that are pretty cool:
- Postgres 14: SCRAM-SHA-256 is now the default for password encryption
- Postgres 14: pg_read_all_data and pg_write_all_data roles
- In Development: Support for NSS as a libpq TLS backend
- In Development: Non-superuser subscription owners
- In Development: Support issuing SSL certificates for multiple IP addresses
- Postgres 14: JSON subscript operators
- In Development: Custom TOASTer for JSONB, and Pluggable Toast
- In Development: JSON_TABLE
- In Development: SQL/JSON
- In Development: jsonpath syntax extensions
- In Development: Merging statistics from partition children instead of re-sampling everything
- In Development: CREATE INDEX CONCURRENTLY on partitioned table
- In Development: Lazy JIT IR code generation to increase JIT speed with partitions
- In Development: AcquireExecutorLocks() and run-time pruning
- In Development: Automatic partition creation (sadly this patch has no recent progress)
- Logical Replication
- Postgres 14: Multiple improvements & performance fixes
- Postgres 15: Allow publishing all tables in a schema
- Postgres 15: pg_stat_subscription_workers
- In Development: Logical Decoding on standbys
- In Development: Synchronize Logical Replication slots to standbys
- In Development: Logical decoding and replication of sequences
- & more :)
You may also enjoy looking at the current Commitfest, to make up your own mind.
The above might feel quite extensive, but it's by no means all of the things that have happened with Postgres in 2021. I'm excited to be part of such a vibrant community that keeps making Postgres better, and I'm eager to see what's to come for Postgres in 2022!
At pganalyze we're committed to providing the best Postgres monitoring and observability to help you uncover deep insights about Postgres performance. Whether your Postgres runs in the cloud, your on-premises data center, or a Raspberry Pi: You can give pganalyze a try.