Unlogged tables in Postgres: Is their data truly gone after a crash?
In today’s E56 of 5mins of Postgres we’re talking about unlogged tables, reasons to use them for increased write performance, reduced vacuum impact and less total WAL, and reasons to not use them, because data is lost when Postgres crashes.
- What is an unlogged table in Postgres?
- Benefits of unlogged tables
- Disadvantages of unlogged tables
- Benchmarking unlogged tables
- Clean shutdowns vs truncation after crashes
- Is the data truly gone, or can we recover it from disk?
- Are unlogged tables kept in memory only?
- What we have discussed in this episode of 5mins of Postgres
Let's dive in!
First of all, we'll start with this blog post by Greg Sabino Mullane on the Crunchy Data blog. Greg describes how, when he is working with customers at Crunchy Data, sometimes customers add unlogged tables, and he's going to describe the different trade-offs.
I personally don't use unlogged tables much, but I do think they can be quite useful to have in your arsenal of tools when working with Postgres.
First of all, the choice you can make is on a per table basis, whether a particular table can be unlogged or not. The default is that the tables are logged. Now, logged means that they're written to the Postgres WAL and they're essentially persisted that way. So, in short, it's a durability choice you're making here.
An unlogged table will give you massive improvements on write performance. It will also reduce the vacuum impact because vacuum writes changes to the Postgres WAL stream. Doesn't have to do that for an unlogged table. And they generate less total WAL which leads to less WAL traffic, fast replication for other tables, smaller backups and such. So there's reasons to use them.
Now, the reasons to not use them, is that unlogged tables are truncated on crash recovery. So that means if your Postgres server crashes, you lose the data in an unlogged table. It's gone.
Now you can also only access this data on the primary, not on replicas and that's because an unlogged table isn't in the WAL stream. And so the data wouldn't show up on the replica.
The same applies to logical replication because logical replication underneath the hood looks at the WAL files to get the data for replication. Unlogged tables are just simply not present there. So there is a trade off here.
The first thing that Greg does here to look at how much of a benefit they are is he runs a "pgbench". He also turns off autovacuum when running this pgbench. And don't turn off autovacuum on your production system. That's a bad idea, but on a pgbench, it can make sense because it reduces variability because pgbench doesn't have an autovacuum that interrupts it's workload.
Here we're doing a table with 1 million rows and then we're updating those 1 million rows. This is with a logged table by default. And that takes 5.6 seconds to run. That's a very short benchmark, but I think it shows the point, which is here in this case, we are generating 320 megabytes of WAL.
When we are re-running this same test. We can use the "--unlogged" option to pgbench, which makes the pgbench tables unlogged. You can see that now it takes 3.1 seconds to run. It's almost twice as fast to run the benchmark. And more importantly, you can see that the WAL that was generated is a mere 16 megabytes. A single WAL segment, instead of 320 megabytes we saw earlier.
Greg also looks at the pgbench numbers here. We can see with the transactions per second, if we run the regular logged table, it's 168 transactions per second. Versus an unlogged table is 485 transactions per second. Much faster to run the "pgbench" with an unlogged table.
Now, why wouldn't we always want to do that? Really the main thing to know about unlogged tables is that they're not persistent. Essentially you should expect your data to be gone. The one exception is when you have a clean shutdown. When you have data in your unlogged table and you shutdown Postgres with a restart, for example, then the data will still be there.
However, when there is a crash, Postgres will truncate it. And that's in a sense a feature. The reason that it truncates is because it's not clear what the data is. And so Postgres goes and zips it at the start.
Here we have the same pgbench running. And then we kill one of the Postgres processes. Don't do this in production. Essentially what you can see here is that after the Postgres server starts again, then the data is gone. So our unlogged tables are reset.
As Greg notes here at the end, if your data is temporary or easily replaced, then unlogged tables can be a good fit. But you should treat them as living in a Schrödinger's box. At any moment your table might have all the rows or none of the rows.
Now, there was a Hacker News discussion about this. And there was an interesting point where somebody said, technically, it's not the crash that truncates the data, it's the crash recovery that truncates it. And so if you go in, before you start your Postgres again, you can recover your data.
Andres Freund from the Postgres committer team responded here that unlogged table data oftentimes isn't even written to disk. Now what is on disk after a crash is completely inconsistent, so it could be missing data, it could be old row versions. You just don't know. And so trying to do anything useful with that is not really a good exercise to do because you will be missing a lot of it and really you should not have used an unlogged table to begin with if you can't regenerate your data.
There was a clarification question here, which is, somebody thought unlogged tables mean they don't get WAL logged and they also don't get stored in the file system. And that's not correct. Unlogged tables are stored in the file system because unlogged tables can be many times the size of memory, and so Postgres needs to reserve space for them.
Unlogged tables can be written to disk because there is pressure in shared buffers - there's not enough space - and so Postgres has to write out something to disk, to make space for something else. Another case can be if there is a clean shutdown checkpoint as seen in Greg's post. This is why you may see files on disk - but unfortunately you just don't know what state that on-disk file is in. And so you shouldn't trust a crashed unlogged table on disk at all.