5mins of Postgres E21: Server-side backup compression with LZ4 and Zstandard in Postgres 15, and the removal of exclusive backup mode
In today’s episode 21, we're going to talk about backup improvements in Postgres 15: LZ4 and Zstandard compression, as well as the removal of the exclusive backup mode.
Let's have a look.
Postgres 15 beta1 came out a couple of weeks ago. When we take a look at the release notes, the third item currently listed is the removal of the long-deprecated exclusive backup mode. Exclusive backups have generally been a bad idea: if the exclusive backup label file isn't cleaned up, the server can fail to start, and of course you don't want your backups to break your whole production server.
This backup mode has been removed, and to make sure that people upgrade to the new functions rather than run into hidden errors, the previous pg_start_backup and pg_stop_backup functions have been renamed to pg_backup_start and pg_backup_stop. If we take a look at the function definitions in Postgres 14, the pg_start_backup function had three parameters: label, fast, and exclusive. Exclusive is the mode that was removed. In Postgres 15, instead of pg_start_backup, the function is now called pg_backup_start, and it only has two parameters, label and fast, because backups are now always non-exclusive.
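As a sketch, the call sequence for a manual base backup changes like this between the two versions (the 'nightly' label is just a placeholder; this needs to run against a live Postgres server):

```sql
-- Postgres 14 and earlier (removed in 15):
--   SELECT pg_start_backup('nightly', fast => false, exclusive => false);
--   SELECT * FROM pg_stop_backup(exclusive => false);

-- Postgres 15: renamed, and always non-exclusive
SELECT pg_backup_start(label => 'nightly', fast => true);
-- ... copy the data directory with your backup tooling ...
SELECT * FROM pg_backup_stop(wait_for_archive => true);
```

Note that pg_backup_stop returns the backup label and tablespace map contents, which you need to store alongside the copied data directory.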
This is important to know because you might be using a script or a third-party program that relies on calling these functions to implement your backups, and it may not yet be updated for Postgres 15. If you are testing the beta release, this is an important gotcha.
The other thing I want to talk about is server-side backups. This is a feature that Robert Haas and a few other folks worked on. The goal of this feature is to improve both how fast backups are and how large they are. To be clear, this is about pg_basebackup, not pg_dump and pg_restore: a full base backup, that is, a binary backup of the server. In a test Robert ran in February, he took a pg_basebackup of a pgbench database over the network, through a VPN and an SSH tunnel. I would say that's maybe slightly unrealistic for a production situation, but it's good to show the power of compression.
The base backup with no compression took 16 minutes to complete, whereas the base backup with server-side LZ4 compression (the server compresses the data before it sends it over the network) finished in a much faster 1 minute 50 seconds.
Clearly, that's much better in terms of the time it takes to run the backup, and that is of course also reflected in the size. With LZ4 compression, instead of the 1.5 gigabytes for the uncompressed base backup, the base backup only takes 170 megabytes, almost a 10x improvement over the original base backup file.
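A server-side LZ4 backup like the one in this test could be taken as follows (the host and target directory are hypothetical, and this requires a running Postgres 15 server, so treat it as a sketch rather than a ready-made command):

```shell
# Server-side LZ4: the server compresses the tar archives
# before sending them over the network to the client.
pg_basebackup \
  --host=db.example.com \
  --pgdata=/backups/base \
  --format=tar \
  --compress=server-lz4
```

The `server-` prefix is what makes the compression happen on the server; `--compress=client-lz4` would instead compress on the client, after the data has already crossed the network.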
This looks pretty cool, and then about a month ago, Robert posted a follow-up blog post, where he does a more thorough comparison of the different backup compression modes in Postgres 15.
For this test, he used the UK land registry data set, which came to about 3.8 gigabytes of data after loading. Still pretty small, but I would say more representative in terms of the type of data it contains.
He was also using a more representative setup: two machines on an internal network with a 10 gigabit uplink. This is what you might expect in a best-case production setup. Here, he tested both the size of the backup and the time it took, with both a tar-format base backup and a plain-format base backup. You can see in this comparison that "no compression" produced about 3.8 gigabytes of data.
Now, when we compare this with gzip, for example, you can see that gzip produced a little under half the data size, but the time was much worse: it took more than 10 times as long as no compression. Gzip was really slow. Historically, gzip was the only compression mode available, and only on the client side, which means it saves you nothing on the transfer time between the server and the client.
With the new LZ4 compression, you can see that the size is slightly larger than gzip's, because LZ4 compresses slightly less effectively in many cases, but it is much faster: it takes a little less than double the time of the "no compression" option.
Later in the Postgres 15 release cycle, Zstandard support was also added, and Zstandard can additionally compress in parallel. Zstandard looks like a good default algorithm for compression, and it gets you the smallest backup size.
Previously, it was 3.8GB uncompressed and 1.5GB for gzip; with Zstandard, it's 1.3GB compressed, so much smaller. Looking at the time, the non-parallelized version of Zstandard is definitely the worst, also worse than LZ4, but the parallel version is actually faster than running without compression. Even on a very fast network, it still helps to reduce the amount of data that gets transferred. If you can afford the CPU time on the compression here, Zstandard looks like a great solution for many production applications.
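A parallel server-side Zstandard backup along these lines could look like the following (again with a hypothetical host and directory, and requiring a running Postgres 15 server; the level and worker count are illustrative, not recommendations):

```shell
# Server-side Zstandard with parallel compression:
# "workers" runs multiple compression threads on the server,
# "level" sets the Zstandard compression level.
pg_basebackup \
  --host=db.example.com \
  --pgdata=/backups/base \
  --format=tar \
  --compress=server-zstd:level=3,workers=4
```

The `workers` option is what enables the parallel compression discussed above; it is specific to Zstandard and is not available with LZ4 or gzip.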
I would say this is one really exciting feature in Postgres 15. Thank you so much for listening. This was 5mins of Postgres. Subscribe to our YouTube channel to hear about next week's episodes, check out more of our episodes about Postgres 15 below, follow us on Twitter, and talk to you next week.