Speed up Postgres with transaction pooling with prepared statements
In E73 of “5mins of Postgres” we're talking about how to use transaction pooling with prepared statements to make Postgres up to 30% faster. We're looking at improvements to pgbouncer, PgCat, and other connection poolers.
Let's have a look!
We'll start with this blog post by Lev from the PostgresML team, which outlines the bold claim that they're going to make Postgres 30% faster in production.
Now, I would say let's add a few more details to that 30% number, but I think there are some interesting aspects here. Lev and team maintain a new connection pooler called PgCat, which is written in Rust, and the idea is that it's more scalable than existing poolers such as pgbouncer and pgpool.
Let's dive into some of the technical specifics before we jump into the benchmarks later in this post.
Let's go into the GitHub PR that added a specific feature to PgCat. It is adding support for prepared statements, both in session and transaction mode. If you're familiar with poolers, most of the time when your goal is to make efficient use of connections on the database side to reduce your idle connection count, you're going to use transaction mode.
Transaction mode means that when your client opens a connection to the pooler, that same connection does not map one-to-one, to a connection from the pool to the server side. Instead, the connection pooler is going to multiplex the connections to the server in order to save on the idle connections on the Postgres server. What's unique here is that we're able to use prepared statements in a place where usually we haven't been able to use them! Prepared statements have a lot of benefits in terms of saving planning time for your queries.
Usually, when you send the exact same query to Postgres, Postgres will run a new execution plan. That can be quite expensive. But, when prepared statements are in use, Postgres is able to cache the query plan in many cases and is able to reuse it for future queries. That way, you can save on CPU time and sometimes also get more predictable query performance.
The work by Lev and team is based on a patch for pgbouncer. Before we dive into it, let's just briefly take their description of what this is doing, because I think it applies consistently to the approach taken in PgCat, as well as the patch for pgbouncer.
To understand how this works: When the prepared statement is already prepared, nothing happens. If it's not yet prepared, we prepare it on the server by sending a bind and execute. The notable difference to normal behavior here is that this actually will rename prepared statements to unique names and keep a mapping between the client side name and the server side name.
The problem here is that, for example, when you have an application like Rails, it will assign prepared statement names like "a1", "a2", "a3". It does that for each client. Usually, they have independent connections and prepared statements are per connection.
But if they're not independent, if they're actually multiplexed thanks to a connection pooler, then you'll get conflicts between two different connections on the client side. What PgCat does, and what is necessary to make this work, is that it actually remaps from the client side name, like the "a
They are promising a big performance improvement. Let's take a look at the pgbouncer patch. This patch was contributed by Martijn Dashorst about a year ago. I think this is a very appealing patch, and it's not the first version of that patch either. What they've done is they have implemented this in production.
They had a migration from Oracle to Postgres and they saw a big problem with the high CPU usage, so they wanted to do something to improve the situation. The linked patch shows a lot of graphs, but I think that even the top graph shows you a very nice story.
You can see on the left side: this is a usual weekday traffic for them. Then on the weekend they have a low period. Then you can see a Monday where they hadn’t rolled out the patch yet, and then they rolled out the pgbouncer patch with these prepared statements in transaction pooling mode. You can see a noticeable difference between the CPU utilization before and after.
Whether that was actually 30% in this case, I'm not sure, but I think it clearly shows that it was a real life improvement on a production workload.
The other thing to notice about this patch is that this has been open for about a year. So the question is: is pgbouncer going to add support for this, or should we be changing to something like PgCat? The good news is, more recently, I saw this post by Jelte Fennema on the pgsql-hackers mailing list, where he left a side note saying that they’re actively working on implementing named prepared statement support for pgbouncer in transaction pooling mode. Jelte is part of the pgbouncer team, which he recently joined. If you're not familiar with the pgbouncer development history, pgbouncer has existed for quite a long time, but for many years, up until I think about two years ago, there wasn't much activity. Peter Eisentraut took over and did a bunch of maintenance releases to pgbouncer, but there still hasn't been a lot of feature development.
What's really exciting is, is that the Jelte and the team at Citus are actually starting to work more on pgbouncer itself. It looks like we can expect a lot more active development around pgbouncer than previously, which I think is really exciting and we should see a lot more good new functionality coming to pgbouncer.
Now, there is a new patch that they've worked on for about a month. This is still work in progress, but I have hopes that pgbouncer will also get support for prepared statements in transaction pooling mode.
Coming back to the initial blog post by Lev for PgCat: they actually ran a more simple benchmark using pgbench where you can see that they essentially looked at a scenario around “when there's a high number of clients: what's the number that we can get?”
Even in the case of pgbench, there is a clear improvement. Really, this comes down to being able to use prepared statements, which, because of these savings in planning time, will get you a big performance improvement. It may even be up to 30%.