Postgres connection pooling: Comparing PgCat and pgbouncer
In episode 60 of “5mins of Postgres” we’re talking about PgCat, a new connection pooler for Postgres, and how it compares to pgbouncer, which a lot of folks currently use.
Let's get to it!
First, I encourage you to read through the original Hacker News post by Lev from the PostgresML team where he announced PgCat a little bit over a year ago, and he announced it as a way of taking pgbouncer to the next level.
This is an early project - it was literally started a year ago - but it does show some early promises. Essentially, it is a Rust-based connection pooler that tries to be compatible with pgbouncer, including some of the management APIs. And it extends it beyond pgbouncer's capabilities, for example it features load balancing as well as some sharding options.
It extends it beyond pgbouncer's capabilities, for example it features load balancing as well as some sharding options.
PgCat itself does describe itself as beta. I would be careful using it in production, and I would expect certain things to not work or error out. Now, what is very promising and has me personally excited is that the team at Instacart actually has started adopting PgCat!
This is a post by Mostafa, Zain and Andrew from the Instacart team where they're describing how they adopted PgCat.
They use Postgres for a lot of things at Instacart, and they are squeezing out a lot of performance out of individual Postgres instances. But oftentimes the only way to scale is to do things like having additional read replicas or also doing sharding on some of the databases. In their article, they mainly talk about load balancing for read replicas and other benefits of using PgCat.
In the case of read replicas, you do have to have some awareness of how many replicas there are. Let's say you have five read replicas, your application actually needs to know how to make use of them and needs to load balance read query traffic between them.
Instacart has actually solved this at the application side historically. They have written a library called Makara, which is a Ruby library that handles this inside Rails, inside ActiveRecord, and does load balancing and failover. Part of the motivation here is to have a cross-language solution to this load balancing and failover problem.
The team at Instacard has done a comparison between different options. The four options they've evaluated were
- using pure pgbouncer
- using pgbouncer with a side car in a container to handle the failover situation
- using Yandex's Odyssey project
- using PgCat.
The big benefit of both PgCat and Odyssey is that they're multi-threaded. If you use pgbouncer at scale, you very likely have run into the problem that pgbouncer is ultimately a single-threaded asynchronous connection based pooler.
It is pretty good at handling a lot of connections, but if you have a lot of activities for individual connections, that's where you'll run into bottlenecks. Both Odyssey and PgCat don't have this issue.
All of them support the pgbouncer control commands, like reloading, showing pool information and such. This is really useful, you can use the pgbouncer control language as a universal language across all these.
The main benefit of PgCat is that it handles replica failovers and it also has improvements around poorly behaved clients.
The big benefit of both PgCat and Odyssey is that they're multi-threaded.
And last but not least, it is written in Rust compared to all other options here, which are written in C or C++.
Rust's big benefit is that it has increased memory safety. It's much easier to trust a new service that's written in Rust. "Will it corrupt memory", "Will it have security bugs?", there's less likelihood of those. I think that's part of why they felt it was worth pursuing this direction.
The first thing they ran was a comparison in terms of latency. pgbouncer is pretty fast because pgbouncer is written in C – it's not that complicated. Really, what they were measuring is whether PgCat has overhead. Is it slower than pgbouncer? The good news is that in their production test, they've seen that this was not a problem. It's slightly slower, but that's in the order of microseconds in the most part.
Overall, this gives me confidence that PgCat is ready to serve production traffic from a query latency perspective.
They've also done additional tests on correctness. That's where they've contributed some fixes, they, I think, are the first big production use case for PgCat, and so obviously a lot of things were not in place yet, but are now in place. And so the good news is that they're now at a point where PgCat can handle their production workloads without issues.
Overall, I think Instacart is pretty positive with this. They've clearly run this with a lot of thought and have done a lot of testing. If I were considering whether to try out PgCat, this would give me confidence to try it out.
I do think it's still a bit of a moving target, but PgCat is targeting a 1.0 release in the next coming weeks. Once the 1.0 is out, combined with Instacart's production testing, this gives a very good starting point for this to become a new mainstream connection pooler that a lot of people in Postgres will use.