5mins of Postgres E41: Tuning shared_buffers for OLTP and data warehouse workloads & is 25% of RAM a good choice?
In this episode of 5mins of Postgres, we’re going to talk about tuning shared_buffers, benchmarking with pgbench, TPROC-C and TPROC-H, and whether using 25% of RAM is a good choice.
Let's jump right in!
I want to start with this tweet by Andres Freund from a couple of years ago. In this tweet, Andres pointed out that the often repeated recommendation to not exceed 25% of RAM for
shared_buffers in Postgres is wrong.
The test he ran back then was using a pgbench with scale 1500, which is roughly a 22 GB database. He was doing this on a local machine with a fast SSD drive and with 32 GB of RAM. He was testing with
- 1 GB of
- 8 GB, aka 25%, and
- 24 GB, which notably was slightly larger than the amount of data that he was working with.
Back then, what he was showing was that picking 25% of RAM would actually give you the worst performance out of the three. He showed that if you picked 24 gigabytes, that would actually get you the best performance! The main point here was: if your working set fits into memory, then choosing a
shared_buffers that encompasses your working set gets you the best performance.
If your working set fits into memory, then choosing a
shared_buffersthat encompasses your working set gets you the best performance.
More recently, Shaun from the EDB team was running a comprehensive set of tests to figure out how to set
shared_buffers effectively. Earlier this month, he started by running tests with an HP DL360 server with local hard drives. This is also a system that has 64 GB of RAM. First, he was doing a pgbench with a database size of 110 GB, so about double the size of RAM. He was testing different values of
shared_buffers up to 32 GB, so 50%.
pgbench test, performance did get better when there was a higher
shared_buffers setting, for example 50% of RAM.
Shaun was also looking at
pg_buffercache, which shows you how Postgres re-uses its pages or has to go and fetch new pages. The usage count here is essentially a representation of whether something was reused or something had to be fetched from disk or the operating system cache.
A higher usage count is better, with the maximum being five. So usagecount = 5 means that a good portion of the workload was in
shared_buffers, and so it could be reused, whereas usage_count 0 or 1 means it's less likely to be the case. In this
pgbench test, some portion of the workload was reusable, but not a lot actually. What you could see was that this represents a large portion of the working set exceeding
shared_buffers. That’s why we have a lot of turnaround in shared buffers.
He was also running a TPROC-C benchmark. Instead of
pgbench, which kind of “goes all over the place”, TPROC-C has a smaller, active working set. Here, a much larger shared buffers did give tremendous performance improvements. The reason for that is that a lot of the data was in shared buffers and could be reused.
He was also testing the TPROC-H benchmark. In this particular case a lower shared buffers performed better and resulted in better performance. The reason for that is that shared buffers wasn't really effective at caching the right pages, so Postgres was essentially taking away space from the operating system cache that could have instead cached things more effectively.
Now, earlier this week, Shaun re-ran these tests with 128 GB of RAM, double the RAM from last time and enterprise class SSDs.
What's interesting with the
pgbench here is that the
pgbench performed exactly the same, no matter the
shared_buffers setting. This
pgbench had a working set of 219 GB, which is to say it was clearly beyond the RAM. But, as the cost of fetching a page was so cheap, because these enterprise class SSDs are really fast, it didn't really make a big difference how to tune shared_buffers here, because you didn't actually get that much benefit anyway versus getting it from disk.
He also ran a TPROC-C benchmark again we we can see that a higher
shared_buffers did benefit. Even though these SSDs are pretty fast, there is still a cost to going to the Linux page cache, and then actually going to disk. It means you’re spending more CPU cycles, versus if something is in the buffer cache – that's still going to be faster. We can see high usage counts, so this type of workload did perform better with a higher
Last but not least, he also re-ran the TPROC-H benchmarks and there is still a benefit here to a lower
shared_buffers for a data warehousing type workload.
Shaun makes two conclusions here.
- He says that for a warehouse type workload, you should consider using the default of 128 MB of
- You should continue to follow the 25% rule for OLTP systems.
I would disagree slightly here. I think that Shaun's data shows that that's not necessarily the best choice. I would also argue that the trade-off here is, if you have a certain portion of your working set that can be reused, higher
shared_buffers is going to be better. However, a larger
shared_buffers makes it more expensive to get new pages, because the clock-sweep algorithm in Postgres has to look at more data.