5mins of Postgres E3: Postgres performance cliffs with large JSONB values and TOAST
Today, we're going to talk about JSONB and how JSONB performance can behave badly when the JSONB value exceeds the two kilobyte limit where Postgres starts storing data in TOAST.
Share this episode: Click here to post to twitter, or sign up for our newsletter and check out the newsletter archive and subscribe to our YouTube channel.
Transcript
Let's dive right in!
How Postgres stores data: Large JSON value query performance
Now, for those of you who don't know, JSONB is a binary version of the general JSON datatype in Postgres that was added in Postgres Version 9.4. This blog post by Evan Jones is an analysis of how JSONB storage behaves. They're comparing this against hstore, which is an older way of storing key value pairs in Postgres, as well as regular JSON, which is essentially a text datatype.
In the post, they go into details of how Postgres actually stores large values. The important thing to note is that Postgres stores data in eight kilobyte pages. Whilst you can customize this page size, that does require a recompile of Postgres. So in practice, most people that use Postgres, even if it's on their own VM, operate on an eight kilobyte page size. Each page can contain a minimum of four tuples. A tuple would be essentially a row. That means because Postgres tries to optimize minimum number of tuples per page that each tuple has a maximum size of two kilobytes.
The first thing that Postgres does when something exceeds the two kilobytes is to try to compress the data. The compression tries to fit it inside that page. If that doesn't work, Postgres has to figure out where it stores that data. It can't keep it local within the page. Instead of just adding pages into the main table itself, which would cause other problems, Postgres uses a separate mechanism, called TOAST or "The oversized attribute storage technique".
TOAST is a generic table of sorts that stores chunks of the actual values in the main table and stores those out of band in a separate mechanism. As noted here, this does cause performance of queries to get substantially worse. In their test here, they saw that once you hit the TOAST storage, either TOAST uncompressed or TOAST compressed, JSONB does perform pretty bad. The thing to know is if you have large values that exceed that two kilobyte limit, then JSONB performance can be bad.
PostgreSQL physical storage
If you want to read more about how Postgres' physical structure works there's a good blog post that was written a few years ago that tries to summarize how Postgres does this. Halfway through the article, you can find a very helpful diagram. Postgres has an eight kilobyte place to put things. The reason that TOAST is a thing is because things just don't fit in those eight kilobytes. That's kind of why we have this problem.
Now last but not least: there are people that are looking at this and they're trying to optimize this type of performance problems.
Understanding JSONB performance
There was a really good talk late last year at PGConf New York by Oleg and Nikita from Postgres Pro. I would encourage you to look at this presentation if you have too much time to spare. This is one of the best summaries of how JSONB works and how JSONB performance works that I've seen over the years.
Oleg and Nikita have worked on JSONB itself. There is a lot of in-depth understanding in this talk. I'll skip over most of it here, but suffice to say that they agree that JSONB is important. Sometimes people don't see the point, but they really think that JSONB is something that should behave well. The one slide that I wanted to highlight is where they show the way that TOAST storage works: if you update something, if you update a value in a row that exceeds that two kilobyte limit and is stored in TOAST, TOAST was not designed for updating values and particularly it wasn't designed for updating a small part of a large value. It was designed for atomic data type, it knows nothing about the internal structure of these types. When you update a value that's in TOAST, it always duplicates the whole value. That means even if you have a megabyte JSONB document and you're just updating a single part of that, with the correct syntax and whatnot in Postgres, Postgres for these large values has no way to avoid making a full copy of that TOAST value. And that's really a problem. It's a problem because of the extra storage the backend has to go clean up. And it's a problem because of the WAL traffic. Because Postgres has to persist that. That's really something that is hard to spot.
Oleg goes into a lot more details in his talk. The nice thing is that they are working on improvements. There are multiple patches in Postgres 15. These unfortunately require core changes, it's not something that an extension could provide because TOAST is rather closely bundled to the core Postgres functionality. They are proposing to make some of that more extensible so that new extensions could be developed that for example optimize JSONB storage in TOAST.
Thanks so much for joining us today. Subscribe to our YouTube channel, sign up for our newsletter and follow us on Twitter to get updates about new episodes!
What we have discussed in this episode of 5mins of Postgres
Postgres large JSON value query performance
Introduction to PostgreSQL physical storage
Conference Schedule - PGConf NYC 2021 - Understanding JSONB Performance
Understanding JSONB Performance by Oleg Bartunov and Nikita Glukhov