Connect Claude Code, Codex & Cursor to your Postgres statistics: pganalyze MCP Server is now in Public Preview

Postgres in Production, Special Series: Deep Dive into pg_stat_statements (Part 3)

In Part 3 of this special "Postgres in Production" deep dive series, Ryan Booz walks through where pg_stat_statements actually stores your query text. Spoiler: it's not in the in-memory hash table you might expect. He covers the file's location on disk, how it grows, the pointer mechanism that ties hash table entries back to your SQL, the quirky consequence that the same query text can appear multiple times, and why ORM-heavy workloads can balloon this file to hundreds of megabytes.



Share this episode: Click here to share this episode on LinkedIn. Feel free to sign up for our newsletter and subscribe to our YouTube channel.


Transcript

A quick recap

In the first episode, we looked at the in-memory hash table that backs pg_stat_statements and the metrics it tracks. In Part 2, we dug into what actually makes an entry "unique" — the queryid, which is derived from the structure of the query rather than its text, combined with the role, the database, and whether the query was top-level.

That recap matters because of a question I haven't answered yet: if the hash table is keyed on a structural hash, where does the actual query text live?

Where the query text actually lives

This is the part that surprises most people the first time they hear it. The query text is not stored in the in-memory hash table. It lives in a separate file on disk, inside the Postgres data directory.

The hash table only holds the metrics, the queryid, and a few small bookkeeping fields. When you ask for the query text, Postgres has to go read it from this file.

Locating the query text file

In most installations, the file lives under the global directory inside the Postgres data directory:

$PGDATA/global/pgss_query_texts.stat

In some Docker images and packaged installations, you'll find it in $PGDATA/pg_stat_tmp/ instead. Either way, the file is there somewhere, and it's worth poking around once just to see it exists.

How the file grows: append-only with a 2× size ceiling

Two things to know about how this file behaves:

First, it's append-only for new entries. Every time a brand-new, normalized query shows up in pg_stat_statements, the text of that query gets written to the end of the file.

Second, it has a size ceiling. The file is allowed to grow to roughly twice the mean length of all currently tracked queries, multiplied by the maximum number of entries pg_stat_statements is configured to hold. pg_stat_statements updates the mean query length during deallocation, so it changes over time, before the query file is truncated.

The pointer mechanism: offset and length per entry

The way the hash table and the file stay in sync is straightforward. For every entry in the in-memory hash table, pg_stat_statements stores a pointer into the text file — really, an offset and a length. When you ask for the query text, Postgres opens the file, seeks to the offset, reads that many bytes, and hands you back the SQL.

The file is otherwise hidden from you. You'll never interact with it directly. But understanding that the pointer dance is happening explains a few quirks you'll see in production.

A performance note: when the file actually gets read

Because reading the file is a real I/O operation, pg_stat_statements is careful about when it opens it.

If you call the function form, pg_stat_statements(true), you'll get the query text in the results. Pass false instead and pg_stat_statements skips the file entirely:

-- Reads the on-disk text file:
SELECT * FROM pg_stat_statements(true);

-- Does not read the file:
SELECT * FROM pg_stat_statements(false);

Unfortunately the same does not apply to the view. Regardless of whether or not you select the query column, the view always opens the query text file and retrieves the query text. On a busy system where you're just charting metrics, using pg_stat_statements(false) is a good habit to get into, only retrieving query texts when necessary.

No memory of previously tracked queries

Here's the quirky consequence of the pointer mechanism. pg_stat_statements has no memory of previously tracked queryids.

Say your pg_stat_statements.max is 5,000 and you fill it. A new query comes in, an old one gets evicted, and the evicted entry's metrics are gone. Now imagine that same evicted query runs again later. As far as the in-memory hash table is concerned, this is a brand-new entry. Postgres writes the text to the file again, at a fresh offset, with a fresh pointer.

The practical result: the same query text can appear multiple times in the file. It's not a bug, it's a direct consequence of the hash table not remembering what it has evicted.

The rewrite cycle when the file fills up

When the file hits its size ceiling, pg_stat_statements doesn't garbage-collect in place. Instead, it rewrites the entire file based on what's currently active in the hash table, writes that to a temporary file, drops the old file, and renames the temp file into its place.

There's a brief moment of locking during the swap. On a small file that's invisible. On a large file, it isn't.

A real-world size example

This file can grow more than you'd expect. As a quick illustration, I checked a local instance with pg_stat_statements.max = 5000:

  • About 5,000 currently tracked queries
  • Mean query text length: about 900 bytes

That works out to roughly 5000 × 900 × 2 ≈ 9 MB before pg_stat_statements rewrites the file. That's modest.

But the multipliers run away on you fast. Raise max to 20,000 or 50,000, or work with a system that generates very long SQL on average, and the file can comfortably reach hundreds of megabytes. At that scale, the rewrite cycle is a real disk and lock event, not a footnote.

Why ORMs make this worse

ORMs are the most common reason a real-world pgss_query_texts.stat blows up.

ORMs don't optimize for the structural shape of the SQL they emit. They optimize for translating your application code into something Postgres can parse. The result is verbose, comment-laden, sometimes deeply nested SQL — and the texts are long. When the mean query length on your system is 8,000 bytes instead of 900, the size ceiling for the text file shifts by an order of magnitude.

Combined with the high cardinality problem from Part 2 (where ORMs create lots of distinct queryids), this is the worst-case profile: many entries, each one large, file at its maximum size, frequent rewrites under lock.

If you're seeing intermittent latency spikes on a system with a heavy ORM workload and a large pg_stat_statements.max, this is worth a look.

Key takeaways

  • The query text isn't in the hash table. It's in a separate file on disk in the Postgres data directory.
  • Each hash table entry holds a pointer (offset + length) into that file, not the text itself.
  • The file is append-only for new entries, and grows to roughly 2× the mean query length × pg_stat_statements.max.
  • pg_stat_statements has no memory of previously evicted queryids. Evicted queries that come back are treated as new, and their text gets written to the file again. The same text can appear multiple times.
  • When the file hits its size ceiling, Postgres rewrites it from scratch. A temp file gets written, the old file is dropped, the temp is renamed in. There's a brief moment of locking during the swap.
  • ORM-heavy workloads are the most common reason the file balloons. Long SQL × many distinct queryids = a large file and frequent rewrites.
  • Reading the file is optional. pg_stat_statements(false) skips reading the file from disk.

What's coming next

Now that we've covered where the query text lives, the next episode takes us to the Postgres source. We'll look at how pg_stat_statements.max is enforced, how the 2× file size ceiling is implemented, what eviction actually does, and what gets kept when entries are evicted. That's where the practical "is the data I need actually still in pg_stat_statements?" question gets answered.

I hope this series helps you better understand one of the most essential tools in the Postgres ecosystem. Feel free to subscribe to our YouTube channel, sign up for our newsletter or follow us on LinkedIn to get updates about new episodes!

What we discussed in this episode


Enjoy blog posts like this?

Get them once a month to your inbox