Introducing Query Tuning Workbooks to safely tune Postgres queries on production with pganalyze!

How Notion Runs PostgreSQL at Scale on Amazon RDS with pganalyze

We recently sat down with some members of the Notion engineering team to discuss how they manage PostgreSQL at scale on Amazon RDS with pganalyze. As Notion has grown, ensuring database performance and reliability has become increasingly important. In this case study, we explore the challenges they faced, the solutions they implemented, and the impact of using pganalyze to optimize their PostgreSQL operations. You can also find the full case study in our resources section here.

Notion blog image

Notion, the popular productivity tool, has undergone tremendous growth since its founding in 2015. By 2021, a viral TikTok trend led to exponential user growth, bringing with it significant challenges to scale its Postgres-based database infrastructure. To help address these growing pains, Notion adopted pganalyze to ensure optimal PostgreSQL database performance, as well as maintaining reliability and availability.

The challenge

From 2015 to 2020, Notion relied on a single large PostgreSQL database hosted on Amazon RDS, which served them well during the early stages of product development. However, as user adoption surged, this setup began to show cracks. Challenges included outages, inefficiencies in vacuum processes, and scaling limitations that necessitated major infrastructure overhauls.

"We couldn’t understand why our database was struggling so much with things like vacuums. It was just hard to get visibility into what is happening in the internals. None of the tools we had, like RDS Performance Insights or application-level logging, gave us the deep insights we needed."

— Arka Ganguli, Engineering Manager, Notion

In early 2021, Notion began its journey to address these issues. This included sharding its core database and seeking tools to provide deeper visibility into PostgreSQL internals. After a recommendation from PGExperts, a leading PostgreSQL consultancy, Notion adopted pganalyze in March 2021 as a critical solution for managing database performance.

The impact of pganalyze on Notion’s database operations

Notion product screenshot
With pganalyze, Notion engineers gain visibility into issues that were previously opaque, such as unused indexes contributing to table bloat and inefficient queries consuming excessive CPU and I/O.

"During incidents, pganalyze gives us the visibility to pinpoint what’s happening. It’s especially helpful for understanding spikes in CPU or query load and distinguishing between root causes and symptoms."

— Arka Ganguli, Engineering Manager, Notion

A particularly impactful improvement involved refining a GIN index on the space_id and permission columns, used for JSON filtering with the contains operator (@>). On Notion’s development database, the index was not being used due to poor cardinality and statistics. With the help of pganalyze, it was determined that switching to a jsonb_path_ops index would better serve the query patterns for larger workspaces.

Implementing this change resulted in a 733% performance improvement, reducing query runtime by more than 8x, from approximately 5000 ms to 600 ms. Comparisons in staging showed no mismatches, paving the way for a seamless deployment to production.

Another time, Ben Hughes, Software Engineer on the infrastructure team, recalled encountering a performance issue while a new feature was in pre-production testing. The database was becoming overloaded, causing pages in the app to load slowly or crash entirely. Some initial investigation revealed that this new feature was responsible for 20% of the application’s query load in its current state.

"pganalyze is part of our first line of defense. If we’re having a problem on the database, I know I can check pganalyze – whether it’s a vacuum process holding us up or if I have to do some more digging."

— Ben Hughes, Software Engineer, Notion

Using pganalyze, Ben was able to identify the problematic queries and find patterns in their syntax. He noticed that when a query had more than 23 clauses it would cause the Postgres planner to choose a full table scan of a 1TB table, rather than using the available indexes. By limiting the number of clauses to 20, Ben was able to batch the queries and reduce the feature’s query load to less than 5%, resolving the performance problem before it impacted production.

The ability to quickly analyze the query patterns and identify the root cause using pganalyze was instrumental in mitigating this and many other incidents.

Adoption across engineering teams

Initially adopted by the infrastructure team, pganalyze provided much-needed insights into database performance bottlenecks. Over time, its use expanded across engineering teams, empowering application developers to identify and resolve database issues independently. This broad adoption was facilitated by integrating pganalyze with Okta, allowing seamless access and fostering collaboration.

"Product engineers now use pganalyze to go through the entire process of analyzing queries and indexes themselves. It’s great to see them self-diagnose and solve problems that previously would have required back-and-forth with our team."

— Arka Ganguli, Engineering Manager, Notion

Key capabilities included diagnosing inefficient indexes, monitoring vacuum processes, and comparing query performance across shards during major migrations. For example, during the transition from 32 to 96 databases, pganalyze helped ensure consistency in query behavior, providing the confidence needed for a smooth migration.

Lessons learned and future plans

Throughout this journey, Notion learned valuable lessons about addressing capacity issues early, choosing appropriate sharding keys, and designing no-downtime migration strategies. Notion also discovered the importance of optimizing replication processes and carefully planning index management during data migrations.

Notion’s experience highlights the importance of investing in database visibility and proactive monitoring tools at the right time. By equipping engineers with pganalyze, the company reduced its reliance on infrastructure specialists while enabling faster incident resolution.

"My favorite thing about pganalyze is how intuitive it is. When I share a link with someone, they just get it. It’s not hard to understand what the tool is showing you, which is not something you can say about a lot of other tools."

— Arka Ganguli, Engineering Manager, Notion

Looking ahead, Notion plans to explore dynamic sharding to handle increasing workloads and large customer accounts. The team is also focused on staying current with their PostgreSQL versions to leverage new performance features.

Conclusion

Notion’s partnership with pganalyze has been pivotal in its ability to scale PostgreSQL effectively. By providing actionable insights and empowering teams across the organization, pganalyze has helped Notion maintain a seamless user experience during a period of exponential growth. As they continue to evolve, Notion continues to rely on pganalyze to test database performance against new features and resolve incidents before impacting the user experience in the application.


Enjoy blog posts like this?

Get them once a month to your inbox