Introducing Query Tuning Workbooks to safely tune Postgres queries on production with pganalyze!

Customer Story:
How Notion Runs PostgreSQL at Scale on Amazon RDS with pganalyze

Notion, the popular productivity tool, has undergone tremendous growth since its founding in 2015. By 2021, a viral TikTok trend led to exponential user growth, bringing with it significant challenges to scale its Postgres-based database infrastructure. To help address these growing pains, Notion adopted pganalyze to ensure optimal PostgreSQL database performance, as well as maintaining reliability and availability.

The challenge

From 2015 to 2020, Notion relied on a single large PostgreSQL database hosted on Amazon RDS, which served them well during the early stages of product development. However, as user adoption surged, this setup began to show cracks. Challenges included outages, inefficiencies in vacuum processes, and scaling limitations that necessitated major infrastructure overhauls.

We couldn’t understand why our database was struggling so much with things like vacuums. It was just hard to get visibility into what is happening in the internals. None of the tools we had, like RDS Performance Insights or application-level logging, gave us the deep insights we needed.
— Arka Ganguli, Engineering Manager, Notion

In early 2021, Notion began its journey to address these issues. This included sharding its core database and seeking tools to provide deeper visibility into PostgreSQL internals. After a recommendation from PGExperts, a leading PostgreSQL consultancy, Notion adopted pganalyze in March 2021 as a critical solution for managing database performance.

The impact of pganalyze on Notion’s database operations

Notion product screenshot

With pganalyze, Notion engineers gain visibility into issues that were previously opaque, such as unused indexes contributing to table bloat and inefficient queries consuming excessive CPU and I/O.

During incidents, pganalyze gives us the visibility to pinpoint what’s happening. It’s especially helpful for understanding spikes in CPU or query load and distinguishing between root causes and symptoms.
— Arka Ganguli, Engineering Manager, Notion

A particularly impactful improvement involved refining a GIN index on the space_id and permission columns, used for JSON filtering with the contains operator (@>). On Notion’s development database, the index was not being used due to poor cardinality and statistics. With the help of pganalyze, it was determined that switching to a jsonb_path_ops index would better serve the query patterns for larger workspaces.

Implementing this change resulted in a 733% performance improvement, reducing query runtime by more than 8x, from approximately 5000 ms to 600 ms. Comparisons in staging showed no mismatches, paving the way for a seamless deployment to production.

Another time, Ben Hughes, Software Engineer on the infrastructure team, recalled encountering a performance issue while a new feature was in pre-production testing. The database was becoming overloaded, causing pages in the app to load slowly or crash entirely. Some initial investigation revealed that this new feature was responsible for 20% of the application’s query load in its current state.

pganalyze is part of our first line of defense. If we’re having a problem on the database, I know I can check pganalyze – whether it’s a vacuum process holding us up or if I have to do some more digging.
— Ben Hughes, Software Engineer, Notion

Adoption across engineering teams

Initially adopted by the infrastructure team, pganalyze provided much-needed insights into database performance bottlenecks. Over time, its use expanded across engineering teams, empowering application developers to identify and resolve database issues independently. This broad adoption was facilitated by integrating pganalyze with Okta, allowing seamless access and fostering collaboration.

Lessons learned and future plans

Notion’s experience highlights the importance of investing in database visibility and proactive monitoring tools at the right time. By equipping engineers with pganalyze, the company reduced its reliance on infrastructure specialists while enabling faster incident resolution.

My favorite thing about pganalyze is how intuitive it is. When I share a link with someone, they just get it. It’s not hard to understand what the tool is showing you, which is not something you can say about a lot of other tools.
— Arka Ganguli, Engineering Manager, Notion

Conclusion

Notion’s partnership with pganalyze has been pivotal in its ability to scale PostgreSQL effectively. By providing actionable insights and empowering teams across the organization, pganalyze has helped Notion maintain a seamless user experience during a period of exponential growth.