Effectively Using Materialized Views in Ruby on Rails
It's every developer's nightmare: SQL queries that get large and unwieldy. This can happen fairly quickly with the addition of multiple joins, a subquery and some complicated filtering logic. I have personally seen queries grow to nearly one hundred lines long in both the financial services and health industries.
Luckily Postgres provides two ways to encapsulate large queries: Views and Materialized Views. In this article, we will cover in detail how to utilize both views and materialized views within Ruby on Rails, and we can even take a look at creating and modifying them with database migrations.
Our example will be a real-world sized dataset of hockey teams and their top scorers. If you'd like to follow along, the source code covered in this article can be found here.
We'll also talk a bit about the performance benefits that a Materialized View can bring to your application:
A view allows us to query against the result of another query, providing a powerful way of abstracting away a complex query full of joins, conditions, groupings, and any other clause that can be added to an SQL query. Looking at the query below, it isn't overly complex, but it does include 3 joins, grouping by a number of fields to aggregate the numbers of goals scored for a player each season.
It takes approximately 450ms to execute on my computer. I am using seed data that generates 31 teams, each playing 200 games in a season, scoring 20 goals per game... a little unrealistic, but I wanted the dataset used to be substantial!
SELECT players.name AS player_name, players.id AS player_id, players.position AS player_position, matches.season AS season, teams.name AS team_name, teams.id AS team_id, count(goals.id) AS goal_count FROM goals INNER JOIN players ON (goals.player_id = players.id) INNER JOIN matches ON (goals.match_id = matches.id) INNER JOIN teams ON (goals.team_id = teams.id) GROUP BY players.id, teams.id, matches.season
A view allows us to take the final result of this query, and query against that as if it were any other table. You can see why views can come in handy in many different scenarios. They allow for the succinct abstraction of a complicated query, and allow us to re-use this logic in a simple to understand way.
Now, we could make a new view by running
CREATE VIEW in Postgres. But, as we all know, one-off schema changes are hard to keep track of. Instead, let's try something thats closer to how Rails does things. How does that look like? First things first, we'll create a view using Scenic. Scenic gives us the ability to define migrations that create, update, or drop views, just as you're used to doing with regular tables in Rails.
rails g scenic:view top_scorers
This will generate two files. The first is named
db/views/top_scorers_v01.sql, and in it we will paste the SQL for the underlying query (from above). The second is
db/migrate/[date]_create_top_scorers.rb, and this is where the migration will live to migrate/rollback the creation of our view:
class CreateTopScorers < ActiveRecord::Migration[6.0] def change create_view :top_scorers end end
With the view in place, we can now query against it. This query takes approximately 50ms to execute.
SELECT * FROM top_scorers WHERE team_name = 'Toronto Maple Leafs' ORDER BY goal_count DESC
By creating a model in Rails, we can interact with it much like we would be able to with a typical model which is backed by a table. First things first, let's define the model, letting Rails know it is read-only. Whilst some views can be updated, this view contains a top-level
GROUP BY clause and thus can't be updated.
# app/models/top_scorer.rb class TopScorer < ApplicationRecord def readonly? true end end
Now we can perform the same SQL query using the
TopScorer.where(team_name: 'Toronto Maple Leafs').order(goal_count: :desc)
A regular view still performs the underlying query which defined it. It will only be as efficient as its underlying query is. This means, if the larger query discussed above takes 450ms to execute, executing
SELECT * FROM top_scorers will also take 450ms.
Materialized views take regular views to the next level, though they aren't without their drawbacks. The difference is that they save the result of the original query to a cached/temporary table. When you query a materialized view, you aren't querying the source data, rather the cached result.
This can provide serious performance benefits, especially considering you can index materialized views. But, when the underlying data from the source tables is updated, the materialized view becomes out of date, serving up an older cached version of the data. We can resolve this by refreshing the materialized view, which we'll get to in a bit.
Just like we saw with our regular view, materialized views begin the same way, by executing a command to generate a new view migration:
rails g scenic:view mat_top_scorers. This produces two files, the first of which contains the SQL to produce the underlying view of the data. The difference is in the migration, passing in
materialized: true to the
create_view method. Also notice that we are able to add indexes to the materialized view.
class CreateMatTopScorers < ActiveRecord::Migration[6.0] def change create_view :mat_top_scorers, materialized: true add_index :mat_top_scorers, :player_name add_index :mat_top_scorers, :player_id add_index :mat_top_scorers, :team_name add_index :mat_top_scorers, :team_id add_index :mat_top_scorers, :season end end
Like a regular view, we are able to define an ActiveRecord model that can query it. Also notice that we can define relationships which point to other ActiveRecord models. If you didn't know, you might not even realize it is pointing to a materialized view, except for the
readonly? method which was defined.
class MatTopScorer < ApplicationRecord belongs_to :player belongs_to :team belongs_to :match def self.top_scorer_for_season(season) where(season: season).order(goal_count: :desc).first end def readonly? true end end
Let's take our new materialized view for a spin! Running the query
select * from mat_top_scorers, which took 450ms as a view, takes 5ms as a materialized view, 90x faster! The ruby code below, which took 50ms as a view, takes under 1ms to execute!
MatTopScorer.where(team_name: 'Toronto Maple Leafs').order(goal_count: :desc)
For a side-by-side comparison, this performs the same query on both views:
irb(main):001:0> TopScorer.where(team_name: 'Toronto Maple Leafs').count (60.2ms) SELECT COUNT(*) FROM "top_scorers" WHERE "top_scorers"."team_name" = $1 [["team_name", "Toronto Maple Leafs"]] => 30 irb(main):002:0> MatTopScorer.where(team_name: 'Toronto Maple Leafs').count (1.3ms) SELECT COUNT(*) FROM "mat_top_scorers" WHERE "mat_top_scorers"."team_name" = $1 [["team_name", "Toronto Maple Leafs"]] => 30
As mentioned previously, materialized views cache the underlying query's result to a temporary table. This is what gives us the speed improvements and the ability to add indexes. The downside is that we have to control when the cache is refreshed. Modifying the
MatTopScorer model, let's add a
refresh method that can be called any time the data is to be refreshed. You will need to figure out how often it makes sense to update the data for your specific use-case, depending on how often the data is changing and how quickly those changes need to be reflected to the end user.
class MatTopScorer < ApplicationRecord belongs_to :player belongs_to :team belongs_to :match def self.refresh Scenic.database.refresh_materialized_view(table_name, concurrently: false, cascade: false) end def self.top_scorer_for_season(season) where(season: season).order(goal_count: :desc).first end def readonly? true end end
To schedule the refresh, I like to use the whenever gem. Let's call a rake task to refresh the materialized view every hour:
# config/schedule.rb every 1.hour do rake "refreshers:mat_top_scorers" end
The rake task is simple, only calling the
refresh method defined on the
# lib/tasks/refreshers.rake namespace :refreshers do desc "Refresh materialized view for top scorers" task mat_top_scorers: :environment do MatTopScorer.refresh end end
Views focus on abstracting away complexity and encouraging reuse. Views allow you to interact with the result of a query as if it were a table itself, but they do not provide a performance benefit, as the underlying query is still executed, perfect for sharing logic but still having real-time access to the source data.
Materialized Views are related to views, but go a step further. You get all the abstraction and reuse of a view, but the underlying data is cached, providing serious performance benefits. Materialized views are especially useful for - for example - reporting dashboards because they can be indexed to allow for performant filtering.
If the purpose of the view is to provide a cleaner interface to complicated joins and query logic, and performance isn't too much of an issue, by all means stick with a regular view. Views have the advantage of always being real-time, since they simply reference the real underlying data rather than a cached copy of it.
If your purpose is to provide a cleaner interface in addition to performance improvements, and you can live with the data being not quite real-time, then creating it as a materialized view can provide some great benefits.
It's easy to migrate views in Scenic. Views are versioned by default in Scenic and generating a view with the same name will create a v2, providing two files, just like it did the first time we generated a view (earlier in this article). Likewise, Scenic also provides a way to drop a view.
Views and materialized views aren't particularly challenging to test, but it does require remembering that both types of views don't contain any original data in and of themselves, they are either a live view of an underlying query, or a cached view of an underlying query, as in the case of materialized views.
After creating some test data using factory_bot, we'll call a method which is supposed to return the top scorer for a given season. It returns
nil, and that is expected. The underlying data exists, but because materialized views must be refreshed, something we haven't done yet, there is no data to be found.
MatTopScorer.refresh, we're now able to retrieve the expected result.
RSpec.describe MatTopScorer, type: :model do describe "#top_scorer_for_season" do it "finds top scorer" do # create some data using factory_bot helper methods match = create(:match) player = create(:player) goal = create(:goal, match: match, player: player) # without any data in materialized view expect(MatTopScorer.top_scorer_for_season(match.season)).to eq(nil) MatTopScorer.refresh # with data in materialized view top_scorer = MatTopScorer.top_scorer_for_season(match.season) expect(top_scorer).to be_present expect(top_scorer.player).to eq(player) expect(top_scorer.goal_count).to eq(1) end end end
With the help of Scenic, using views and materialized views feels right at home in Rails. Truthfully, I haven't used views as much as I have used materialized views. In particular, I've found materialized views incredibly useful when building searchable reporting dashboards.
The ability to group and summarize data by geographic region, category, grouped by date, in combination with adding the correct indexes has provided an efficient way to report on large amounts of data without relying on external reporting systems or causing excessive load on the production database.
Share this article: If you liked this article we’d appreciate it if you’d tweet it to your peers.
Leigh Halliday is a guest author for the pganalyze blog. He is a developer based out of Canada who works at FlipGive as a full-stack developer. He writes about Ruby and React on his blog and publishes React tutorials on YouTube.