Postgres Views in Django
At my first job, we worked with a lot of data. I quickly found that when there's a lot of data, there are bound to be some long, convoluted SQL queries. Many of ours contained multiple joins, conditionals, and filters. One of the ways we kept the complexity manageable was to create Postgres views for common queries.
Postgres views allow you to query against the results of another query. Views can be composed of columns from one or more tables or even other views, and they are easy to work with in a Django app. In this article, you’ll learn about the two different types of Postgres views and how to decide when and if you should use them. Finally, you’ll create a view and set up a Django app to use it.
Why Postgres views?
One reason to use a view is that they help cut down on complexity.
For example, your customer data may be spread across several tables: a customers
table, an emails
table, and an addresses
table. Addresses could reference more data in a cities
and states
table. This is an effective schema for your data, but you have to join all these tables every time you want to get a complete view of a customer. This may not be bad if you only do this occasionally, but it’s quite cumbersome if you’re going to query it often. Even if you only want two or three records, you still need to perform all these joins.
You can solve this problem by creating a view that looks like this:
CREATE VIEW complete_customer_data AS
SELECT
concat(customers.first_name,' ', customers.last_name) AS customer_name,
addresses.street_address AS street,
addresses.zip_code AS zip,
cities.city, AS city
states.state AS state,
FROM customers
INNER JOIN addresses ON customers.id = addresses.customer_id
INNER JOIN cities ON addresses.city_id = cities.id
INNER JOIN state ON cities.state_id = state.id;
As you can see, a view is just a query. Now, if you want to query all your customers from “Chicago,” you can query the view, which is much easier to write and more readable.
SELECT * FROM complete_customer_data
WHERE city = 'Chicago';
Materialized Views in Postgres
Views are great for simplifying code, but with large datasets, you're not really saving any time when you run them because a view is only as fast as its underlying query. For costly queries and large datasets, this can be a drawback.
A better solution when performance is a concern might be to create a materialized view. Materialized views allow you to cache the results of the query on disk in a temporary table. This makes running queries against the view much faster.
The drawback to materialized views is that the cached results do not automatically update when the data in the base tables changes. So in the example above, if a customer changed their address and we made our view a materialized view, we would not see the change until we refreshed the view. This reruns the query and caches the new results. You’ll see an example of this in the next section.
Creating a Materialized View in Postgres
Imagine you have an online store and want to send out coupons to customers, offering them different deals based on how often they shop and where they live.
You will start with a query that tracks customers by order frequency, how much they ordered, and where they are ordering from. Rather than rewriting this query each time, you can create a view that allows you to find a subset of customers. For example, you might want to see all customers who live in Texas that have bought more than three products in the past five months. Since this query needs to check against all of your customers and all of their orders, it will take a long time to run, so use a materialized view that you can refresh as often as you need to.
CREATE MATERIALIZED VIEW customer_order_volume AS
SELECT
concat(customers.id, '-', orders.id) AS unique_id,
concat(customers.first_name,' ', customers.last_name) AS customer_name,
orders.created_on AS purchase_date,
addresses.city AS city,
addresses.state AS state,
count(products.product_name) AS order_size,
sum(products.product_cost) AS order_cost
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id
INNER JOIN products_orders po ON orders.id = po.order_id
INNER JOIN products ON po.product_id = products.id
INNER JOIN addresses ON addresses.customer_id = orders.customer_id
GROUP BY customer_name, purchase_date, city, state;
This view combines the customer_id
and order_id
to create a unique identifier for each row. This will help you out later in the tutorial.
You can query materialized views the same way you queried the regular view, but this time, the view’s results have been cached, so the underlying query doesn’t run again.
SELECT * FROM customer_order_volume
WHERE state in ('TX', 'IL', 'OH')
ORDER BY state;
When you want to refresh the data, run:
REFRESH MATERIALIZED VIEW customer_order_volume;
Refreshing a view like this is the fastest method, but you risk blocking other connections trying to read from the view during the refresh. If you want to be able to refresh the view without interrupting read access, you’ll need to do a concurrent refresh:
REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_volume;
This only works if your view has a unique identifier: a column or comma separated list of columns from the view. You need to explicitly set it by creating an index on your materialized view:
CREATE UNIQUE INDEX ON customer_order_volume(unique_id);
You can also remove the view if you don't need it anymore:
DROP VIEW customer_order_volume;
Using Postgres Views in Django and Python
First, the bad news: as of this writing, Django's ORM cannot create views for you. You’ll have to write some raw SQL for Django to run during the migration.
The good news is that once the view is created, it's relatively easy to use it in Django. You just need to set up a model like you would for any other table in the database. In the following sections, you’ll create a materialized view and a method to refresh it. If you want to use a regular view, the process is the same, you just won’t need the refresh method.
The Model
The model attributes should reflect the columns returned by your view just like they would for any other table.
from django.db import models
class CustomerOrderVolume(models.Model):
unique_id = models.CharField(max_length=255, primary_key=True)
customer_name = models.CharField(max_length=255)
city = models.CharField(max_length=255)
state = models.CharField(max_length=255)
purchase_date = models.DateField()
order_size = models.IntegerField()
order_cost = models.FloatField()
class Meta:
managed = False
db_table='customer_order_volume'
Most notable here is the Meta
class. Setting manage
to false tells Django you don't need it to create the table in the migration. You also need to explicitly set the db_table
name so that Django knows which table to run queries on.
The last thing to note about the model is that you need to set one of our fields as a primary key. Otherwise, Django will expect a column called id
and throw an error when it doesn't find one. In this case, you can again take advantage of the unique ID field you’ll create for the view.
Create your migration as usual. After the migration is created, add a call to the RunSQL
method in the options section of the migration to create the view:
from django.db import migrations, models
class Migration(migrations.Migration):
initial = True
dependencies = [
]
operations = [
migrations.CreateModel(
name='CustomerOrderVolume',
fields=[
('unique_id', models.CharField(max_length=255, primary_key=True, serialize=False)),
('customer_name', models.CharField(max_length=255)),
('city', models.CharField(max_length=255)),
('state', models.CharField(max_length=255)),
('purchase_date', models.DateField()),
('order_size', models.IntegerField()),
('order_cost', models.FloatField()),
],
options={
'db_table': 'customer_order_volume',
'managed': False,
},
),
migrations.RunSQL(
"""
CREATE MATERIALIZED VIEW customer_order_volume AS
SELECT
concat(customers.id, orders.id) AS unique_id,
concat(customers.first_name,' ', customers.last_name) AS customer_name,
orders.created_on AS purchase_date,
addresses.city AS city,
addresses.state AS state,
count(products.product_name) AS order_size,
sum(products.product_cost) AS order_cost
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id
INNER JOIN products_orders po ON orders.id = po.order_id
INNER JOIN products ON po.product_id = products.id
INNER JOIN addresses ON addresses.customer_id = orders.customer_id
GROUP BY unique_id, customer_name, purchase_date, city, state;
""",
"DROP VIEW customer_order_volume;"
)
]
Supply the RunSQL
method with SQL code to create and destroy the view. When you run the migrations, Django won’t create a customer_order_volume
table because you set managed to false
, but it will run the raw SQL and create the view for you.
Finally, create a refresh method that you can call anytime you want to update your materialized view. I chose to create it as a class method, but this is not required. You can do this anywhere since all you are doing is executing raw SQL.
@classmethod
def refresh_view(cl):
with connection.cursor() as cursor:
cursor.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_volume")
This method can be called whenever you want to repopulate your view’s data. This could be done via a cron job that runs at night when traffic to the site is low.
Now, you can test the view from the Django shell:
In [3]: c = CustomerOrderVolume.objects.all()
In [4]: c
Out[4]: <QuerySet [<CustomerOrderVolume: CustomerOrderVolume object (Jonathan Griffith)>, <CustomerOrderVolume: CustomerOrderVolume object (Stephanie Fernandez)>, <CustomerOrderVolume: CustomerOrderVolume object (Austin Burns)>, ....
You can see that Django returns a query set just like it would with any other model. Similarly, you can filter and access attributes on the objects just as you'd expect:
In [3]: order = c.first()
In [4]: order
Out[4]: <CustomerOrderVolume: CustomerOrderVolume object (Adam Turner)>
In [5]: order.purchase_date
Out[5]: datetime.datetime(2020, 7, 9, 20, 50, 43, 895459)
This is a good start. From here, a helpful addition would be a database table that keeps track of how often the view gets refreshed. You could set up a cron job to run your refresh function for you at night or on the weekends, or it could be called from a signal when the underlying models are updated. Be aware that the refresh might take a while if you have a lot of underlying data, so you probably don’t want to call it too frequently.
Conclusion
In this post, you saw the two different types of views available in Postgresql, and the reasons you might want to create a view for your application. Views are useful if you want to limit the amount of code you write each time you query the database, cut down on the complexity of a large query, or cache the results of a costly query. Whatever your reason, once your view is created, it's just a matter of setting up your Django model correctly to get it working in your Python application. Finally, don’t forget to create a refresh method to update the view if you elect to use a materialized view.
Share this article: If you liked this article we’d appreciate it if you’d tweet it to your peers.
PS: If you are interested in learning about views and materialized views in Ruby on Rails check out our article about it here: Effectively Using Materialized Views in Ruby on Rails