Introducing Query Tuning Workbooks to safely tune Postgres queries on production with pganalyze!

Postgres Views in Django

At my first job, we worked with a lot of data. I quickly found that when there's a lot of data, there are bound to be some long, convoluted SQL queries. Many of ours contained multiple joins, conditionals, and filters. One of the ways we kept the complexity manageable was to create Postgres views for common queries.

Postgres views allow you to query against the results of another query. Views can be composed of columns from one or more tables or even other views, and they are easy to work with in a Django app. In this article, you’ll learn about the two different types of Postgres views and how to decide when and if you should use them. Finally, you’ll create a view and set up a Django app to use it.

PostgreSQL views made of data from columns in multiple tables

Why Postgres views?

One reason to use a view is that they help cut down on complexity.

For example, your customer data may be spread across several tables: a customers table, an emails table, and an addresses table. Addresses could reference more data in a cities and states table. This is an effective schema for your data, but you have to join all these tables every time you want to get a complete view of a customer. This may not be bad if you only do this occasionally, but it’s quite cumbersome if you’re going to query it often. Even if you only want two or three records, you still need to perform all these joins.

You can solve this problem by creating a view that looks like this:

CREATE VIEW complete_customer_data AS
SELECT
  concat(customers.first_name,' ', customers.last_name) AS customer_name,
  addresses.street_address AS street,
  addresses.zip_code AS zip,
  cities.city, AS city
  states.state AS state,
FROM customers
  INNER JOIN addresses ON customers.id = addresses.customer_id
  INNER JOIN cities ON addresses.city_id = cities.id
  INNER JOIN state ON cities.state_id = state.id;

As you can see, a view is just a query. Now, if you want to query all your customers from “Chicago,” you can query the view, which is much easier to write and more readable.

SELECT * FROM complete_customer_data 
WHERE city = 'Chicago';

Materialized Views in Postgres

Views are great for simplifying code, but with large datasets, you're not really saving any time when you run them because a view is only as fast as its underlying query. For costly queries and large datasets, this can be a drawback.

A better solution when performance is a concern might be to create a materialized view. Materialized views allow you to cache the results of the query on disk in a temporary table. This makes running queries against the view much faster.

The drawback to materialized views is that the cached results do not automatically update when the data in the base tables changes. So in the example above, if a customer changed their address and we made our view a materialized view, we would not see the change until we refreshed the view. This reruns the query and caches the new results. You’ll see an example of this in the next section.

Creating a Materialized View in Postgres

Imagine you have an online store and want to send out coupons to customers, offering them different deals based on how often they shop and where they live.

You will start with a query that tracks customers by order frequency, how much they ordered, and where they are ordering from. Rather than rewriting this query each time, you can create a view that allows you to find a subset of customers. For example, you might want to see all customers who live in Texas that have bought more than three products in the past five months. Since this query needs to check against all of your customers and all of their orders, it will take a long time to run, so use a materialized view that you can refresh as often as you need to.

CREATE MATERIALIZED VIEW customer_order_volume AS
SELECT
  concat(customers.id, '-', orders.id) AS unique_id,
  concat(customers.first_name,' ', customers.last_name) AS customer_name, 
  orders.created_on AS purchase_date,
  addresses.city AS city,
  addresses.state AS state,
  count(products.product_name) AS order_size,
  sum(products.product_cost) AS order_cost
FROM orders
  INNER JOIN customers ON orders.customer_id = customers.id
  INNER JOIN products_orders po ON orders.id = po.order_id
  INNER JOIN products ON po.product_id = products.id
  INNER JOIN addresses ON addresses.customer_id = orders.customer_id
GROUP BY customer_name, purchase_date, city, state;

This view combines the customer_id and order_id to create a unique identifier for each row. This will help you out later in the tutorial.

You can query materialized views the same way you queried the regular view, but this time, the view’s results have been cached, so the underlying query doesn’t run again.

SELECT * FROM customer_order_volume
WHERE state in ('TX', 'IL', 'OH')
ORDER BY state;

When you want to refresh the data, run:

REFRESH MATERIALIZED VIEW customer_order_volume;

Refreshing a view like this is the fastest method, but you risk blocking other connections trying to read from the view during the refresh. If you want to be able to refresh the view without interrupting read access, you’ll need to do a concurrent refresh:

REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_volume;

This only works if your view has a unique identifier: a column or comma separated list of columns from the view. You need to explicitly set it by creating an index on your materialized view:

CREATE UNIQUE INDEX ON customer_order_volume(unique_id);

You can also remove the view if you don't need it anymore:

DROP VIEW customer_order_volume;

Using Postgres Views in Django and Python

First, the bad news: as of this writing, Django's ORM cannot create views for you. You’ll have to write some raw SQL for Django to run during the migration.

The good news is that once the view is created, it's relatively easy to use it in Django. You just need to set up a model like you would for any other table in the database. In the following sections, you’ll create a materialized view and a method to refresh it. If you want to use a regular view, the process is the same, you just won’t need the refresh method.

Download Free eBook: How To Get 3x Faster Postgres

The Model

The model attributes should reflect the columns returned by your view just like they would for any other table.

from django.db import models

class CustomerOrderVolume(models.Model):
    unique_id   = models.CharField(max_length=255, primary_key=True)
    customer_name = models.CharField(max_length=255)
    city          = models.CharField(max_length=255)
    state         = models.CharField(max_length=255)
    purchase_date = models.DateField()
    order_size    = models.IntegerField()
    order_cost    = models.FloatField()

    class Meta:
        managed = False
        db_table='customer_order_volume'

Most notable here is the Meta class. Setting manage to false tells Django you don't need it to create the table in the migration. You also need to explicitly set the db_table name so that Django knows which table to run queries on.

The last thing to note about the model is that you need to set one of our fields as a primary key. Otherwise, Django will expect a column called id and throw an error when it doesn't find one. In this case, you can again take advantage of the unique ID field you’ll create for the view.

Create your migration as usual. After the migration is created, add a call to the RunSQL method in the options section of the migration to create the view:

from django.db import migrations, models

class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='CustomerOrderVolume',
            fields=[
                ('unique_id', models.CharField(max_length=255, primary_key=True, serialize=False)),
                ('customer_name', models.CharField(max_length=255)),
                ('city', models.CharField(max_length=255)),
                ('state', models.CharField(max_length=255)),
                ('purchase_date', models.DateField()),
                ('order_size', models.IntegerField()),
                ('order_cost', models.FloatField()),
            ],
            options={
                'db_table': 'customer_order_volume',
                'managed': False,
            },
        ),
         migrations.RunSQL(
            """
            CREATE MATERIALIZED VIEW customer_order_volume AS
                SELECT
                concat(customers.id, orders.id) AS unique_id, 
                concat(customers.first_name,' ', customers.last_name) AS customer_name, 
                orders.created_on AS purchase_date,
                addresses.city AS city,
                addresses.state AS state,
                count(products.product_name) AS order_size,
                sum(products.product_cost) AS order_cost
                FROM orders
                INNER JOIN customers ON orders.customer_id = customers.id
                INNER JOIN products_orders po ON orders.id = po.order_id
                INNER JOIN products ON po.product_id = products.id
                INNER JOIN addresses ON addresses.customer_id = orders.customer_id
                GROUP BY unique_id, customer_name, purchase_date, city, state;
            """,
            "DROP VIEW customer_order_volume;"
        )
    ]

Supply the RunSQL method with SQL code to create and destroy the view. When you run the migrations, Django won’t create a customer_order_volume table because you set managed to false, but it will run the raw SQL and create the view for you.

Finally, create a refresh method that you can call anytime you want to update your materialized view. I chose to create it as a class method, but this is not required. You can do this anywhere since all you are doing is executing raw SQL.

@classmethod
    def refresh_view(cl):
        with connection.cursor() as cursor:
            cursor.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_volume")

This method can be called whenever you want to repopulate your view’s data. This could be done via a cron job that runs at night when traffic to the site is low.

Now, you can test the view from the Django shell:

In [3]: c = CustomerOrderVolume.objects.all()
In [4]: c
Out[4]: <QuerySet [<CustomerOrderVolume: CustomerOrderVolume object (Jonathan Griffith)>, <CustomerOrderVolume: CustomerOrderVolume object (Stephanie Fernandez)>, <CustomerOrderVolume: CustomerOrderVolume object (Austin Burns)>, ....

You can see that Django returns a query set just like it would with any other model. Similarly, you can filter and access attributes on the objects just as you'd expect:

In [3]: order = c.first()
In [4]: order
Out[4]: <CustomerOrderVolume: CustomerOrderVolume object (Adam Turner)>
In [5]: order.purchase_date
Out[5]: datetime.datetime(2020, 7, 9, 20, 50, 43, 895459)

This is a good start. From here, a helpful addition would be a database table that keeps track of how often the view gets refreshed. You could set up a cron job to run your refresh function for you at night or on the weekends, or it could be called from a signal when the underlying models are updated. Be aware that the refresh might take a while if you have a lot of underlying data, so you probably don’t want to call it too frequently.

Conclusion

In this post, you saw the two different types of views available in Postgresql, and the reasons you might want to create a view for your application. Views are useful if you want to limit the amount of code you write each time you query the database, cut down on the complexity of a large query, or cache the results of a costly query. Whatever your reason, once your view is created, it's just a matter of setting up your Django model correctly to get it working in your Python application. Finally, don’t forget to create a refresh method to update the view if you elect to use a materialized view.

Share this article: If you liked this article we’d appreciate it if you’d tweet it to your peers.

PS: If you are interested in learning about views and materialized views in Ruby on Rails check out our article about it here: Effectively Using Materialized Views in Ruby on Rails


Enjoy blog posts like this?

Get them once a month to your inbox