Introducing Query Tuning Workbooks to safely tune Postgres queries on production with pganalyze!

Efficient Pagination: PostgreSQL and Django

You could say most web frameworks take a naive approach to pagination. Using PostgreSQL’s COUNT, LIMIT, and OFFSET features for pagination works fine for the majority of web applications, but if you have tables with a million records or more, performance degrades quickly.

Django is an excellent framework for building web applications, but its default pagination method falls into this trap at scale. In this article, I’ll help you understand Django’s pagination limitations and offer three alternative methods that will improve your application’s performance. Along the way, you’ll see the tradeoffs and use cases for each method so you can decide which is the best fit for your application.

Understanding Naive PostgreSQL Pagination

Let’s have a look at an example of the type of pagination control that a web application might use:

Typical pagination navigation in a web application

In this control, the user may go to the previous and next pages or jump directly to a specific page. The query to get the tenth page using Postgres’ LIMIT and OFFSET approach might look like this:

SELECT *
FROM users
ORDER BY created_at DESC
LIMIT 10
OFFSET 100

Notice that to get the correct OFFSET, you must multiply the page number you want by the LIMIT.

There's one more query needed to display our pagination control. You must know the number of records in the table. Without that information, you won’t know how many pages you need to seek through.

SELECT count(*)
FROM users

You might see how this approach can become a real performance issue very quickly.

Performance of Naive PostgreSQL Pagination

To better understand the performance bottleneck of LIMIT and OFFSET, you can import some test data and try it out. First, create a table large enough to encounter slowdowns:

CREATE TABLE USERS (
    id serial,
    name varchar(50)
);

INSERT INTO users
SELECT
    --- Ten million records
    generate_series(1,10000000) AS id,
    --- Example: "e6f2c6842d146c518185e1e47add9532"
    substr(md5(random()::text), 0, 50) AS name;

When you run the query to get the tenth page of results, the response is nearly instant. On my 2018 Macbook Pro with the latest version of Postgres, I see data in 89ms.

Response time of the tenth page of results

However, for queries farther in, the wait times increase. With a LIMIT of 10 and an OFFSET of 5,000,000, a response takes 2.62 seconds.

Response time of the five millionth record

Finally, you can look at the COUNT query, which runs on all 10 million rows. That query now takes a lethargic 4.45 seconds.

Count query on a large database table

The slow performance in these examples is caused by the way that OFFSET and COUNT work. Getting to the specified page using OFFSET requires the database to traverse each index up to the page you want. Therefore, the performance degrades the farther you peer into the table.

The naive approach to pagination using COUNT, LIMIT, and OFFSET is only a viable solution for tables under a million rows. In large tables, you can expect to see slow queries and congruently a poor user experience.

Pagination Handling in Django

Now that you have some background knowledge on the performance of pagination queries in Postgres, you can start to understand why pagination slows down in Django. To demonstrate how Django handles pagination, I created a new application with a User model and inserted 10 million records by adapting our earlier query.

# {project}/users/models.py
from django.db import models

class User(models.Model):
    name = models.CharField(max_length=50)

I used the admin site to test the pagination speed since it works out of the box.

# {project}/users/admin.py
from django.contrib import admin
from .models import User

admin.site.register(User)

Now that the User model is presented in the admin panel, I can see the table with 10 million records.

10,000,000 records in the Django admin panel

Using django-debug-toolbar I can peer into the SQL queries that Django is generating in real-time. There are two queries used to generate this UI:

-- Count the total number of records - 2.43 seconds
SELECT COUNT(*) AS "__count"
  FROM "users_user"

-- Get first page of items - 2ms
SELECT "users_user"."id",
       "users_user"."name"
  FROM "users_user"
 ORDER BY "users_user"."id" DESC
 LIMIT 100

These queries should look familiar because they are almost identical to the naive pagination queries above. Strangely, the count query is triggered twice, which means that when you load the Django admin panel, you have to wait for the database to count every single row of the table two times.

When you click on page 99,999, Django will fire off two count queries again and another pagination query using LIMIT and OFFSET:

-- Get the 99,999 page (100 results per page) - 13.34 seconds
SELECT "users_user"."id",
       "users_user"."name"
  FROM "users_user"
 ORDER BY "users_user"."id" DESC
 LIMIT 100
OFFSET 9999900

This query takes a whopping 13 seconds to finish!

Clearly, the naive approach to pagination in Django is slow for large tables. Over time, your database tables will likely grow, and as they reach tens of millions of records, you and your customers are going to start to notice these terrible load times. So what can you do about it?

In the following sections, I’ll show you three options for improving your pagination performance in a Django application.

Comparing the performance of pagination methods in Django and Postgres

PostgreSQL Pagination in Django: Option 1 – Removing the COUNT Query

The COUNT query dominates the loading time for the first page of results. When skipping to later pages, the offset query is the slowest, but I’ll focus on improving the COUNT first in this first option.

This may come as a surprise, but one solution is to remove the count query completely.

Won’t that break the UI!?

Sort of… In this case, it might be reasonable not to know how many pages are in the Users table. It’s not often that users will find themselves at the five millionth page of a table of records. Navigating to the next and previous page is typically enough control. Using a search box or filter is likely a better method for finding the record you want. Look at Django’s search_fields for information on enabling search and filtering in the admin panel and feel free to read through this pganalyze article about Full Text Search in Django.

The most well-known example of this type of pagination is the Google search results page. At the bottom of the page, a truncated pagination control shows direct links to only the first 10 pages:

No count of the total results in the Google pagination

This doesn’t mean that Google is preventing you from seeing the rest of the billions of results. It’s simply telling you that a refined search term would be a better way to get to those results than pagination.

If getting rid of the COUNT makes sense in your application, Django makes it easy to hide. First, overwrite the count property of the default Paginator:

# {project}/users/paginator.py
from django.core.paginator import Paginator
from django.utils.functional import cached_property

class UserPaginator(Paginator):
    
    @cached_property
    def count(self):
        return 9999999999

Notice the placeholder value is a number much larger than you expect to have results for.

Django responds to this adjustment by displaying the first few pages in the pagination component, as you would expect. However, the last few pages will be the fake count. When you click on a page that doesn’t exist, Django will take you to the last page no matter how many records you have.

After you override the default paginator, import it into your admin model:

# {project}/users/admin.py
from django.contrib import admin
from .models import User
from .paginator import UserPaginator

@admin.register(User)
class UserTableAdmin(admin.ModelAdmin):
    show_full_result_count = False
    paginator = UserPaginator

Note that I also set show_full_result_count to False. This will turn off the second count query that I noted earlier.

After updating my application with these changes, I reduced the time for the first page from ~5 seconds to 8ms. Keep in mind that this table is still suffering from slow OFFSET queries though. Jumping to page 50,0000 took 18 seconds. Before I show you how to address the OFFSET problem, I’ll show you one more method to improve the COUNT query.

Download Free eBook: How To Get 3x Faster Postgres

PostgreSQL Pagination in Django: Option 2 – Approximating the COUNT

Another way to reduce the time spent on the COUNT query is to use some built-in Postgres features to estimate the total number of records when the count takes too long. You can see a thorough implementation of the approach in this gist which overloads the count method in Django’s Paginator class.

The first things to do is to set a statement_timeout on the query and fallback to an estimated count. You can use atomic transactions to set the timeout to 150ms.

...
    try:
        with transaction.atomic(), connection.cursor() as cursor:
            # Limit to 150 ms
            cursor.execute('SET LOCAL statement_timeout TO 150;')
            return super().count
    except OperationalError:
            pass
...

If the count method returns data before the time limit is up, then the real value is used. However, if the query takes longer, Django will fallback to an approximate value stored in the pg_class metadata. That metadata is updated when commands like VACUUM, ANALYZE and CREATE INDEX are called, or autovacuum runs on the table.

...
        with transaction.atomic(), connection.cursor() as cursor:
            # Obtain estimated values (only valid with PostgreSQL)
            cursor.execute(
                    "SELECT reltuples FROM pg_class WHERE relname = %s",
                    [self.object_list.query.model._meta.db_table]
            )
            estimate = int(cursor.fetchone()[0])
            return estimate
...

After implementing this method, the maximum time you will spend loading the COUNT is 150ms.

PostgreSQL Pagination in Django: Option 3 – Keyset Pagination

To solve the slow OFFSET problem, you can replace it with keyset (or seek) pagination. In keyset pagination, each page is fetched by an ordered field like an id or created_at date. Instead of iteratively counting pages, as OFFSET does, keyset pagination filters directly by the ordered field.

Example of using keyset pagination

SELECT *
    FROM user
    WHERE id < 60 -- The last item in the previous page
    ORDER BY id DESC
    LIMIT 10

This approach is a little different from OFFSET because you must know the value you are starting at. Imagine that you are on page four of the table, and you know the first id of page five. Instead of counting all the records, you can go directly to that id and return the next ten items.

This works because indexes can support a query like this efficiently. Asking a B-tree index on id to return 10 entries before a certain id only requires loading 10 index entries. Contrast that to using OFFSET, where all entries up to the offset and then the specified limit need to be loaded, making high offsets very expensive.

When using keyset pagination together with the right index, you will see a significant performance boost. The time complexity to query any record in the database is constant. For example, seeking the last page of the large table generated above takes just 78ms.

This method also guards against sparse data. If a user is deleted and the order is not sequential in the table anymore, keyset pagination is not affected; it will skip the missing value with no problem.

Trade-offs of keyset pagination

Keyset pagination comes with a couple of trade-offs. Without the offset, you don’t know exactly how many pages there are in the table or which page number you are currently on. Additionally, keyset pagination requires a sortable field on your model. Sequential IDs and date fields work well.

Using the dj-pagination plugin for Django

Unfortunately, I could not find a library that extends Django’s core Paginator to add keyset pagination. The admin table requires a paginator of that type, so I'll demonstrate the same generated data in a new application view using the dj-pagination plugin.

Add the application into your INSTALLED_APPS and middleware - the docs explain well. Then, create a view in your Users app:

# {project}/users/views.py
from django.shortcuts import render
from .models import User


def index(request):
    context = {
        'users': User.objects.order_by('id').all()
    }
    return render(request, 'users/index.html', context)

This code makes the users QuerySet available in the view context and tells it to render the template file. Next, add the template directory to your TEMPLATES setting object and add your new template file:

# {project}/templates/users/index.html
{% load pagination_tags %}

{% autopaginate users %}

{% for user in users %}
    {{ user.name }}
{% endfor %}

{% paginate %}

In a real application, you would add additional markup and styling to show the list of users, but this demonstrates the tags that dj-pagination makes available to you.

Now that you have a view and template, create a urls.py file to route requests to your view:

# {project}/users/urls.py
from django.urls import path
from . import views

app_name = 'users'
urlpatterns = [
    # ex: /users/
    path('', views.index, name='index'),
]

Finally, add it to your root URLs:

// various imports...

urlpatterns = [
    // routes...
    path('users/', include('users.urls')),
]

Now, when you point your browser to the /users page, you will see a list of usernames with simple pagination controls. The generated query returns results in less than 100ms, regardless of which page I navigate to.

If you are looking for a way to have more control over keyset pagination, django-infinite-scroll-pagination might be worth a look.

Conclusion

In this article, you learned about how pagination works in Django. While naive pagination performs well for small tables, this method quickly degrades in performance as your table grows to millions of rows. Furthermore, jumping to a record deep in the table will be very slow in a query that uses OFFSET.

The good news is that you can speed things up by altering the COUNT query. Additionally, switching to keyset pagination will improve the performance of page lookups and make them work in constant time. Django makes it easy to alter its default configuration, giving you the power to build a performant solution for pagination in Django.

Share this article: If you liked this article you might want to tweet it to your peers.


Enjoy blog posts like this?

Get them once a month to your inbox