Database Indexing in Django

8 minute read

Cover Page

Database Indexing

Database indexing is a technique used to optimize the speed of data retrieval operations on a database table. It works similarly to an index in a book, and uses database index to quickly locate data without scanning every row.

What is Database Index?

A database index is a data structure used to improve the speed of data retrieval operations on a database table at the cost of additional storage space and potential performance impacts on write operations (like INSERT, UPDATE, and DELETE). - Database Index

Why Use Indexing in Django?

Django, as a high-level web framework, relies on relational databases like PostgreSQL, MySQL, and SQLite. When querying large tables, without proper indexing, the database has to scan every row to find the matching data, which can be slow. Indexes optimize these lookups by reducing the number of rows scanned, leading to faster queries.

Implementing Indexing in Django

Now, let’s create a three different model, one without index, one with single index, and one with composite index.

1. Model Without Index

ProductWithoutIndex model has no indexing, meaning searches will require a full table scan, leading to slow queries on large datasets.

from django.db import models

class ProductCategory(models.TextChoices):
ELECTRONICS = "electronics", "Electronics"
CLOTHING = "clothing", "Clothing"
HOME_APPLIANCES = "home appliances", "Home Appliances"

class ProductWithoutIndex(models.Model):
name = models.CharField(max_length=100)
category = models.CharField(max_length=50,
 choices=ProductCategory.choices)
price = models.IntegerField()
created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        db_table = "product_without_index"

2. Model with Single Index

ProductWithSingleIndex model has three single column indexes via db_index=True on the name, category and price columns to speed up searches on frequently queried fields.

class ProductWithSingleIndex(models.Model):
name = models.CharField(max_length=100, db_index=True)
category = models.CharField(max_length=50, db_index=True)
price = models.IntegerField(db_index=True)
created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        db_table = "product_with_single_index"

3. Model with Composite Index

ProductWithCompositeIndex model has a composite index on the category and price columns. When queries filter by multiple columns together, a composite (multi-column) index is more efficient.

class ProductWithCompositeIndex(models.Model):
name = models.CharField(max_length=100)
category = models.CharField(max_length=50)
price = models.IntegerField()
created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        db_table = "product_with_composite_index"
        indexes = [
            models.Index(fields=['name']),  # Index on 'name'
            models.Index(fields=['category', 'price'],
             name='category_price_idx'),  # Composite index
        ]

Checking Indexes

After applying migrations, we can verify indexes.

1. ProductWithoutIndex

                             Table "public.product_without_index"
   Column   |           Type           | Collation | Nullable |             Default
------------+--------------------------+-----------+----------+----------------------------------
 id         | bigint                   |           | not null | generated by default as identity
 name       | character varying(100)   |           | not null |
 category   | character varying(50)    |           | not null |
 price      | integer                  |           | not null |
 created_at | timestamp with time zone |           | not null |
Indexes:
    "product_without_index_pkey" PRIMARY KEY, btree (id)

2. ProductWithSingleIndex

                            Table "public.product_with_single_index"
   Column   |           Type           | Collation | Nullable |             Default
------------+--------------------------+-----------+----------+----------------------------------
 id         | bigint                   |           | not null | generated by default as identity
 name       | character varying(100)   |           | not null |
 category   | character varying(50)    |           | not null |
 price      | integer                  |           | not null |
 created_at | timestamp with time zone |           | not null |
Indexes:
    "product_with_single_index_pkey" PRIMARY KEY, btree (id)
    "product_with_single_index_category_715a00f3" btree (category)
    "product_with_single_index_category_715a00f3_like" btree (category varchar_pattern_ops)
    "product_with_single_index_name_d6fb2180" btree (name)
    "product_with_single_index_name_d6fb2180_like" btree (name varchar_pattern_ops)
    "product_with_single_index_price_5a707788" btree (price)

3. ProductWithCompositeIndex

                           Table "public.product_with_composite_index"
   Column   |           Type           | Collation | Nullable |             Default
------------+--------------------------+-----------+----------+----------------------------------
 id         | bigint                   |           | not null | generated by default as identity
 name       | character varying(100)   |           | not null |
 category   | character varying(50)    |           | not null |
 price      | integer                  |           | not null |
 created_at | timestamp with time zone |           | not null |
Indexes:
    "product_with_composite_index_pkey" PRIMARY KEY, btree (id)
    "category_price_idx" btree (category, price)
    "product_wit_name_5adbb5_idx" btree (name)

Data Preparation

To evaluate the impact of indexing on query performance, I populated each table with 500,000 records per category. With multiple categories, each table contains a total of 1.5 million records, allowing for a comprehensive comparison of query execution times and the efficiency gains achieved through indexing.

image info

Analyzing Query Performance

Basic Query

Let’s start with a simple query to get objects based on category.

import json
from pprint import pprint

from django.db.models import Q

from product.models import ProductWithoutIndex
from product.models import ProductWithSingleIndex
from product.models import ProductWithCompositeIndex

1. Query Without Index

pprint(
    json.loads(
        ProductWithoutIndex.objects.filter(
            category="electronics").explain(
            analyze=True, verbose=True, format="json"
        )
    )
)

Result:

[{'Execution Time': 125.405,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 500000,
           'Actual Startup Time': 0.281,
           'Actual Total Time': 111.085,
           'Alias': 'product_without_index',
           'Async Capable': False,
           'Filter': '((product_without_index.category)::text = '
                     "'electronics'::text)",
           'Node Type': 'Seq Scan',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 508397,
           'Plan Width': 46,
           'Relation Name': 'product_without_index',
           'Rows Removed by Filter': 1000000,
           'Schema': 'public',
           'Startup Cost': 0.0,
           'Total Cost': 33204.95},
  'Planning Time': 1.064,
  'Triggers': []}]

2. Query With Single Index

pprint(
    json.loads(
        ProductWithSingleIndex.objects.filter(
            category="electronics").explain(
            analyze=True, verbose=True, format="json"
        )
    )
)

Result:

[{'Execution Time': 81.159,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 500000,
           'Actual Startup Time': 25.89,
           'Actual Total Time': 69.833,
           'Alias': 'product_with_single_index',
           'Async Capable': False,
           'Exact Heap Blocks': 4749,
           'Lossy Heap Blocks': 0,
           'Node Type': 'Bitmap Heap Scan',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 519553,
           'Plan Width': 46,
           'Plans': [{'Actual Loops': 1,
                      'Actual Rows': 500000,
                      'Actual Startup Time': 25.037,
                      'Actual Total Time': 25.037,
                      'Async Capable': False,
                      'Index Cond': '((product_with_single_index.category)::text '
                                    "= 'electronics'::text)",
                      'Index Name': 'product_with_single_index_category_715a00f3_like',
                      'Node Type': 'Bitmap Index Scan',
                      'Parallel Aware': False,
                      'Parent Relationship': 'Outer',
                      'Plan Rows': 519553,
                      'Plan Width': 0,
                      'Startup Cost': 0.0,
                      'Total Cost': 5549.07}],
           'Recheck Cond': '((product_with_single_index.category)::text = '
                           "'electronics'::text)",
           'Relation Name': 'product_with_single_index',
           'Rows Removed by Index Recheck': 0,
           'Schema': 'public',
           'Startup Cost': 5678.96,
           'Total Cost': 26617.38},
  'Planning Time': 0.929,
  'Triggers': []}]

3. Query With Composite Index

pprint(
    json.loads(
        ProductWithCompositeIndex.objects.filter(
            category="electronics").explain(
            analyze=True, verbose=True, format="json"
        )
    )
)

Result:

[{'Execution Time': 97.866,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 500000,
           'Actual Startup Time': 48.506,
           'Actual Total Time': 86.292,
           'Alias': 'product_with_composite_index',
           'Async Capable': False,
           'Exact Heap Blocks': 4749,
           'Lossy Heap Blocks': 0,
           'Node Type': 'Bitmap Heap Scan',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 539292,
           'Plan Width': 46,
           'Plans': [{'Actual Loops': 1,
                      'Actual Rows': 500000,
                      'Actual Startup Time': 47.667,
                      'Actual Total Time': 47.668,
                      'Async Capable': False,
                      'Index Cond': '((product_with_composite_index.category)::text '
                                    "= 'electronics'::text)",
                      'Index Name': 'category_price_idx',
                      'Node Type': 'Bitmap Index Scan',
                      'Parallel Aware': False,
                      'Parent Relationship': 'Outer',
                      'Plan Rows': 539292,
                      'Plan Width': 0,
                      'Startup Cost': 0.0,
                      'Total Cost': 6705.12}],
           'Recheck Cond': '((product_with_composite_index.category)::text = '
                           "'electronics'::text)",
           'Relation Name': 'product_with_composite_index',
           'Rows Removed by Index Recheck': 0,
           'Schema': 'public',
           'Startup Cost': 6839.94,
           'Total Cost': 28025.09},
  'Planning Time': 2.499,
  'Triggers': []}]

Filtered Query

Let’s try a query that combines both the category and the price. It’s expected that the composite index should be faster than the single index column while this in turn should be faster when compared to the table with no index.

1. Query Without Index

pprint(
    json.loads(
        ProductWithoutIndex.objects.filter(
            Q(category="electronics")
            & Q(price__lte=1000)
            ).explain(
            analyze=True, verbose=True, format="json"
        )
    )
)

Result:

[{'Execution Time': 132.515,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 500000,
           'Actual Startup Time': 0.171,
           'Actual Total Time': 119.517,
           'Alias': 'product_without_index',
           'Async Capable': False,
           'Filter': '((product_without_index.price <= 1000) AND '
                     '((product_without_index.category)::text = '
                     "'electronics'::text))",
           'Node Type': 'Seq Scan',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 508346,
           'Plan Width': 46,
           'Relation Name': 'product_without_index',
           'Rows Removed by Filter': 1000000,
           'Schema': 'public',
           'Startup Cost': 0.0,
           'Total Cost': 36957.14},
  'Planning Time': 0.936,
  'Triggers': []}]

2. Query With Single Index

pprint(
    json.loads(
        ProductWithSingleIndex.objects.filter(
            Q(category="electronics")
            & Q(price__lte=1000)).explain(
            analyze=True, verbose=True, format="json"
        )
    )
)

Result:

[{'Execution Time': 81.611,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 500000,
           'Actual Startup Time': 24.093,
           'Actual Total Time': 69.407,
           'Alias': 'product_with_single_index',
           'Async Capable': False,
           'Exact Heap Blocks': 4749,
           'Filter': '(product_with_single_index.price <= 1000)',
           'Lossy Heap Blocks': 0,
           'Node Type': 'Bitmap Heap Scan',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 519553,
           'Plan Width': 46,
           'Plans': [{'Actual Loops': 1,
                      'Actual Rows': 500000,
                      'Actual Startup Time': 22.83,
                      'Actual Total Time': 22.834,
                      'Async Capable': False,
                      'Index Cond': '((product_with_single_index.category)::text '
                                    "= 'electronics'::text)",
                      'Index Name': 'product_with_single_index_category_715a00f3_like',
                      'Node Type': 'Bitmap Index Scan',
                      'Parallel Aware': False,
                      'Parent Relationship': 'Outer',
                      'Plan Rows': 519553,
                      'Plan Width': 0,
                      'Startup Cost': 0.0,
                      'Total Cost': 5549.07}],
           'Recheck Cond': '((product_with_single_index.category)::text = '
                           "'electronics'::text)",
           'Relation Name': 'product_with_single_index',
           'Rows Removed by Filter': 0,
           'Rows Removed by Index Recheck': 0,
           'Schema': 'public',
           'Startup Cost': 5678.96,
           'Total Cost': 27916.26},
  'Planning Time': 5.226,
  'Triggers': []}]

3. Query With Composite Index

pprint(
    json.loads(
        ProductWithCompositeIndex.objects.filter(
            Q(category="electronics")
            & Q(price__lte=1000)).explain(
            analyze=True, verbose=True, format="json"
        )
    )
)

Result:

[{'Execution Time': 71.559,
  'Plan': {'Actual Loops': 1,
           'Actual Rows': 500000,
           'Actual Startup Time': 28.275,
           'Actual Total Time': 59.0,
           'Alias': 'product_with_composite_index',
           'Async Capable': False,
           'Exact Heap Blocks': 4749,
           'Lossy Heap Blocks': 0,
           'Node Type': 'Bitmap Heap Scan',
           'Output': ['id', 'name', 'category', 'price', 'created_at'],
           'Parallel Aware': False,
           'Plan Rows': 539239,
           'Plan Width': 46,
           'Plans': [{'Actual Loops': 1,
                      'Actual Rows': 500000,
                      'Actual Startup Time': 27.847,
                      'Actual Total Time': 27.847,
                      'Async Capable': False,
                      'Index Cond': '(((product_with_composite_index.category)::text '
                                    "= 'electronics'::text) AND "
                                    '(product_with_composite_index.price <= '
                                    '1000))',
                      'Index Name': 'category_price_idx',
                      'Node Type': 'Bitmap Index Scan',
                      'Parallel Aware': False,
                      'Parent Relationship': 'Outer',
                      'Plan Rows': 539239,
                      'Plan Width': 0,
                      'Startup Cost': 0.0,
                      'Total Cost': 8052.82}],
           'Recheck Cond': '(((product_with_composite_index.category)::text = '
                           "'electronics'::text) AND "
                           '(product_with_composite_index.price <= 1000))',
           'Relation Name': 'product_with_composite_index',
           'Rows Removed by Index Recheck': 0,
           'Schema': 'public',
           'Startup Cost': 8187.63,
           'Total Cost': 30720.21},
  'Planning Time': 1.063,
  'Triggers': []}]

Query Time Results

Model Type Query Response Time (Expected)
Without Index WHERE category=’electronics’ 125.405 sec
Single Index WHERE category=’electronics’ 81.159 sec
Composite Index WHERE category=’electronics’ 97.866 sec
Without Index WHERE category=’electronics’ AND price >= 1000 132.515 sec
Single Index WHERE category=’electronics’ AND price >= 1000 81.611 sec
Composite Index WHERE category=’electronics’ AND price >= 1000 71.559 sec

Conclusion

Indexing is a powerful optimization technique that significantly enhances query performance in Django applications. By reducing the number of records scanned, indexes help speed up data retrieval, making applications more efficient and responsive. However, indexes come with storage and maintenance costs, so it’s essential to use them strategically. A well-balanced indexing strategy—whether single-column, composite, or full-text indexing—ensures optimal database performance while minimizing overhead. 🚀

Leave a comment