Database Indexing in Django
Database Indexing
Database indexing is a technique used to optimize the speed of data retrieval operations on a database table. It works similarly to an index in a book, and uses database index to quickly locate data without scanning every row.
What is Database Index?
A database index is a data structure used to improve the speed of data retrieval operations on a database table at the cost of additional storage space and potential performance impacts on write operations (like INSERT, UPDATE, and DELETE). - Database Index
Why Use Indexing in Django?
Django, as a high-level web framework, relies on relational databases like PostgreSQL, MySQL, and SQLite. When querying large tables, without proper indexing, the database has to scan every row to find the matching data, which can be slow. Indexes optimize these lookups by reducing the number of rows scanned, leading to faster queries.
Implementing Indexing in Django
Now, let’s create a three different model, one without index, one with single index, and one with composite index.
1. Model Without Index
ProductWithoutIndex model has no indexing, meaning searches will require a full table scan, leading to slow queries on large datasets.
from django.db import models
class ProductCategory(models.TextChoices):
ELECTRONICS = "electronics", "Electronics"
CLOTHING = "clothing", "Clothing"
HOME_APPLIANCES = "home appliances", "Home Appliances"
class ProductWithoutIndex(models.Model):
name = models.CharField(max_length=100)
category = models.CharField(max_length=50,
choices=ProductCategory.choices)
price = models.IntegerField()
created_at = models.DateTimeField(auto_now_add=True)
class Meta:
db_table = "product_without_index"
2. Model with Single Index
ProductWithSingleIndex model has three single column indexes via db_index=True on the name, category and price columns to speed up searches on frequently queried fields.
class ProductWithSingleIndex(models.Model):
name = models.CharField(max_length=100, db_index=True)
category = models.CharField(max_length=50, db_index=True)
price = models.IntegerField(db_index=True)
created_at = models.DateTimeField(auto_now_add=True)
class Meta:
db_table = "product_with_single_index"
3. Model with Composite Index
ProductWithCompositeIndex model has a composite index on the category and price columns. When queries filter by multiple columns together, a composite (multi-column) index is more efficient.
class ProductWithCompositeIndex(models.Model):
name = models.CharField(max_length=100)
category = models.CharField(max_length=50)
price = models.IntegerField()
created_at = models.DateTimeField(auto_now_add=True)
class Meta:
db_table = "product_with_composite_index"
indexes = [
models.Index(fields=['name']), # Index on 'name'
models.Index(fields=['category', 'price'],
name='category_price_idx'), # Composite index
]
Checking Indexes
After applying migrations, we can verify indexes.
1. ProductWithoutIndex
Table "public.product_without_index"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | generated by default as identity
name | character varying(100) | | not null |
category | character varying(50) | | not null |
price | integer | | not null |
created_at | timestamp with time zone | | not null |
Indexes:
"product_without_index_pkey" PRIMARY KEY, btree (id)
2. ProductWithSingleIndex
Table "public.product_with_single_index"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | generated by default as identity
name | character varying(100) | | not null |
category | character varying(50) | | not null |
price | integer | | not null |
created_at | timestamp with time zone | | not null |
Indexes:
"product_with_single_index_pkey" PRIMARY KEY, btree (id)
"product_with_single_index_category_715a00f3" btree (category)
"product_with_single_index_category_715a00f3_like" btree (category varchar_pattern_ops)
"product_with_single_index_name_d6fb2180" btree (name)
"product_with_single_index_name_d6fb2180_like" btree (name varchar_pattern_ops)
"product_with_single_index_price_5a707788" btree (price)
3. ProductWithCompositeIndex
Table "public.product_with_composite_index"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | generated by default as identity
name | character varying(100) | | not null |
category | character varying(50) | | not null |
price | integer | | not null |
created_at | timestamp with time zone | | not null |
Indexes:
"product_with_composite_index_pkey" PRIMARY KEY, btree (id)
"category_price_idx" btree (category, price)
"product_wit_name_5adbb5_idx" btree (name)
Data Preparation
To evaluate the impact of indexing on query performance, I populated each table with 500,000 records per category. With multiple categories, each table contains a total of 1.5 million records, allowing for a comprehensive comparison of query execution times and the efficiency gains achieved through indexing.
Analyzing Query Performance
Basic Query
Let’s start with a simple query to get objects based on category.
import json
from pprint import pprint
from django.db.models import Q
from product.models import ProductWithoutIndex
from product.models import ProductWithSingleIndex
from product.models import ProductWithCompositeIndex
1. Query Without Index
pprint(
json.loads(
ProductWithoutIndex.objects.filter(
category="electronics").explain(
analyze=True, verbose=True, format="json"
)
)
)
Result:
[{'Execution Time': 125.405,
'Plan': {'Actual Loops': 1,
'Actual Rows': 500000,
'Actual Startup Time': 0.281,
'Actual Total Time': 111.085,
'Alias': 'product_without_index',
'Async Capable': False,
'Filter': '((product_without_index.category)::text = '
"'electronics'::text)",
'Node Type': 'Seq Scan',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 508397,
'Plan Width': 46,
'Relation Name': 'product_without_index',
'Rows Removed by Filter': 1000000,
'Schema': 'public',
'Startup Cost': 0.0,
'Total Cost': 33204.95},
'Planning Time': 1.064,
'Triggers': []}]
2. Query With Single Index
pprint(
json.loads(
ProductWithSingleIndex.objects.filter(
category="electronics").explain(
analyze=True, verbose=True, format="json"
)
)
)
Result:
[{'Execution Time': 81.159,
'Plan': {'Actual Loops': 1,
'Actual Rows': 500000,
'Actual Startup Time': 25.89,
'Actual Total Time': 69.833,
'Alias': 'product_with_single_index',
'Async Capable': False,
'Exact Heap Blocks': 4749,
'Lossy Heap Blocks': 0,
'Node Type': 'Bitmap Heap Scan',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 519553,
'Plan Width': 46,
'Plans': [{'Actual Loops': 1,
'Actual Rows': 500000,
'Actual Startup Time': 25.037,
'Actual Total Time': 25.037,
'Async Capable': False,
'Index Cond': '((product_with_single_index.category)::text '
"= 'electronics'::text)",
'Index Name': 'product_with_single_index_category_715a00f3_like',
'Node Type': 'Bitmap Index Scan',
'Parallel Aware': False,
'Parent Relationship': 'Outer',
'Plan Rows': 519553,
'Plan Width': 0,
'Startup Cost': 0.0,
'Total Cost': 5549.07}],
'Recheck Cond': '((product_with_single_index.category)::text = '
"'electronics'::text)",
'Relation Name': 'product_with_single_index',
'Rows Removed by Index Recheck': 0,
'Schema': 'public',
'Startup Cost': 5678.96,
'Total Cost': 26617.38},
'Planning Time': 0.929,
'Triggers': []}]
3. Query With Composite Index
pprint(
json.loads(
ProductWithCompositeIndex.objects.filter(
category="electronics").explain(
analyze=True, verbose=True, format="json"
)
)
)
Result:
[{'Execution Time': 97.866,
'Plan': {'Actual Loops': 1,
'Actual Rows': 500000,
'Actual Startup Time': 48.506,
'Actual Total Time': 86.292,
'Alias': 'product_with_composite_index',
'Async Capable': False,
'Exact Heap Blocks': 4749,
'Lossy Heap Blocks': 0,
'Node Type': 'Bitmap Heap Scan',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 539292,
'Plan Width': 46,
'Plans': [{'Actual Loops': 1,
'Actual Rows': 500000,
'Actual Startup Time': 47.667,
'Actual Total Time': 47.668,
'Async Capable': False,
'Index Cond': '((product_with_composite_index.category)::text '
"= 'electronics'::text)",
'Index Name': 'category_price_idx',
'Node Type': 'Bitmap Index Scan',
'Parallel Aware': False,
'Parent Relationship': 'Outer',
'Plan Rows': 539292,
'Plan Width': 0,
'Startup Cost': 0.0,
'Total Cost': 6705.12}],
'Recheck Cond': '((product_with_composite_index.category)::text = '
"'electronics'::text)",
'Relation Name': 'product_with_composite_index',
'Rows Removed by Index Recheck': 0,
'Schema': 'public',
'Startup Cost': 6839.94,
'Total Cost': 28025.09},
'Planning Time': 2.499,
'Triggers': []}]
Filtered Query
Let’s try a query that combines both the category and the price. It’s expected that the composite index should be faster than the single index column while this in turn should be faster when compared to the table with no index.
1. Query Without Index
pprint(
json.loads(
ProductWithoutIndex.objects.filter(
Q(category="electronics")
& Q(price__lte=1000)
).explain(
analyze=True, verbose=True, format="json"
)
)
)
Result:
[{'Execution Time': 132.515,
'Plan': {'Actual Loops': 1,
'Actual Rows': 500000,
'Actual Startup Time': 0.171,
'Actual Total Time': 119.517,
'Alias': 'product_without_index',
'Async Capable': False,
'Filter': '((product_without_index.price <= 1000) AND '
'((product_without_index.category)::text = '
"'electronics'::text))",
'Node Type': 'Seq Scan',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 508346,
'Plan Width': 46,
'Relation Name': 'product_without_index',
'Rows Removed by Filter': 1000000,
'Schema': 'public',
'Startup Cost': 0.0,
'Total Cost': 36957.14},
'Planning Time': 0.936,
'Triggers': []}]
2. Query With Single Index
pprint(
json.loads(
ProductWithSingleIndex.objects.filter(
Q(category="electronics")
& Q(price__lte=1000)).explain(
analyze=True, verbose=True, format="json"
)
)
)
Result:
[{'Execution Time': 81.611,
'Plan': {'Actual Loops': 1,
'Actual Rows': 500000,
'Actual Startup Time': 24.093,
'Actual Total Time': 69.407,
'Alias': 'product_with_single_index',
'Async Capable': False,
'Exact Heap Blocks': 4749,
'Filter': '(product_with_single_index.price <= 1000)',
'Lossy Heap Blocks': 0,
'Node Type': 'Bitmap Heap Scan',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 519553,
'Plan Width': 46,
'Plans': [{'Actual Loops': 1,
'Actual Rows': 500000,
'Actual Startup Time': 22.83,
'Actual Total Time': 22.834,
'Async Capable': False,
'Index Cond': '((product_with_single_index.category)::text '
"= 'electronics'::text)",
'Index Name': 'product_with_single_index_category_715a00f3_like',
'Node Type': 'Bitmap Index Scan',
'Parallel Aware': False,
'Parent Relationship': 'Outer',
'Plan Rows': 519553,
'Plan Width': 0,
'Startup Cost': 0.0,
'Total Cost': 5549.07}],
'Recheck Cond': '((product_with_single_index.category)::text = '
"'electronics'::text)",
'Relation Name': 'product_with_single_index',
'Rows Removed by Filter': 0,
'Rows Removed by Index Recheck': 0,
'Schema': 'public',
'Startup Cost': 5678.96,
'Total Cost': 27916.26},
'Planning Time': 5.226,
'Triggers': []}]
3. Query With Composite Index
pprint(
json.loads(
ProductWithCompositeIndex.objects.filter(
Q(category="electronics")
& Q(price__lte=1000)).explain(
analyze=True, verbose=True, format="json"
)
)
)
Result:
[{'Execution Time': 71.559,
'Plan': {'Actual Loops': 1,
'Actual Rows': 500000,
'Actual Startup Time': 28.275,
'Actual Total Time': 59.0,
'Alias': 'product_with_composite_index',
'Async Capable': False,
'Exact Heap Blocks': 4749,
'Lossy Heap Blocks': 0,
'Node Type': 'Bitmap Heap Scan',
'Output': ['id', 'name', 'category', 'price', 'created_at'],
'Parallel Aware': False,
'Plan Rows': 539239,
'Plan Width': 46,
'Plans': [{'Actual Loops': 1,
'Actual Rows': 500000,
'Actual Startup Time': 27.847,
'Actual Total Time': 27.847,
'Async Capable': False,
'Index Cond': '(((product_with_composite_index.category)::text '
"= 'electronics'::text) AND "
'(product_with_composite_index.price <= '
'1000))',
'Index Name': 'category_price_idx',
'Node Type': 'Bitmap Index Scan',
'Parallel Aware': False,
'Parent Relationship': 'Outer',
'Plan Rows': 539239,
'Plan Width': 0,
'Startup Cost': 0.0,
'Total Cost': 8052.82}],
'Recheck Cond': '(((product_with_composite_index.category)::text = '
"'electronics'::text) AND "
'(product_with_composite_index.price <= 1000))',
'Relation Name': 'product_with_composite_index',
'Rows Removed by Index Recheck': 0,
'Schema': 'public',
'Startup Cost': 8187.63,
'Total Cost': 30720.21},
'Planning Time': 1.063,
'Triggers': []}]
Query Time Results
Model Type | Query | Response Time (Expected) |
---|---|---|
Without Index | WHERE category=’electronics’ | 125.405 sec |
Single Index | WHERE category=’electronics’ | 81.159 sec |
Composite Index | WHERE category=’electronics’ | 97.866 sec |
Without Index | WHERE category=’electronics’ AND price >= 1000 | 132.515 sec |
Single Index | WHERE category=’electronics’ AND price >= 1000 | 81.611 sec |
Composite Index | WHERE category=’electronics’ AND price >= 1000 | 71.559 sec |
Conclusion
Indexing is a powerful optimization technique that significantly enhances query performance in Django applications. By reducing the number of records scanned, indexes help speed up data retrieval, making applications more efficient and responsive. However, indexes come with storage and maintenance costs, so it’s essential to use them strategically. A well-balanced indexing strategy—whether single-column, composite, or full-text indexing—ensures optimal database performance while minimizing overhead. 🚀
Leave a comment