Django counts the number of articles by year and month, optimizes SQL statements and increases the speed by 30 times

created at 07-17-2021 views: 2

problem

Using the Django Debug Toolbar, I found that the sql statement to count the number of articles published between different years and months is very time-consuming, requiring nearly 2100ms! so I decided to optimize its sql. 

original code

From the code, we can know that the old version of the code extracts all the data in the article table. When the amount of data is too small, the SQL statement speed is not a big problem. Once the amount of data is too large, the performance will drop extremely rapidly, because many unnecessary fields are used.

articles = Article.objects.all()
year_month = set()  # use set, no repeating elements
for a in articles:
    year_month.add((a.pub_date.year, a.pub_date.month))  # Add the year and month of each article to the set as a tuple
counter = {}.fromkeys(year_month, 0)  # Initialize the dictionary with tuple as the key

for a in articles:
    counter[(a.pub_date.year, a.pub_date.month)] += 1  # Count the number of articles by year and month
year_month_number = [] # Initialization list
for key in counter:
    year_month_number.append([key[0], key[1], counter[key]])  # Convert the dictionary into a list of (year, month, number) tuples
year_month_number.sort(reverse=True)  # Sort

Optimized code

Use ExtractYear and ExtractMonth in mysql

from django.db.models.functions import ExtractYear, ExtractMonth
from django.db.models import Count

count_dict = Article.objects.annotate(year=ExtractYear('pub_date'), month=ExtractMonth('pub_date')) \
    .values('year', 'month').order_by('year', 'month').annotate(count=Count('id'))
"""
count_dict Data sample
<QuerySet [{'year': 2018, 'month': 7, 'count': 3}, {'year': 2019, 'month': 5, 'count': 7}, {'year': 2019, 'month': 6, 'count': 161}]>

"""
year_month_number = [] # Initialization list
for key in count_dict:
    year_month_number.append([key['year'], key['month'], key['count']])  # Convert the dictionary into a list of (year, month, number) tuples
year_month_number.sort(reverse=True)  # sort

sql:

SELECT EXTRACT(MONTH FROM `article`.`pub_date`) AS `month`,
       EXTRACT(YEAR FROM `article`.`pub_date`) AS `year`,
       COUNT(`article`.`id`) AS `count`
 FROM `article`
 GROUP BY EXTRACT(MONTH FROM `article`.`pub_date`),
       EXTRACT(YEAR FROM `article`.`pub_date`)
 ORDER BY `year` ASC, `month` ASC
Please log in to leave a comment.