Using the Django Debug Toolbar, I found that the sql statement to count the number of articles published between different years and months is very time-consuming, requiring nearly 2100ms! so I decided to optimize its sql.
From the code, we can know that the old version of the code extracts all the data in the article table. When the amount of data is too small, the SQL statement speed is not a big problem. Once the amount of data is too large, the performance will drop extremely rapidly, because many unnecessary fields are used.
articles = Article.objects.all()
year_month = set() # use set, no repeating elements
for a in articles:
year_month.add((a.pub_date.year, a.pub_date.month)) # Add the year and month of each article to the set as a tuple
counter = {}.fromkeys(year_month, 0) # Initialize the dictionary with tuple as the key
for a in articles:
counter[(a.pub_date.year, a.pub_date.month)] += 1 # Count the number of articles by year and month
year_month_number = [] # Initialization list
for key in counter:
year_month_number.append([key[0], key[1], counter[key]]) # Convert the dictionary into a list of (year, month, number) tuples
year_month_number.sort(reverse=True) # Sort
Use ExtractYear and ExtractMonth in mysql
from django.db.models.functions import ExtractYear, ExtractMonth
from django.db.models import Count
count_dict = Article.objects.annotate(year=ExtractYear('pub_date'), month=ExtractMonth('pub_date')) \
.values('year', 'month').order_by('year', 'month').annotate(count=Count('id'))
"""
count_dict Data sample
<QuerySet [{'year': 2018, 'month': 7, 'count': 3}, {'year': 2019, 'month': 5, 'count': 7}, {'year': 2019, 'month': 6, 'count': 161}]>
"""
year_month_number = [] # Initialization list
for key in count_dict:
year_month_number.append([key['year'], key['month'], key['count']]) # Convert the dictionary into a list of (year, month, number) tuples
year_month_number.sort(reverse=True) # sort
sql:
SELECT EXTRACT(MONTH FROM `article`.`pub_date`) AS `month`,
EXTRACT(YEAR FROM `article`.`pub_date`) AS `year`,
COUNT(`article`.`id`) AS `count`
FROM `article`
GROUP BY EXTRACT(MONTH FROM `article`.`pub_date`),
EXTRACT(YEAR FROM `article`.`pub_date`)
ORDER BY `year` ASC, `month` ASC