Sometimes you need to group rows by a time interval: say, one value per minute or per hour. In this post I will show how to achieve it with vanilla PostgreSQL and with the Django ORM.
When do you need it? Imagine a script that takes a temperature snapshot every 10 seconds and stores it in a table. In one hour it collects 360 measurements, which adds up to a lot of rows if the script runs for several months. Okay, now you want to render the measurements as data points on a line chart, and you also want your users to be able to reduce the data set to one point per hour, day or week. Here comes this snippet!
Setting up the database
Before we start experimenting with the SQL query, let's set up a database table and fill it with sample data. I assume you have already created a database.
create table temperatures (
    value real,
    measured_at timestamp with time zone
);
I omit indices and a primary key here to keep the example simple.
So we have a table. Let's fill it with sample temperature measurements.
insert into temperatures
select random() * 40 + 1,
       generate_series('2021-05-01 00:00:00'::timestamp, '2021-05-31 23:59:59'::timestamp, '10 second');
This query inserts 267,840 measurements (31 days × 24 hours × 360 measurements per hour), which is a fairly large data set for our example.
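As a quick sanity check on that number, here is a small Python sketch that imitates generate_series for timestamps: it starts at the first value, adds the step repeatedly, and includes the end value if it is hit exactly. The helper name timestamp_series is hypothetical and only mimics the PostgreSQL function for naive timestamps.

```python
from datetime import datetime, timedelta
from typing import Iterator


def timestamp_series(start: datetime, stop: datetime, step: timedelta) -> Iterator[datetime]:
    """Hypothetical helper: yield start, start + step, ... while the value is <= stop,
    similar to PostgreSQL's generate_series(start, stop, step) for timestamps."""
    current = start
    while current <= stop:
        yield current
        current += step


start = datetime(2021, 5, 1, 0, 0, 0)
stop = datetime(2021, 5, 31, 23, 59, 59)
step = timedelta(seconds=10)

# Count the generated timestamps without building a list in memory.
count = sum(1 for _ in timestamp_series(start, stop, step))
print(count)  # 267840, i.e. 31 * 24 * 3600 // 10
```

The last generated value is 2021-05-31 23:59:50, because the next step would already fall on June 1.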
Let's check what we have in the table:
select * from temperatures limit 10
SQL to group by time
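DISTINCT ON keeps one row for each distinct key, and the order by clause decides which one: here it is the earliest measurement of each hour. Without order by, the row PostgreSQL picks from each group is unpredictable.
select distinct on (key) *, date_trunc('hour', measured_at) as key
from temperatures
order by key, measured_at
limit 20;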
You see that PostgreSQL returned only one row per hour. That is what we want! In the same way you can group values by minute, day, week, month or any other unit supported by date_trunc.
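To make the semantics concrete, here is a pure-Python sketch of what the query does: truncate each timestamp to a bucket and keep only the earliest row per bucket, which is what DISTINCT ON returns when rows are ordered by the bucket and then by measured_at. The helpers trunc and distinct_on_trunc are hypothetical; trunc only imitates a few date_trunc units on naive timestamps, whereas date_trunc on a timestamp with time zone column truncates in the session's TimeZone.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# One row of the temperatures table: (value, measured_at).
Row = Tuple[float, datetime]


def trunc(unit: str, ts: datetime) -> datetime:
    """Hypothetical stand-in for date_trunc(unit, ts) on naive timestamps."""
    if unit == "minute":
        return ts.replace(second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "week":
        # Like PostgreSQL, weeks start on Monday (ISO 8601).
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight - timedelta(days=midnight.weekday())
    raise ValueError(f"unsupported unit: {unit}")


def distinct_on_trunc(rows: List[Row], unit: str) -> List[Tuple[datetime, Row]]:
    """Keep the earliest row of every bucket, mimicking
    DISTINCT ON (key) ordered by key and then measured_at."""
    picked: Dict[datetime, Row] = {}
    # Rows are visited in time order, so setdefault() keeps the first row per key.
    for value, measured_at in sorted(rows, key=lambda row: row[1]):
        picked.setdefault(trunc(unit, measured_at), (value, measured_at))
    return sorted(picked.items())  # ordered by key


# Three hours of made-up measurements, one every 10 seconds.
start = datetime(2021, 5, 1)
rows = [(20.0 + i % 7, start + timedelta(seconds=10 * i)) for i in range(3 * 360)]

hourly = distinct_on_trunc(rows, "hour")
for key, (value, measured_at) in hourly:
    print(key, value, measured_at)
```

This prints three rows, one per hour. Passing "day" or "week" instead of "hour" changes only the bucket, just like changing the first argument of date_trunc.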
Use Django ORM to group rows by time
We can do the same with the Django ORM. Assuming we have this model for our table
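from django.db import models

class Temperature(models.Model):  # Django adds an implicit "id" primary key
    value = models.FloatField()
    measured_at = models.DateTimeField()

    class Meta:
        db_table = 'temperatures'  # the table created above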
you can write the following query to fetch one value per hour (or per any other period you need):
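from django.db.models.functions import Trunc

# PostgreSQL only: distinct('key') becomes DISTINCT ON, and order_by() must start with 'key'.
Temperature.objects.annotate(key=Trunc('measured_at', 'hour')).distinct('key').order_by('key', 'measured_at')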
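Both the SQL and the ORM query keep one representative measurement per hour and discard the rest. If a chart should rather show a summary of each hour, the usual alternative is an aggregate: in SQL, select date_trunc('hour', measured_at) as key, avg(value) from temperatures group by key order by key; in Django, adding .values('key').annotate(avg_value=Avg('value')) after the Trunc annotation should group by the truncated hour. The pure-Python sketch below only illustrates how the two results differ on made-up readings; hour_key, hourly_first and hourly_average are hypothetical helpers.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Tuple

Row = Tuple[float, datetime]  # (value, measured_at)


def hour_key(ts: datetime) -> datetime:
    # The same bucket as date_trunc('hour', ...) or Trunc(..., 'hour'), ignoring time zones.
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_first(rows: List[Row]) -> Dict[datetime, float]:
    """Like DISTINCT ON ordered by time: the earliest value of each hour."""
    result: Dict[datetime, float] = {}
    for value, ts in sorted(rows, key=lambda row: row[1]):
        result.setdefault(hour_key(ts), value)
    return result


def hourly_average(rows: List[Row]) -> Dict[datetime, float]:
    """The aggregate alternative: the mean value of each hour, like avg(value) ... group by."""
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for value, ts in rows:
        buckets[hour_key(ts)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Two hours of made-up readings, one every 10 minutes: 10.0, 11.0, ..., 21.0.
start = datetime(2021, 5, 1)
rows = [(10.0 + i, start + timedelta(minutes=10 * i)) for i in range(12)]

print(hourly_first(rows))    # first value of each hour: 10.0 and 16.0
print(hourly_average(rows))  # mean of each hour: 12.5 and 18.5
```

Picking one row is cheap and keeps real measured values; averaging smooths out noise within each bucket. Which one fits depends on what the chart is meant to show.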