How to group rows by time in PostgreSQL (and in Django)

Sometimes you need to group rows by a time interval, say, one value per minute or per hour. In this post I will share how you can achieve it with vanilla PostgreSQL and with the Django ORM.

When do you need it? Imagine you have a table that stores temperature measurements, with a new reading taken every 10 seconds. In one hour it collects 360 measurements, which adds up quickly if your script runs for several months. Now you want to render the measurements as data points on a line chart, and you also want your users to be able to reduce the data set to hours, days, or weeks. Here comes this snippet!

Setting up the database

Before we start experimenting with the SQL query, let's set up a database table and fill it with a sample data set. I assume you have already created a database.

create table temperatures (
    value real,
    measured_at timestamp with time zone
);

I omit indexes and a primary key here just to keep the example simple.

So we have a table. Let's fill it with sample temperature measurements.

insert into temperatures
select
    random() * 40 + 1,
    generate_series('2021-05-01 00:00:00'::timestamp, '2021-05-31 23:59:59'::timestamp, '10 second');

This query will create 267840 measurements. That is a pretty large data set for our example.
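If you want to double-check that row count without running the query, here is a small Python sketch of the arithmetic. It assumes what PostgreSQL documents for generate_series: the start value is included, and values are produced while they do not exceed the stop value. The variable names are only illustrative.

```python
from datetime import datetime, timedelta

# Same bounds and step as in the insert query above
start = datetime(2021, 5, 1, 0, 0, 0)
stop = datetime(2021, 5, 31, 23, 59, 59)
step = timedelta(seconds=10)

# Number of whole steps that fit between start and stop, plus the start value itself
row_count = (stop - start) // step + 1

# 24 hours * 360 measurements per hour * 31 days gives the same number
assert row_count == 24 * 360 * 31

print(row_count)  # 267840
```

The last generated timestamp is 2021-05-31 23:59:50, because the next step would go past the stop value.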

Let's check what we have in the table:

select * from temperatures limit 10;

SQL to group by time

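Now we want to select only one measurement per hour. We can do this by truncating the timestamp to the hour and using the result in a distinct on expression. The order by clause makes the choice deterministic: without it PostgreSQL may keep any row from each hour, with it we get the earliest one.

select distinct on (key)
    *, date_trunc('hour', measured_at) as key
from temperatures
order by key, measured_at
limit 20;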

You see that PostgreSQL returned only one value per hour. That is what we want! In the same way you can group values by week, month, or any other period that date_trunc supports.
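To make it clearer what the database does here, below is a rough pure-Python sketch of the same idea: truncate each timestamp to a bucket, then keep one row per bucket. It is only an illustration, not how PostgreSQL implements it. The helpers truncate and first_per_bucket are hypothetical names, the sample readings are made up, and keeping the earliest row in each bucket is an assumption (in SQL, the row you get depends on the ordering of the query).

```python
from datetime import datetime, timedelta, timezone


def truncate(ts: datetime, kind: str) -> datetime:
    """Rough stand-in for date_trunc(kind, ts); handles only 'hour' and 'day'."""
    if kind == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if kind == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported kind: {kind}")


def first_per_bucket(rows, kind):
    """Keep the earliest (value, measured_at) row of every bucket."""
    picked = {}
    for value, measured_at in sorted(rows, key=lambda row: row[1]):
        # setdefault stores a row only if the bucket has none yet
        picked.setdefault(truncate(measured_at, kind), (value, measured_at))
    return picked


# Three hours of made-up readings, one every 10 seconds
start = datetime(2021, 5, 1, tzinfo=timezone.utc)
rows = [(20.0 + i % 7, start + timedelta(seconds=10 * i)) for i in range(3 * 360)]

hourly = first_per_bucket(rows, "hour")
print(len(rows), len(hourly))  # 1080 3
```

Swapping "hour" for "day" collapses the same readings into a single bucket, which is the same switch you make by changing the first argument of date_trunc.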

Use Django ORM to group rows by time

We can do the same with the Django ORM. Assuming we have this model for our table:

from django.db import models

class Temperature(models.Model):
    value = models.FloatField()
    measured_at = models.DateTimeField()

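you can write the following query to fetch one value per hour (or any other period you wish). Note that passing field names to distinct() works only on PostgreSQL. Ordering by measured_at as well makes each hour return its earliest row.

from django.db.models.functions import Trunc

Temperature.objects.annotate(key=Trunc('measured_at', 'hour')).distinct('key').order_by('key', 'measured_at')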

That's it!

Alex Oleshkevich

Alex is a full-stack software developer with 15+ years of web development experience. His primary stack is Python, PHP, and JavaScript.
Minsk, Belarus