Rounding Timestamps to Interval Boundaries in Python and SQL
Problem Statement
When performing time-based data analysis, there's often a need to align timestamps to predefineed interval boundaries. For instance, given a timestamp like 2024-03-26 14:25:59, you might want to group it into 30-minute buckets, resulting in 2024-03-26 14:00:00. This operation is commonly used for aggregation, reporting, and visualization purposes.
Python Implementation
Using pandas, timestamps can be rounded to interval boundaries with a custom function:
import pandas as pd
# Sample dataset
df = pd.DataFrame({
'timestamp': ['2024-03-26 07:00:00', '2024-03-26 07:15:00', '2024-03-26 07:29:59', '2024-03-26 07:50:00'],
'amount': [1, 2, 3, 4]
})
def floor_to_interval(dt, minutes=30):
"""
Floor a timestamp to the start of the specified interval.
Args:
dt: A datetime object
minutes: Interval length in minutes (0-60)
Returns:
Datetime truncated to the interval boundary
"""
return dt.replace(
minute=(dt.minute // minutes) * minutes,
second=0,
microsecond=0
)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['bucket_start'] = df['timestamp'].apply(lambda x: floor_to_interval(x, minutes=30))
The function works by dividing the minute component by the interval length, flooring the result, then multiplying back to get the truncated minute value.
SQL Implementations
ClickHouse
ClickHouse provides the toStartOfInterval function for this purpose:
-- Round to 30-minute boundaries
SELECT toStartOfInterval(event_time, INTERVAL 30 MINUTE) AS interval_start
FROM events_table;
-- Test with a literal timestamp
SELECT toStartOfInterval(toDateTime('2024-02-10 12:13:49'), INTERVAL 30 MINUTE) AS result;
-- Returns: 2024-02-10 12:00:00
The function accepts various interval types including MINUTE, HOUR, DAY, and more.
PostgreSQL
PostgreSQL requires combining multiple functions to achieve the same result:
-- Step 1: Truncate to the hour boundary
SELECT date_trunc('hour', TIMESTAMP '2024-12-12 12:35:31');
-- Returns: 2024-12-12 12:00:00
-- Step 2: Extract the minute component
SELECT EXTRACT(MINUTE FROM TIMESTAMP '2024-12-12 12:35:31');
-- Returns: 35
-- Step 3: Calculate how many complete intervals fit
SELECT FLOOR(EXTRACT(MINUTE FROM TIMESTAMP '2024-12-12 12:35:31') / 30)::INTEGER AS interval_count;
-- Returns: 1
-- Combined solution for 30-minute intervals
SELECT
date_trunc('hour', TIMESTAMP '2024-12-12 12:35:31') +
(FLOOR(EXTRACT(MINUTE FROM TIMESTAMP '2024-12-12 12:35:31') / 30) * INTERVAL '30 minutes') AS interval_start;
-- Returns: 2024-12-12 12:30:00
The approach truncates to the hour first, then adds the appropriate number of 30-minute intervals based on the minute value.
Comparison
| Platform | Function/Method | Complexity |
|---|---|---|
| Python (pandas) | Custom function with apply |
Medium |
| ClickHouse | toStartOfInterval |
Low |
| PostgreSQL | date_trunc + extract + arithmetic |
High |