<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kush pranjale</title>
    <description>The latest articles on DEV Community by kush pranjale (@kushpranjale).</description>
    <link>https://dev.to/kushpranjale</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3391053%2F989cca6e-10cf-4d69-be13-76d25b3ea317.jpg</url>
      <title>DEV Community: kush pranjale</title>
      <link>https://dev.to/kushpranjale</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kushpranjale"/>
    <language>en</language>
    <item>
      <title>🕒 Cumulative Data Without the Pain: PostgreSQL Rollups with Time Buckets</title>
      <dc:creator>kush pranjale</dc:creator>
      <pubDate>Sun, 27 Jul 2025 07:53:21 +0000</pubDate>
      <link>https://dev.to/kushpranjale/cumulative-data-without-the-pain-postgresql-rollups-with-time-buckets-43b9</link>
      <guid>https://dev.to/kushpranjale/cumulative-data-without-the-pain-postgresql-rollups-with-time-buckets-43b9</guid>
      <description>&lt;p&gt;&lt;strong&gt;Real-time dashboards, financial summaries, and trend graphs — they all need fast aggregation. But when you’re dealing with millions of rows, even a simple SUM or COUNT across a time range can grind your system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’re using PostgreSQL and struggling with this, I’ve got good news:&lt;/p&gt;

&lt;p&gt;✅ You don’t need to move to Redshift, BigQuery, or TimescaleDB.&lt;br&gt;
✅ You can roll up and query large time-based data quickly — using PostgreSQL alone.&lt;/p&gt;

&lt;p&gt;This blog breaks down how we built cumulative rollups and time-bucketed summaries at scale — without sacrificing flexibility or performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🚨 The Problem&lt;/strong&gt;&lt;br&gt;
Imagine a table like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE TABLE transactions (&lt;br&gt;
  id SERIAL,&lt;br&gt;
  user_id UUID,&lt;br&gt;
  amount NUMERIC,&lt;br&gt;
  amount_type INT,  -- 0 = bet, 1 = win (used by the rollup job below)&lt;br&gt;
  action_type TEXT,&lt;br&gt;
  created_at TIMESTAMP,&lt;br&gt;
  ...&lt;br&gt;
);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You want to build a dashboard that shows:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily revenue totals&lt;/li&gt;
&lt;li&gt;30-minute betting volumes&lt;/li&gt;
&lt;li&gt;Weekly unique users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But you can’t scan millions of rows every time a user opens a report.&lt;/p&gt;

&lt;p&gt;❌ Real-time aggregate queries become slow and unpredictable&lt;br&gt;
❌ Caching is hard when the data is dynamic&lt;br&gt;
❌ GROUP BY with date truncation doesn't scale well&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;✅ The Solution: Rollups with Time Buckets&lt;/strong&gt;&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The fix? Pre-compute your aggregates into time buckets.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;🧱 Step 1: Create a Rollup Table&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of querying raw data, store pre-aggregated values:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE TABLE transaction_rollups (&lt;br&gt;
  bucket_start TIMESTAMP,&lt;br&gt;
  user_id UUID,&lt;br&gt;
  total_bet NUMERIC,&lt;br&gt;
  total_win NUMERIC,&lt;br&gt;
  action_type TEXT,&lt;br&gt;
  PRIMARY KEY (bucket_start, user_id, action_type)&lt;br&gt;
);&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Each row represents a time bucket, like 30 min or 1 hour.&lt;/p&gt;
&lt;/blockquote&gt;
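
&lt;p&gt;The composite primary key indexes bucket_start first, but the dashboard queries below filter on action_type. A supporting index helps; this one is a suggestion of mine, not part of the original design:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Supports "WHERE action_type = ... AND bucket_start &amp;gt;= ..." range scans&lt;br&gt;
CREATE INDEX idx_rollups_action_bucket&lt;br&gt;
  ON transaction_rollups (action_type, bucket_start);&lt;/code&gt;&lt;/p&gt;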

&lt;p&gt;&lt;strong&gt;🔁 Step 2: Run Periodic Rollup Jobs (Every 5–15 Min)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use a cron job or background queue (like Bull in Node.js) to run:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;INSERT INTO transaction_rollups (bucket_start, user_id, total_bet, total_win,&lt;br&gt;
 action_type)&lt;br&gt;
SELECT&lt;br&gt;
  -- date_trunc() only accepts named units like 'hour', so use date_bin()&lt;br&gt;
  -- (PostgreSQL 14+) to snap timestamps to 30-minute buckets&lt;br&gt;
  date_bin('30 minutes', created_at, TIMESTAMP '2000-01-01') AS bucket,&lt;br&gt;
  user_id,&lt;br&gt;
  SUM(CASE WHEN amount_type = 0 THEN amount ELSE 0 END) AS total_bet,&lt;br&gt;
  SUM(CASE WHEN amount_type = 1 THEN amount ELSE 0 END) AS total_win,&lt;br&gt;
  action_type&lt;br&gt;
FROM transactions&lt;br&gt;
WHERE created_at &amp;gt;= now() - interval '1 hour'&lt;br&gt;
GROUP BY bucket, user_id, action_type&lt;br&gt;
ON CONFLICT (bucket_start, user_id, action_type)&lt;br&gt;
DO UPDATE SET&lt;br&gt;
  total_bet = EXCLUDED.total_bet,&lt;br&gt;
  total_win = EXCLUDED.total_win;&lt;/code&gt;&lt;/p&gt;
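
&lt;p&gt;Note: date_bin() requires PostgreSQL 14 or newer. On older versions, a common fallback (my substitution, not from the original post) is to floor the Unix epoch to 1800-second steps:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- 30-minute bucketing for PostgreSQL versions before 14&lt;br&gt;
-- to_timestamp() returns timestamptz, so cast back for a timezone-naive column&lt;br&gt;
SELECT to_timestamp(floor(extract(epoch FROM created_at) / 1800) * 1800)::timestamp AS bucket&lt;br&gt;
FROM transactions;&lt;/code&gt;&lt;/p&gt;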

&lt;blockquote&gt;
&lt;p&gt;You can truncate older raw data later or use rollups for long-range analytics.&lt;/p&gt;
&lt;/blockquote&gt;
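
&lt;p&gt;If you would rather keep the scheduling inside the database than in Node.js, the pg_cron extension can run the same upsert on a timer. A minimal sketch, assuming pg_cron is installed and the Step 2 statement has been wrapped in a refresh_transaction_rollups() function (both are my assumptions):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS pg_cron;&lt;br&gt;
-- Run the Step 2 upsert every 10 minutes&lt;br&gt;
SELECT cron.schedule(&lt;br&gt;
  'transaction-rollups',                  -- job name (hypothetical)&lt;br&gt;
  '*/10 * * * *',                         -- standard cron syntax&lt;br&gt;
  'SELECT refresh_transaction_rollups()'  -- assumed wrapper around the Step 2 INSERT&lt;br&gt;
);&lt;/code&gt;&lt;/p&gt;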

&lt;p&gt;&lt;strong&gt;⚡ Step 3: Querying Is Now Fast &amp;amp; Predictable&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Need last 24 hours of betting volume, grouped by 30-minute intervals?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;SELECT&lt;br&gt;
  bucket_start,&lt;br&gt;
  SUM(total_bet) as sum_bet&lt;br&gt;
FROM transaction_rollups&lt;br&gt;
WHERE action_type = 'bet'&lt;br&gt;
  AND bucket_start &amp;gt;= now() - interval '1 day'&lt;br&gt;
GROUP BY bucket_start&lt;br&gt;
ORDER BY bucket_start;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s fast because:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re scanning roughly 48 rollup rows per day per user/action combination (one per 30-minute bucket) instead of thousands of raw transactions&lt;/li&gt;
&lt;li&gt;No joins or heavy computation at runtime&lt;/li&gt;
&lt;/ul&gt;
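
&lt;p&gt;One caveat for charts: buckets with no activity simply have no row, so lines can skip time slots. A LEFT JOIN against generate_series() fills the gaps with zeros. A sketch of one way to do it:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Generate every 30-minute slot in the last day, then join rollups onto it&lt;br&gt;
SELECT&lt;br&gt;
  g.bucket_start,&lt;br&gt;
  COALESCE(SUM(r.total_bet), 0) AS sum_bet&lt;br&gt;
FROM generate_series(&lt;br&gt;
       date_bin('30 minutes', now()::timestamp - interval '1 day', TIMESTAMP '2000-01-01'),&lt;br&gt;
       now()::timestamp,&lt;br&gt;
       interval '30 minutes'&lt;br&gt;
     ) AS g(bucket_start)&lt;br&gt;
LEFT JOIN transaction_rollups r&lt;br&gt;
       ON r.bucket_start = g.bucket_start&lt;br&gt;
      AND r.action_type = 'bet'&lt;br&gt;
GROUP BY g.bucket_start&lt;br&gt;
ORDER BY g.bucket_start;&lt;/code&gt;&lt;/p&gt;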

&lt;h2&gt;&lt;strong&gt;📈 Bonus: Running Cumulative SUMs for Charts&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Want a cumulative sum for the graph?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT&lt;br&gt;
  bucket_start,&lt;br&gt;
  SUM(SUM(total_bet)) OVER (ORDER BY bucket_start) AS cumulative_bet&lt;br&gt;
FROM transaction_rollups&lt;br&gt;
WHERE action_type = 'bet'&lt;br&gt;
  AND bucket_start &amp;gt;= now() - interval '1 day'&lt;br&gt;
GROUP BY bucket_start&lt;br&gt;
ORDER BY bucket_start;&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PostgreSQL’s SUM(...) OVER makes this super smooth.&lt;/p&gt;
&lt;/blockquote&gt;
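
&lt;p&gt;If the chart should reset at midnight instead of accumulating forever, add a PARTITION BY on the day. A variation of the query above (my addition, not from the original post):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Running total that restarts at the start of each calendar day&lt;br&gt;
SELECT&lt;br&gt;
  bucket_start,&lt;br&gt;
  SUM(SUM(total_bet)) OVER (&lt;br&gt;
    PARTITION BY bucket_start::date  -- one running total per day&lt;br&gt;
    ORDER BY bucket_start&lt;br&gt;
  ) AS cumulative_bet&lt;br&gt;
FROM transaction_rollups&lt;br&gt;
WHERE action_type = 'bet'&lt;br&gt;
GROUP BY bucket_start&lt;br&gt;
ORDER BY bucket_start;&lt;/code&gt;&lt;/p&gt;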

&lt;h2&gt;&lt;strong&gt;🧠 Why Not Just Use date_trunc() On Raw Tables?&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;While date_trunc('hour', created_at) looks simple, it forces PostgreSQL to:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scan all rows in the range&lt;/li&gt;
&lt;li&gt;Compute the aggregation on the fly&lt;/li&gt;
&lt;li&gt;Often skip indexes, because the function is applied to the timestamp column&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rollups shift that work to off-peak time, letting you deliver blazing-fast reports.&lt;/p&gt;
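
&lt;p&gt;For comparison, this is the on-the-fly shape this section warns about; every dashboard load repeats the full scan and aggregation:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Recomputed from raw rows on every request: fine at small scale, painful at millions of rows&lt;br&gt;
SELECT&lt;br&gt;
  date_trunc('hour', created_at) AS bucket,&lt;br&gt;
  SUM(amount) AS total&lt;br&gt;
FROM transactions&lt;br&gt;
WHERE created_at &amp;gt;= now() - interval '1 day'&lt;br&gt;
GROUP BY bucket&lt;br&gt;
ORDER BY bucket;&lt;/code&gt;&lt;/p&gt;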

&lt;h2&gt;&lt;strong&gt;🛠 Tooling Tips&lt;/strong&gt;&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use materialized views if you prefer built-in caching, but be careful with refresh times (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;If you need real-time + historical data, combine rollups + the latest raw data with a UNION ALL.&lt;/li&gt;
&lt;li&gt;Schedule the job every 5–10 minutes using cron.&lt;/li&gt;
&lt;/ul&gt;
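
&lt;p&gt;Here is what the materialized-view route from the first tip can look like. A sketch only; the view name and refresh approach are mine:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Built-in caching: PostgreSQL stores the query result for you&lt;br&gt;
CREATE MATERIALIZED VIEW daily_bet_totals AS&lt;br&gt;
SELECT&lt;br&gt;
  date_trunc('day', bucket_start) AS day,&lt;br&gt;
  SUM(total_bet) AS total_bet&lt;br&gt;
FROM transaction_rollups&lt;br&gt;
GROUP BY day;&lt;br&gt;
-- CONCURRENTLY keeps readers unblocked during refresh, but needs a unique index&lt;br&gt;
CREATE UNIQUE INDEX ON daily_bet_totals (day);&lt;br&gt;
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_bet_totals;&lt;/code&gt;&lt;/p&gt;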

&lt;h2&gt;&lt;strong&gt;🔚 Final Thoughts&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Rollups are a must-have when you’re:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handling high-traffic applications&lt;/li&gt;
&lt;li&gt;Running financial or gaming systems&lt;/li&gt;
&lt;li&gt;Building analytics dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With just PostgreSQL, you can:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scale reporting&lt;/li&gt;
&lt;li&gt;Reduce DB load&lt;/li&gt;
&lt;li&gt;Keep your users happy&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🔥 Tip: If you plan ahead, you can support multiple granularities — hourly, daily, weekly — all from the same strategy.&lt;/p&gt;
&lt;/blockquote&gt;
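
&lt;p&gt;For example, coarser granularities can be derived from the finest rollup instead of re-scanning raw data. A sketch, assuming a separate daily_rollups table with the same shape (the table is hypothetical):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Fold 30-minute buckets into daily buckets; raw transactions are never touched&lt;br&gt;
INSERT INTO daily_rollups (bucket_start, user_id, total_bet, total_win, action_type)&lt;br&gt;
SELECT&lt;br&gt;
  date_trunc('day', bucket_start),&lt;br&gt;
  user_id,&lt;br&gt;
  SUM(total_bet),&lt;br&gt;
  SUM(total_win),&lt;br&gt;
  action_type&lt;br&gt;
FROM transaction_rollups&lt;br&gt;
GROUP BY date_trunc('day', bucket_start), user_id, action_type&lt;br&gt;
ON CONFLICT (bucket_start, user_id, action_type)&lt;br&gt;
DO UPDATE SET&lt;br&gt;
  total_bet = EXCLUDED.total_bet,&lt;br&gt;
  total_win = EXCLUDED.total_win;&lt;/code&gt;&lt;/p&gt;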

&lt;h2&gt;&lt;strong&gt;👋 Let’s Connect&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Have questions about PostgreSQL, time-series optimizations, or real-time dashboards?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s talk on &lt;a href="https://www.linkedin.com/in/kush-pranjale-052b3a60" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or drop a comment.&lt;/p&gt;

</description>
      <category>postgressql</category>
      <category>backend</category>
      <category>performance</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
