<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kush pranjale</title>
    <description>The latest articles on DEV Community by kush pranjale (@kushpranjale).</description>
    <link>https://dev.to/kushpranjale</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3391053%2F989cca6e-10cf-4d69-be13-76d25b3ea317.jpg</url>
      <title>DEV Community: kush pranjale</title>
      <link>https://dev.to/kushpranjale</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kushpranjale"/>
    <language>en</language>
    <item>
      <title>🕒 Cumulative Data Without the Pain: PostgreSQL Rollups with Time Buckets</title>
      <dc:creator>kush pranjale</dc:creator>
      <pubDate>Sun, 27 Jul 2025 07:53:21 +0000</pubDate>
      <link>https://dev.to/kushpranjale/cumulative-data-without-the-pain-postgresql-rollups-with-time-buckets-43b9</link>
      <guid>https://dev.to/kushpranjale/cumulative-data-without-the-pain-postgresql-rollups-with-time-buckets-43b9</guid>
      <description>&lt;p&gt;&lt;strong&gt;Real-time dashboards, financial summaries, and trend graphs — they all need fast aggregation. But when you’re dealing with millions of rows, even a simple SUM or COUNT across a time range can grind your system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’re using PostgreSQL and struggling with this, I’ve got good news:&lt;/p&gt;

&lt;p&gt;✅ You don’t need to move to Redshift, BigQuery, or TimescaleDB.&lt;br&gt;
✅ You can roll up and query large time-based data quickly — using PostgreSQL alone.&lt;/p&gt;

&lt;p&gt;This blog breaks down how we built cumulative rollups and time-bucketed summaries at scale — without sacrificing flexibility or performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🚨 The Problem&lt;/strong&gt;&lt;br&gt;
Imagine a table like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE TABLE transactions (&lt;br&gt;
  id SERIAL,&lt;br&gt;
  user_id UUID,&lt;br&gt;
  amount NUMERIC,&lt;br&gt;
  amount_type INT,  -- 0 = bet, 1 = win (used by the rollup job below)&lt;br&gt;
  action_type TEXT,&lt;br&gt;
  created_at TIMESTAMP,&lt;br&gt;
  ...&lt;br&gt;
);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You want to build a dashboard that shows:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily revenue totals&lt;/li&gt;
&lt;li&gt;30-minute betting volumes&lt;/li&gt;
&lt;li&gt;Weekly unique users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But you can’t scan millions of rows every time a user opens a report.&lt;/p&gt;

&lt;p&gt;❌ Real-time aggregate queries become slow and unpredictable&lt;br&gt;
❌ Caching is hard when the data is dynamic&lt;br&gt;
❌ GROUP BY with date truncation doesn't scale well&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;✅ The Solution: Rollups with Time Buckets&lt;/strong&gt;&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The fix? Pre-compute your aggregates into time buckets.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;🧱 Step 1: Create a Rollup Table&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of querying raw data, store pre-aggregated values:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE TABLE transaction_rollups (&lt;br&gt;
  bucket_start TIMESTAMP,&lt;br&gt;
  user_id UUID,&lt;br&gt;
  total_bet NUMERIC,&lt;br&gt;
  total_win NUMERIC,&lt;br&gt;
  action_type TEXT,&lt;br&gt;
  PRIMARY KEY (bucket_start, user_id, action_type)&lt;br&gt;
);&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Each row represents a time bucket, like 30 min or 1 hour.&lt;/p&gt;
&lt;/blockquote&gt;
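
&lt;p&gt;The composite primary key indexes bucket_start first, but the dashboard queries below filter on action_type. A supporting index helps; this one is a suggestion of mine, not part of the original design:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Supports "WHERE action_type = ... AND bucket_start &amp;gt;= ..." range scans&lt;br&gt;
CREATE INDEX idx_rollups_action_bucket&lt;br&gt;
  ON transaction_rollups (action_type, bucket_start);&lt;/code&gt;&lt;/p&gt;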

&lt;p&gt;&lt;strong&gt;🔁 Step 2: Run Periodic Rollup Jobs (Every 5–15 Min)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use a cron job or background queue (like Bull in Node.js) to run:&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;INSERT INTO transaction_rollups (bucket_start, user_id, total_bet, total_win,&lt;br&gt;
 action_type)&lt;br&gt;
SELECT&lt;br&gt;
  -- date_trunc() only accepts named units like 'hour', so use date_bin()&lt;br&gt;
  -- (PostgreSQL 14+) to snap timestamps to 30-minute buckets&lt;br&gt;
  date_bin('30 minutes', created_at, TIMESTAMP '2000-01-01') AS bucket,&lt;br&gt;
  user_id,&lt;br&gt;
  SUM(CASE WHEN amount_type = 0 THEN amount ELSE 0 END) AS total_bet,&lt;br&gt;
  SUM(CASE WHEN amount_type = 1 THEN amount ELSE 0 END) AS total_win,&lt;br&gt;
  action_type&lt;br&gt;
FROM transactions&lt;br&gt;
WHERE created_at &amp;gt;= now() - interval '1 hour'&lt;br&gt;
GROUP BY bucket, user_id, action_type&lt;br&gt;
ON CONFLICT (bucket_start, user_id, action_type)&lt;br&gt;
DO UPDATE SET&lt;br&gt;
  total_bet = EXCLUDED.total_bet,&lt;br&gt;
  total_win = EXCLUDED.total_win;&lt;/code&gt;&lt;/p&gt;
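
&lt;p&gt;Note: date_bin() requires PostgreSQL 14 or newer. On older versions, a common fallback (my substitution, not from the original post) is to floor the Unix epoch to 1800-second steps:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- 30-minute bucketing for PostgreSQL versions before 14&lt;br&gt;
-- to_timestamp() returns timestamptz, so cast back for a timezone-naive column&lt;br&gt;
SELECT to_timestamp(floor(extract(epoch FROM created_at) / 1800) * 1800)::timestamp AS bucket&lt;br&gt;
FROM transactions;&lt;/code&gt;&lt;/p&gt;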

&lt;blockquote&gt;
&lt;p&gt;You can truncate older raw data later or use rollups for long-range analytics.&lt;/p&gt;
&lt;/blockquote&gt;
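
&lt;p&gt;If you would rather keep the scheduling inside the database than in Node.js, the pg_cron extension can run the same upsert on a timer. A minimal sketch, assuming pg_cron is installed and the Step 2 statement has been wrapped in a refresh_transaction_rollups() function (both are my assumptions):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS pg_cron;&lt;br&gt;
-- Run the Step 2 upsert every 10 minutes&lt;br&gt;
SELECT cron.schedule(&lt;br&gt;
  'transaction-rollups',                  -- job name (hypothetical)&lt;br&gt;
  '*/10 * * * *',                         -- standard cron syntax&lt;br&gt;
  'SELECT refresh_transaction_rollups()'  -- assumed wrapper around the Step 2 INSERT&lt;br&gt;
);&lt;/code&gt;&lt;/p&gt;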

&lt;p&gt;&lt;strong&gt;⚡ Step 3: Querying Is Now Fast &amp;amp; Predictable&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Need last 24 hours of betting volume, grouped by 30-minute intervals?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;SELECT&lt;br&gt;
  bucket_start,&lt;br&gt;
  SUM(total_bet) as sum_bet&lt;br&gt;
FROM transaction_rollups&lt;br&gt;
WHERE action_type = 'bet'&lt;br&gt;
  AND bucket_start &amp;gt;= now() - interval '1 day'&lt;br&gt;
GROUP BY bucket_start&lt;br&gt;
ORDER BY bucket_start;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s fast because:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re scanning roughly 48 rollup rows per day per user/action combination (one per 30-minute bucket) instead of thousands of raw transactions&lt;/li&gt;
&lt;li&gt;No joins or heavy computation at runtime&lt;/li&gt;
&lt;/ul&gt;
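
&lt;p&gt;One caveat for charts: buckets with no activity simply have no row, so lines can skip time slots. A LEFT JOIN against generate_series() fills the gaps with zeros. A sketch of one way to do it:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Generate every 30-minute slot in the last day, then join rollups onto it&lt;br&gt;
SELECT&lt;br&gt;
  g.bucket_start,&lt;br&gt;
  COALESCE(SUM(r.total_bet), 0) AS sum_bet&lt;br&gt;
FROM generate_series(&lt;br&gt;
       date_bin('30 minutes', now()::timestamp - interval '1 day', TIMESTAMP '2000-01-01'),&lt;br&gt;
       now()::timestamp,&lt;br&gt;
       interval '30 minutes'&lt;br&gt;
     ) AS g(bucket_start)&lt;br&gt;
LEFT JOIN transaction_rollups r&lt;br&gt;
       ON r.bucket_start = g.bucket_start&lt;br&gt;
      AND r.action_type = 'bet'&lt;br&gt;
GROUP BY g.bucket_start&lt;br&gt;
ORDER BY g.bucket_start;&lt;/code&gt;&lt;/p&gt;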

&lt;h2&gt;&lt;strong&gt;📈 Bonus: Running Cumulative SUMs for Charts&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Want a cumulative sum for the graph?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT&lt;br&gt;
  bucket_start,&lt;br&gt;
  SUM(SUM(total_bet)) OVER (ORDER BY bucket_start) AS cumulative_bet&lt;br&gt;
FROM transaction_rollups&lt;br&gt;
WHERE action_type = 'bet'&lt;br&gt;
  AND bucket_start &amp;gt;= now() - interval '1 day'&lt;br&gt;
GROUP BY bucket_start&lt;br&gt;
ORDER BY bucket_start;&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PostgreSQL’s SUM(...) OVER makes this super smooth.&lt;/p&gt;
&lt;/blockquote&gt;
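
&lt;p&gt;If the chart should reset at midnight instead of accumulating forever, add a PARTITION BY on the day. A variation of the query above (my addition, not from the original post):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Running total that restarts at the start of each calendar day&lt;br&gt;
SELECT&lt;br&gt;
  bucket_start,&lt;br&gt;
  SUM(SUM(total_bet)) OVER (&lt;br&gt;
    PARTITION BY bucket_start::date  -- one running total per day&lt;br&gt;
    ORDER BY bucket_start&lt;br&gt;
  ) AS cumulative_bet&lt;br&gt;
FROM transaction_rollups&lt;br&gt;
WHERE action_type = 'bet'&lt;br&gt;
GROUP BY bucket_start&lt;br&gt;
ORDER BY bucket_start;&lt;/code&gt;&lt;/p&gt;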

&lt;h2&gt;&lt;strong&gt;🧠 Why Not Just Use date_trunc() On Raw Tables?&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;While date_trunc('hour', created_at) looks simple, it forces PostgreSQL to:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scan all rows in the range&lt;/li&gt;
&lt;li&gt;Compute the aggregation on the fly&lt;/li&gt;
&lt;li&gt;Often skip indexes, because the function is applied to the timestamp column&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rollups shift that work to off-peak time, letting you deliver blazing-fast reports.&lt;/p&gt;
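
&lt;p&gt;For comparison, this is the on-the-fly shape this section warns about; every dashboard load repeats the full scan and aggregation:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Recomputed from raw rows on every request: fine at small scale, painful at millions of rows&lt;br&gt;
SELECT&lt;br&gt;
  date_trunc('hour', created_at) AS bucket,&lt;br&gt;
  SUM(amount) AS total&lt;br&gt;
FROM transactions&lt;br&gt;
WHERE created_at &amp;gt;= now() - interval '1 day'&lt;br&gt;
GROUP BY bucket&lt;br&gt;
ORDER BY bucket;&lt;/code&gt;&lt;/p&gt;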

&lt;h2&gt;&lt;strong&gt;🛠 Tooling Tips&lt;/strong&gt;&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use materialized views if you prefer built-in caching, but be careful with refresh times (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;If you need real-time + historical data, combine rollups + the latest raw data with a UNION ALL.&lt;/li&gt;
&lt;li&gt;Schedule the job every 5–10 minutes using cron.&lt;/li&gt;
&lt;/ul&gt;
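
&lt;p&gt;Here is what the materialized-view route from the first tip can look like. A sketch only; the view name and refresh approach are mine:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Built-in caching: PostgreSQL stores the query result for you&lt;br&gt;
CREATE MATERIALIZED VIEW daily_bet_totals AS&lt;br&gt;
SELECT&lt;br&gt;
  date_trunc('day', bucket_start) AS day,&lt;br&gt;
  SUM(total_bet) AS total_bet&lt;br&gt;
FROM transaction_rollups&lt;br&gt;
GROUP BY day;&lt;br&gt;
-- CONCURRENTLY keeps readers unblocked during refresh, but needs a unique index&lt;br&gt;
CREATE UNIQUE INDEX ON daily_bet_totals (day);&lt;br&gt;
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_bet_totals;&lt;/code&gt;&lt;/p&gt;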

&lt;h2&gt;&lt;strong&gt;🔚 Final Thoughts&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Rollups are a must-have when you’re:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handling high-traffic applications&lt;/li&gt;
&lt;li&gt;Running financial or gaming systems&lt;/li&gt;
&lt;li&gt;Building analytics dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With just PostgreSQL, you can:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scale reporting&lt;/li&gt;
&lt;li&gt;Reduce DB load&lt;/li&gt;
&lt;li&gt;Keep your users happy&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;🔥 Tip: If you plan ahead, you can support multiple granularities — hourly, daily, weekly — all from the same strategy.&lt;/p&gt;
&lt;/blockquote&gt;
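
&lt;p&gt;For example, coarser granularities can be derived from the finest rollup instead of re-scanning raw data. A sketch, assuming a separate daily_rollups table with the same shape (the table is hypothetical):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-- Fold 30-minute buckets into daily buckets; raw transactions are never touched&lt;br&gt;
INSERT INTO daily_rollups (bucket_start, user_id, total_bet, total_win, action_type)&lt;br&gt;
SELECT&lt;br&gt;
  date_trunc('day', bucket_start),&lt;br&gt;
  user_id,&lt;br&gt;
  SUM(total_bet),&lt;br&gt;
  SUM(total_win),&lt;br&gt;
  action_type&lt;br&gt;
FROM transaction_rollups&lt;br&gt;
GROUP BY date_trunc('day', bucket_start), user_id, action_type&lt;br&gt;
ON CONFLICT (bucket_start, user_id, action_type)&lt;br&gt;
DO UPDATE SET&lt;br&gt;
  total_bet = EXCLUDED.total_bet,&lt;br&gt;
  total_win = EXCLUDED.total_win;&lt;/code&gt;&lt;/p&gt;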

&lt;h2&gt;&lt;strong&gt;👋 Let’s Connect&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Have questions about PostgreSQL, time-series optimizations, or real-time dashboards?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s talk on &lt;a href="https://www.linkedin.com/in/kush-pranjale-052b3a60" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or drop a comment.&lt;/p&gt;

</description>
      <category>postgressql</category>
      <category>backend</category>
      <category>performance</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
