<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: பாலாஜி</title>
    <description>The latest articles on DEV Community by பாலாஜி (@sch00lb0y).</description>
    <link>https://dev.to/sch00lb0y</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F55790%2F4c13e1ac-52c6-4399-bf66-0fc79e384566.jpeg</url>
      <title>DEV Community: பாலாஜி</title>
      <link>https://dev.to/sch00lb0y</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sch00lb0y"/>
    <language>en</language>
    <item>
      <title>How to use pg_cron in postgres to do historic aggregation?</title>
      <dc:creator>பாலாஜி</dc:creator>
      <pubDate>Wed, 02 Mar 2022 16:58:55 +0000</pubDate>
      <link>https://dev.to/sch00lb0y/how-to-use-pgcron-in-postgres-to-do-historic-aggregation-5ghe</link>
      <guid>https://dev.to/sch00lb0y/how-to-use-pgcron-in-postgres-to-do-historic-aggregation-5ghe</guid>
      <description>&lt;h2&gt;
  
  
  What is Postgres?
&lt;/h2&gt;

&lt;p&gt;Postgres is a popular sql database that allows end users to insert data into table-like structures and  retrieve it using ANSI-SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem Statement
&lt;/h2&gt;

&lt;p&gt;Recently, I ended up with the use case of historic aggregation of numeric columns. &lt;/p&gt;

&lt;p&gt;To explain the problem better, let's say we have a schema for a banking application. The schema contains balance column in the customer table. Now, we want to know sum of all your customer balances day-wise.&lt;/p&gt;

&lt;p&gt;For e.g:  what is the total balance of all your customers yesterday, today and the upcoming days.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpo0lw35zqr8thquc1kcu.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpo0lw35zqr8thquc1kcu.jpeg" alt="Sum of Deposited cash"&gt;&lt;/a&gt;&lt;br&gt;
This historic aggregation, helps us to measure the growth of the deposited money over a period of time. &lt;/p&gt;

&lt;p&gt;The obvious logical step to solve the problem would be having a periodic job to calculate the sum and insert it into a new table. But, setting up the cron job and maintaining a separate data pipeline is too much of a headache. &lt;/p&gt;
&lt;h2&gt;
  
  
  Hack
&lt;/h2&gt;

&lt;p&gt;Luckily, postgres has an extension to schedule cron jobs called &lt;code&gt;pg_cron&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;With the help of  &lt;code&gt;pg_cron&lt;/code&gt;, we can bring down the job of writing the code and deploying it as a service to writing few SQL queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now that we have simplified the job, let's create a new table with the following columns to store our historic aggregations:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;calculated_at&lt;/strong&gt; - timestamp of the calculated metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;metric_name&lt;/strong&gt; - I've added this metric_name column, so that in future, if i want to add more metrics, I can use this table again&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;metric_value&lt;/strong&gt; - aggregated value.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;historic_numeric_aggregation&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="n"&gt;calculated_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;zone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="n"&gt;metric_name&lt;/span&gt;   &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="n"&gt;metric_value&lt;/span&gt;  &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The next step after creating the table is to populate the table with the calculated metrics value everyday. To do that, I'm creating a postgres function, which inserts the sum of balances of the customer to the historic_numeric_aggregation table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt;
&lt;span class="k"&gt;OR&lt;/span&gt;
&lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;aggregate_customer_balance_sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;historic_numeric_aggregation&lt;/span&gt;
              &lt;span class="p"&gt;(&lt;/span&gt;
                          &lt;span class="n"&gt;calculated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;metric_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;metric_value&lt;/span&gt;
              &lt;span class="p"&gt;)&lt;/span&gt;
              &lt;span class="k"&gt;VALUES&lt;/span&gt;
              &lt;span class="p"&gt;(&lt;/span&gt;
                          &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="s1"&gt;'customer_sum_balance'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="p"&gt;(&lt;/span&gt;
                                 &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                                 &lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
              &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; 
&lt;span class="k"&gt;language&lt;/span&gt; &lt;span class="s1"&gt;'plpgsql'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So far we have created all the building blocks to calculate the metrics. &lt;/p&gt;

&lt;p&gt;Let's schedule the aggregate_customer_balance_sum function to run every day morning at 10'O clock.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'customer_sum_aggregation'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0 10 * * *'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s1"&gt;'select * from aggregate_customer_balance_sum()'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Voila, now we have a cron job to do the historic metrics aggregation. &lt;/p&gt;

&lt;p&gt;Note: In this post, I've not explained how to enable the &lt;code&gt;pg_cron&lt;/code&gt; extenstion. Because, every flavour of postgres has its own way of installing and enabling it. I've followed &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_pg_cron.html" rel="noopener noreferrer"&gt;AWS tutorial&lt;/a&gt;  to enable pg_cron on our RDS cluster.&lt;/p&gt;

&lt;p&gt;By the way, I'm building a centralized access control solution for postgres to protect customer privacy. If you are curious, follow this link to know more &lt;a href="https://github.com/poonai/inspektor" rel="noopener noreferrer"&gt;https://github.com/poonai/inspektor&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>postgres</category>
      <category>devsec</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
