<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Braeson Nyahera</title>
    <description>The latest articles on DEV Community by Braeson Nyahera (@braeson_nyahera).</description>
    <link>https://dev.to/braeson_nyahera</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2561920%2F3fb1b08c-8ff3-43e3-80a9-e1b1e121d79b.png</url>
      <title>DEV Community: Braeson Nyahera</title>
      <link>https://dev.to/braeson_nyahera</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/braeson_nyahera"/>
    <language>en</language>
    <item>
      <title>Using symbolic links(symlinks) in airflow to manage DAGs across multiple folders.</title>
      <dc:creator>Braeson Nyahera</dc:creator>
      <pubDate>Tue, 23 Jun 2026 11:40:54 +0000</pubDate>
      <link>https://dev.to/braeson_nyahera/using-symbolic-linkssymlinks-in-airflow-to-manage-dags-across-multiple-folders-49ep</link>
      <guid>https://dev.to/braeson_nyahera/using-symbolic-linkssymlinks-in-airflow-to-manage-dags-across-multiple-folders-49ep</guid>
      <description>&lt;p&gt;If you've ever run a dag file in airflow, we know that airflow manages one folder for DAG files i.e &lt;code&gt;/dags_folder&lt;/code&gt;. That begs the question how would it be able to track dags in different folders without copy pasting the file to &lt;code&gt;/dags_folder&lt;/code&gt;. &lt;br&gt;
After being faced with such a situation symbolic links(symlinks) was the simplest solution to my problem. In this post I will walk through what are symlinks and the solution that they bring.&lt;/p&gt;
&lt;h2&gt;
  
  
  Symbolic Links(Symlinks)
&lt;/h2&gt;

&lt;p&gt;Symlinks are pointers in filesystem, they say "The real content lies somewhere else." When a program reads a symlink, the OS silently redirects it to the target. &lt;br&gt;
That property is exactly what is needed for this to work. We can make airflow &lt;code&gt;/dags_folder&lt;/code&gt; to appear to containg different directories inside it without coping the files.&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting it up
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Confirming the original dag folder
&lt;/h3&gt;

&lt;p&gt;After making sure apache-airflow is installed you can check the dags_folder location&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow config get-value core dags_folder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Such a response is expected &lt;code&gt;/home/bryson/airflow/dags&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
The contents of the &lt;code&gt;/dags_folder&lt;/code&gt; is:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fhzpbu8bmew46zuc72ujf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fhzpbu8bmew46zuc72ujf.png" alt="Dags folder list" width="796" height="59"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Defining our secondary directory
&lt;/h3&gt;

&lt;p&gt;Our secondary directory will be located at &lt;code&gt;/home/bryson/dev/pipeline/dags&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fyp3i01i1rjoasu8hte4h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fyp3i01i1rjoasu8hte4h.png" alt="Secondary directory" width="796" height="59"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Current airflow api-server
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0x41trv7isiicoirdxl8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F0x41trv7isiicoirdxl8.png" alt="Api server" width="800" height="213"&gt;&lt;/a&gt;&lt;br&gt;
Only the files inside the &lt;code&gt;airflow/dags&lt;/code&gt; are currently displayed in the api-server dashboard.&lt;/p&gt;
&lt;h3&gt;
  
  
  4. Creating symlink for the secondary folder
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ln -s source_path target_path&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /home/bryson/dev/pipeline/dags /home/bryson/airflow/dags/fx_prices
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will create a link from &lt;code&gt;/home/bryson/airflow/dags&lt;/code&gt; to &lt;code&gt;/home/bryson/dev/pipeline/dags&lt;/code&gt;&lt;br&gt;
Sample of the symlink created:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fwrriszg6u130yqxalnoa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fwrriszg6u130yqxalnoa.png" alt="Symlinks " width="800" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Api-server after creating a symlink
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fv96malmc452zwatw5i3t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fv96malmc452zwatw5i3t.png" alt="Api-server after creating symlink" width="799" height="211"&gt;&lt;/a&gt;&lt;br&gt;
As you can notice our Api-server now also detects the dag files inside our secondary directory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why symlinks
&lt;/h2&gt;

&lt;p&gt;Symlinking is great for local developnment as it's fast to set up and keeps directories independent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Points to note
&lt;/h2&gt;

&lt;p&gt;Sometimes not all files in a folder are dags, to ensure only dags are detected we use &lt;code&gt;.airflowignore&lt;/code&gt; to have the names of the files to be ignored.&lt;br&gt;
This helps only to manage different folders containing dag files from the same api-server dashboard&lt;/p&gt;

</description>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Braeson Nyahera</dc:creator>
      <pubDate>Mon, 22 Jun 2026 10:32:21 +0000</pubDate>
      <link>https://dev.to/braeson_nyahera/-1aij</link>
      <guid>https://dev.to/braeson_nyahera/-1aij</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/joseph_mwangi_3ae1f57a132/pandas-for-data-cleaning-a-practical-guide-for-beginners-520l" class="crayons-story__hidden-navigation-link"&gt;Pandas for Data Cleaning: A Practical Guide for Beginners&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/joseph_mwangi_3ae1f57a132" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3818272%2Ff1b9576d-7ee4-405b-add3-16290301cf48.png" alt="joseph_mwangi_3ae1f57a132 profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/joseph_mwangi_3ae1f57a132" class="crayons-story__secondary fw-medium m:hidden"&gt;
              joseph mwangi
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                joseph mwangi
                
              
              &lt;div id="story-author-preview-content-3895957" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/joseph_mwangi_3ae1f57a132" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3818272%2Ff1b9576d-7ee4-405b-add3-16290301cf48.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;joseph mwangi&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/joseph_mwangi_3ae1f57a132/pandas-for-data-cleaning-a-practical-guide-for-beginners-520l" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 14&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/joseph_mwangi_3ae1f57a132/pandas-for-data-cleaning-a-practical-guide-for-beginners-520l" id="article-link-3895957"&gt;
          Pandas for Data Cleaning: A Practical Guide for Beginners
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/beginners"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;beginners&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/datascience"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;datascience&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/tutorial"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;tutorial&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/joseph_mwangi_3ae1f57a132/pandas-for-data-cleaning-a-practical-guide-for-beginners-520l" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;5&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/joseph_mwangi_3ae1f57a132/pandas-for-data-cleaning-a-practical-guide-for-beginners-520l#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              &lt;span class="hidden s:inline"&gt;Add&amp;nbsp;Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            7 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Incremental vs Delta Extraction and Incremental Load vs Upsert in ETL pipelines</title>
      <dc:creator>Braeson Nyahera</dc:creator>
      <pubDate>Tue, 16 Jun 2026 14:26:03 +0000</pubDate>
      <link>https://dev.to/braeson_nyahera/understanding-etl-pipelines-incremental-vs-delta-extraction-and-incremental-load-vs-upsert-2lf5</link>
      <guid>https://dev.to/braeson_nyahera/understanding-etl-pipelines-incremental-vs-delta-extraction-and-incremental-load-vs-upsert-2lf5</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Modern systems generate vast amount of data daily if not hourly or every second, whether it is through creating entirely new data or just updating existing data. Engineers work around this by having frameworks on how each of this instances are handled. &lt;br&gt;
Terms such as &lt;b&gt;Incremental extraction, delta extraction,incremental load and upsert&lt;/b&gt; are frequently used in ELT/ETL processes.&lt;br&gt;
Although these concepts are relatable, misunderstanding them impacts the optimality of the pipeline, hence it is necessary to know each as a unit of it's own.&lt;/p&gt;
&lt;h2&gt;
  
  
  Understanding where they fit
&lt;/h2&gt;

&lt;p&gt;An ETL pipeline consists of three main steps:&lt;/p&gt;

&lt;ol&gt;
    &lt;li&gt;Extract&lt;/li&gt;
    &lt;li&gt;Transform&lt;/li&gt;
    &lt;li&gt;Load&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw1ylc3eqcp9elb7z3ka4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw1ylc3eqcp9elb7z3ka4.png" alt="Structure displaying the extraction and loading concepts"&gt;&lt;/a&gt;&lt;br&gt;
Incremental extraction and delta extraction are found the extract step whereas Incremental load and Upsert are found in the load step.&lt;/p&gt;
&lt;h2&gt;
  
  
  Incremental Extraction vs Delta Extraction
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Incremental Extraction
&lt;/h3&gt;

&lt;p&gt;Incremental extraction refers to the process of extracting data that has been changed since a certain point in time. This ensures that not all the data is extracted in every instance.&lt;br&gt;
This strategy heavily depends on timestamps to know data which is new to be extracted.&lt;br&gt;
For example: In a table containing 1000000 records, if only 500 records are added then those are the only ones which would be added.&lt;/p&gt;

&lt;p&gt;Sample code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;last_modified&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-06-14 00:00:00'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;h3&gt;
  
  
  Key Limitation
&lt;/h3&gt;

&lt;p&gt;Incremental extraction cannot detect hard deletes. If a record is removed from the source, it leaves no updated timestamp behind, so the deletion is invisible to this strategy. Deleted rows will persist in the destination as stale data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Delta Extraction
&lt;/h3&gt;

&lt;p&gt;Delta extraction works by capturing exact changes which have occurred since the last extraction. This include inserts, updates and deletions.&lt;br&gt;
The term "delta" represents difference between two states of data. It depends on CDC(Change Data Capture) which includes; database transaction logs, triggers, or difference tables, rather than relying on timestamp to capture every insert, update, and delete.&lt;/p&gt;
&lt;h2&gt;
  
  
  Incremental Load vs Upsert
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Incremental load
&lt;/h3&gt;

&lt;p&gt;Incremental load is the process of loading only new or modified records. This will prevent the whole dataset being reloaded every other time making the process less efficient.&lt;br&gt;
If out of 10000 records only 10 were changed or added then only the ones which were changed are the ones to be loaded.&lt;/p&gt;
&lt;h3&gt;
  
  
  Upsert
&lt;/h3&gt;

&lt;p&gt;Upsert works as a combination of two processes, Insert and Update. This comes in when only appending data is not enough. This process firsts check if the record exists, if it does then it is updated but if it does not exist it is loaded.&lt;br&gt;
For upsert to work you need a unique identifier and a logic to check if the record exists.&lt;/p&gt;

&lt;p&gt;Sample code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Jane'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Nairobi'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;CONFLICT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt;
 &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EXCLUDED&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EXCLUDED&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;These four concepts, &lt;b&gt;incremental extraction, delta extraction, incremental load, and upsert,&lt;/b&gt; each solve a specific part of the same problem: keeping data systems in sync efficiently. Understanding them individually is what allows an engineer to design pipelines that are correct, performant, and maintainable.&lt;/p&gt;

&lt;p&gt;The choice between them is rarely about which is "better" rather it is about what the data requires. Append-only data with no deletes calls for a simple incremental extract and load. Frequently updated records with hard deletes call for delta and precise upsert logic. Matching the strategy to the data is the mark of a well-designed pipeline.&lt;/p&gt;

</description>
      <category>data</category>
      <category>etl</category>
      <category>airflow</category>
      <category>pipelines</category>
    </item>
    <item>
      <title>Understanding Logic, Reusability and Integrity On SQL ; Procedures, Functions and Transactions.</title>
      <dc:creator>Braeson Nyahera</dc:creator>
      <pubDate>Sun, 03 May 2026 20:13:52 +0000</pubDate>
      <link>https://dev.to/braeson_nyahera/understanding-logic-reusability-and-integrity-on-sql-procedures-functions-and-transactions-f7a</link>
      <guid>https://dev.to/braeson_nyahera/understanding-logic-reusability-and-integrity-on-sql-procedures-functions-and-transactions-f7a</guid>
      <description>&lt;p&gt;SQL is widely known for data querying and manipulation but systems do grow; data becomes larger; processes become repetitive and operations become sensitive. SQL has some features which enables it to be considered a fully fledged programming language. Some of the features which I discuss in this article are procedures, functions and transactions. Each of these concepts serve distinct purposes.&lt;br&gt;
Procedures execute operations, functions return values, and transactions ensure those operations are safe.&lt;/p&gt;

&lt;h2&gt;Stored Procedures&lt;/h2&gt;

&lt;p&gt;These are set of SQL statements stored in the database and executed as a unit which are used to perform tasks such as &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt; etc.&lt;br&gt;
They are triggered by calling them and passing the expected parameters by the procedure.&lt;br&gt;
Here is an example of a procedure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;increase_salary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_dept&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_percent&lt;/span&gt; &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
    &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;p_percent&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_dept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is how a procedure is called:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;increase_salary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'IT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Functions&lt;/h2&gt;

&lt;p&gt;This is a reusable logic block which can return values and can be used inside queries for data selection. &lt;br&gt;
It can be used with &lt;code&gt;SELECT&lt;/code&gt;, &lt;code&gt;WHERE&lt;/code&gt; and is great for reusability.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;get_avg_salary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_dept&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p_dept&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is an example of how the function can be used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;get_avg_salary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Transactions&lt;/h2&gt;

&lt;p&gt;Transactions are used to group a set of code operations in which if any of it fails then the whole execution is aborted. This ensures that data is only changed when the full code execution structure is successful.&lt;br&gt;
Transactions are best for data safety as they prevent partial updates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it fails:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ROLLBACK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;How They Work Together&lt;/h2&gt;

&lt;p&gt;It will not be complete to conclude the article without showing an example of how these concepts can be used together. &lt;br&gt;
Here is a simple scenario showing the use of the three concepts interdependently.&lt;br&gt;
&lt;b&gt;Function:&lt;/b&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;get_balance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="nb"&gt;NUMERIC&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;acc_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;b&gt;Procedure:&lt;/b&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;PROCEDURE&lt;/span&gt; &lt;span class="n"&gt;transfer_money&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;from_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="n"&gt;get_balance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;from_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt;
        &lt;span class="n"&gt;RAISE&lt;/span&gt; &lt;span class="n"&gt;EXCEPTION&lt;/span&gt; &lt;span class="s1"&gt;'Insufficient funds'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;from_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;to_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;b&gt;Transaction:&lt;/b&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;transfer_money&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;SQL has a wide range of capabilities more than just being a querying language. Having features such as procedures, functions, transactions and many others helps it to be an efficient tool for use on data in whichever way needed. Mastering it comes a long way in helping the analysis of data directly from the database.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>postgres</category>
      <category>data</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Sub-queries vs Window Functions vs Common Table Expressions: Which Should You Use in SQL?</title>
      <dc:creator>Braeson Nyahera</dc:creator>
      <pubDate>Sun, 19 Apr 2026 08:44:49 +0000</pubDate>
      <link>https://dev.to/braeson_nyahera/sub-queries-vs-window-functions-vs-common-table-expressions-which-should-you-use-in-sql-fnj</link>
      <guid>https://dev.to/braeson_nyahera/sub-queries-vs-window-functions-vs-common-table-expressions-which-should-you-use-in-sql-fnj</guid>
      <description>&lt;p&gt;SQL  is the most used tool for data manipulation but what happens when there are some concepts that seem to work towards the same ultimate outputs. As queries become more complex, developers often encounter different techniques that appear to produce similar results—namely sub-queries, common table expressions (CTEs), and window functions.&lt;br&gt;
At first glance, these concepts can seem interchangeable. You might solve the same problem using any of them, which raises a natural question: which one is better?&lt;br&gt;
&lt;strong&gt;The goal is not to choose one, but to understand when each is the right tool.&lt;/strong&gt;&lt;br&gt;
In this article I will discuss about sub-queries, window functions and common table expressions;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sub-Queries&lt;/strong&gt;&lt;br&gt;
These are queries inside other queries and are executed either before or alongside the outer query. They can return scalar value, row or table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;department&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Window functions&lt;/strong&gt;&lt;br&gt;
These are used when you are trying to preserve your rows. Used together with components such as - &lt;code&gt;OVER()&lt;/code&gt;,&lt;code&gt;PARTITION BY&lt;/code&gt;,&lt;code&gt;ORDER BY&lt;/code&gt; and common functions such as - &lt;code&gt;SUM()&lt;/code&gt;,&lt;code&gt;AVG()&lt;/code&gt;,&lt;code&gt;RANK()&lt;/code&gt;,&lt;code&gt;LAG()&lt;/code&gt;,&lt;code&gt;LEAD()&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;employee_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rnk&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;rnk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Common Table Expressions (CTEs)&lt;/strong&gt;&lt;br&gt;
These are temporary results set that are initialized using -&lt;code&gt;WITH&lt;/code&gt;. They are often used to improve readability and modularity of scripts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;max_salary&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_sal&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;max_salary&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;department&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_sal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All these features of SQL are used to perform advanced data querying and transformations. They can all be tweaked to perform almost similar queries but it all depends on what the final output is expected to be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use what&lt;/strong&gt;&lt;br&gt;
Each of the features has an instance when it is the most convenient over the others.&amp;nbsp;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For instance when it is a simple filtering such as retaining all those above average marks then a sub-query is the best option.&lt;/li&gt;
&lt;li&gt;When we are more focused on row level analytics then the window functions come in handy for that task.&lt;/li&gt;
&lt;li&gt;In a situation where it is a multi-step logic and re-usability is a core factor the common table expressions are best suited for use.

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Using in collaboration&lt;/strong&gt;&lt;br&gt;
The best thing about this features is that they can be used together for the best solution. This helps in the data transformation where there are steps within it in which either is considered to be superior over the other.&lt;br&gt;
Here is an example where all of them are used together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; 
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_salary&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; 
    &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;avg_salary&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best Practices&lt;/strong&gt;&lt;br&gt;
These are some of the recommended practices when working with these SQL features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is best to avoid deep nesting of queries by using CTEs which help in the structuring of the script for easy readability.&lt;/li&gt;
&lt;li&gt;When using CTEs it is necessary to name them well so that their purpose can be identified easily without having to read all the code snippets manually.&lt;/li&gt;
&lt;li&gt;Window functions are often more efficient than sub queries in large datasets, so where possible it is best to use window functions over sub queries but in instances of small datasets any can be used depending on preference.

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
These are to be considered more as &lt;strong&gt;complementary tools&lt;/strong&gt; than competing tools. In most advanced queries you will have to use them together or interchangeably to get to the results.&amp;nbsp;&lt;br&gt;
What makes a major difference is how you use them as they can heavily impact the performance. It is best to master all of them and understand the strength and weaknesses of each so that when it get to a point of picking one or more you do it from a point of information.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>data</category>
      <category>postgres</category>
      <category>analytics</category>
    </item>
  </channel>
</rss>
