<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joan Wambui</title>
    <description>The latest articles on DEV Community by Joan Wambui (@wambuijoan).</description>
    <link>https://dev.to/wambuijoan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3850284%2F48390040-4e1d-4e25-bbce-eca16461d9d3.png</url>
      <title>DEV Community: Joan Wambui</title>
      <link>https://dev.to/wambuijoan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wambuijoan"/>
    <language>en</language>
    <item>
      <title>CTEs, Subqueries, and Query Optimisation in SQL</title>
      <dc:creator>Joan Wambui</dc:creator>
      <pubDate>Tue, 21 Apr 2026 16:54:35 +0000</pubDate>
      <link>https://dev.to/wambuijoan/ctes-subqueries-and-query-optimisation-in-sql-3fin</link>
      <guid>https://dev.to/wambuijoan/ctes-subqueries-and-query-optimisation-in-sql-3fin</guid>
      <description>&lt;p&gt;If you have been writing SQL for a while, you have almost certainly run into a moment where a query starts to feel unwieldy. You need to filter based on an aggregation, or you need to reference a result you just computed, and suddenly your query is nested three levels deep and impossible to read. That is where CTEs and subqueries come in. Both solve the same core problem: referencing intermediate results within a larger query. But they do it differently, and knowing when to reach for each one will make you a sharper, more deliberate SQL writer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Subqueries
&lt;/h2&gt;

&lt;p&gt;A subquery is a query written inside another query. It is enclosed in parentheses and can appear in several places: inside a &lt;code&gt;WHERE&lt;/code&gt; clause, inside a &lt;code&gt;FROM&lt;/code&gt; clause, or inside a &lt;code&gt;SELECT&lt;/code&gt; clause.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subquery in a WHERE clause
&lt;/h3&gt;

&lt;p&gt;This is the most common use. You filter the outer query based on a result computed by the inner query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The inner query runs first, computes the average price across all products, and hands that single value to the outer query. The outer query then filters using it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subquery in a FROM clause
&lt;/h3&gt;

&lt;p&gt;Here, the inner query acts as a temporary table that the outer query reads from. This is sometimes called a derived table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
  &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;customer_totals&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the alias &lt;code&gt;customer_totals&lt;/code&gt; after the closing parenthesis. This is required in most databases when you use a subquery in a &lt;code&gt;FROM&lt;/code&gt; clause.&lt;/p&gt;

&lt;h3&gt;
  
  
  Correlated subquery
&lt;/h3&gt;

&lt;p&gt;A correlated subquery is one that references a column from the outer query. It re-executes for every row the outer query processes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the inner query cannot run independently. It depends on &lt;code&gt;p.category&lt;/code&gt; from the outer query, so it runs once per row. This is powerful but can be slow on large tables.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Table Expressions (CTEs)
&lt;/h2&gt;

&lt;p&gt;A Common Table Expression, defined with the &lt;code&gt;WITH&lt;/code&gt; keyword, lets you name an intermediate result and reference it by that name later in the same query. Think of it as a named, temporary result set that exists only for the duration of the query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;customer_spending&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
  &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_spent&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;customer_spending&lt;/span&gt; &lt;span class="n"&gt;cs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;cs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_spent&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;WITH&lt;/code&gt; block defines &lt;code&gt;customer_spending&lt;/code&gt;. The main query below it then joins against it as if it were a real table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Chaining multiple CTEs
&lt;/h3&gt;

&lt;p&gt;One of the most useful features of CTEs is that you can define several of them in sequence, and each one can reference the ones defined before it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;product_sales&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quantity_sold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_qty&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
  &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;avg_sales&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_qty&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_qty&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;product_sales&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_qty&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;product_sales&lt;/span&gt; &lt;span class="n"&gt;ps&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_qty&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;avg_qty&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;avg_sales&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;avg_sales&lt;/code&gt; is built directly on top of &lt;code&gt;product_sales&lt;/code&gt;. This kind of step-by-step construction would be much harder to express clearly with nested subqueries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recursive CTEs
&lt;/h3&gt;

&lt;p&gt;CTEs also support recursion, which is useful for querying hierarchical data such as org charts, file structures, or category trees. The CTE references itself until a termination condition is met.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;RECURSIVE&lt;/span&gt; &lt;span class="n"&gt;employee_hierarchy&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;employee_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;manager_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;manager_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;

  &lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;

  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;manager_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;level&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
  &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;employee_hierarchy&lt;/span&gt; &lt;span class="n"&gt;eh&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;manager_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;eh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employee_hierarchy&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;level&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Recursive CTEs are one of the few things CTEs can do that subqueries fundamentally cannot replicate cleanly.&lt;/p&gt;




&lt;h2&gt;
  
  
  CTEs vs Subqueries: Key Differences
&lt;/h2&gt;

&lt;p&gt;Understanding the difference between the two goes beyond syntax. They reflect a different way of structuring your thinking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Readability
&lt;/h3&gt;

&lt;p&gt;Subqueries, especially nested ones, are read from the inside out. The deeper the nesting, the harder it is to follow. CTEs read from top to bottom, like steps in a recipe. When a query has multiple intermediate stages, CTEs are almost always clearer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reusability within a query
&lt;/h3&gt;

&lt;p&gt;A subquery can only be used in the one place where it is written. If you need the same intermediate result in two different parts of your query, you have to write the subquery twice. A CTE can be referenced multiple times within the same query after it is defined once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;high_value_sales&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
  &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
  &lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_high_value_customers&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;high_value_sales&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_spent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;combined_revenue&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;high_value_sales&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;p&gt;This is more nuanced. In most modern databases like PostgreSQL, MySQL 8+, and SQL Server, the optimiser treats CTEs and subqueries similarly in many cases. However, some databases materialise CTEs (compute and store the result once), while subqueries may be inlined and optimised as part of the surrounding query. In PostgreSQL prior to version 12, CTEs were always materialised, which could actually make them slower in some situations. From PostgreSQL 12 onward, the optimiser can choose whether to materialise or inline a CTE. The practical takeaway is: do not choose one over the other for performance reasons without testing on your specific database version.&lt;/p&gt;

&lt;h3&gt;
  
  
  Correlated logic
&lt;/h3&gt;

&lt;p&gt;Correlated subqueries are uniquely suited to row-by-row comparisons, like checking whether a value exceeds an average within its own group. CTEs cannot be correlated in the same way because they are evaluated independently of the outer query.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use each
&lt;/h3&gt;

&lt;p&gt;Use a &lt;strong&gt;subquery&lt;/strong&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The logic is simple and self-contained&lt;/li&gt;
&lt;li&gt;You need correlated row-by-row comparisons&lt;/li&gt;
&lt;li&gt;The intermediate result is only needed in one place&lt;/li&gt;
&lt;li&gt;You want to keep the query concise for a single-step filter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use a &lt;strong&gt;CTE&lt;/strong&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have multiple intermediate steps&lt;/li&gt;
&lt;li&gt;You need to reference the same intermediate result more than once&lt;/li&gt;
&lt;li&gt;You are working with recursive or hierarchical data&lt;/li&gt;
&lt;li&gt;Readability and maintainability matter (which is most of the time)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Optimising SQL Queries
&lt;/h2&gt;

&lt;p&gt;Writing a query that returns the right answer is the first goal. Writing one that does it efficiently is the second. Here are the most impactful methods for optimising SQL queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Use indexes wisely
&lt;/h3&gt;

&lt;p&gt;An index is a data structure that allows the database to find rows much faster than scanning the entire table. Always ensure that columns used frequently in &lt;code&gt;WHERE&lt;/code&gt;, &lt;code&gt;JOIN&lt;/code&gt;, &lt;code&gt;ORDER BY&lt;/code&gt;, and &lt;code&gt;GROUP BY&lt;/code&gt; clauses are indexed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Without an index on sale_date, this scans every row&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2023-06-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Creating an index&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_sales_date&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be careful not to over-index. Every index adds overhead to &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, and &lt;code&gt;DELETE&lt;/code&gt; operations because the index must be updated alongside the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Avoid SELECT *
&lt;/h3&gt;

&lt;p&gt;Selecting all columns forces the database to read and transfer every column, even those you do not need. Always specify only the columns your query requires.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Avoid&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Preferred&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reduces the amount of data read from disk and transferred across the network.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Filter early
&lt;/h3&gt;

&lt;p&gt;The further upstream you apply filters, the less data each subsequent step has to process. In joins, filter the data before joining where possible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Less efficient: joins all sales, then filters&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;YEAR&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2023&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- More efficient: pre-filter in a subquery or CTE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;sales_2023&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_amount&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;YEAR&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2023&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;sales_2023&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Avoid functions on indexed columns in WHERE clauses
&lt;/h3&gt;

&lt;p&gt;Wrapping a column in a function in a &lt;code&gt;WHERE&lt;/code&gt; clause prevents the database from using any index on that column. The function has to be applied to every row before filtering can happen.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- This cannot use an index on sale_date&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;YEAR&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2023&lt;/span&gt;

&lt;span class="c1"&gt;-- This can use a range index on sale_date&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2023-01-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2023-12-31'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Use LIMIT when exploring data
&lt;/h3&gt;

&lt;p&gt;When you are investigating data or developing a query, add &lt;code&gt;LIMIT&lt;/code&gt; to avoid accidentally scanning millions of rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Remove it only when you need the full result set.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Prefer EXISTS over IN for large subquery results
&lt;/h3&gt;

&lt;p&gt;When checking whether a related row exists, &lt;code&gt;EXISTS&lt;/code&gt; stops as soon as it finds the first match. &lt;code&gt;IN&lt;/code&gt; collects all results from the subquery first and then checks membership. On large datasets, &lt;code&gt;EXISTS&lt;/code&gt; is typically faster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Using IN&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Using EXISTS (generally faster on large tables)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  7. Use EXPLAIN / EXPLAIN ANALYZE
&lt;/h3&gt;

&lt;p&gt;Before and after any optimisation, use your database's query plan tool to understand what is actually happening. In PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output shows whether indexes are being used, how many rows are being scanned, and where the most time is spent. This is the single most reliable way to confirm whether an optimisation is actually helping.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Avoid correlated subqueries on large tables
&lt;/h3&gt;

&lt;p&gt;As noted earlier, a correlated subquery runs once per row in the outer query. On a table with 100,000 rows, that means 100,000 subquery executions. Wherever possible, rewrite correlated subqueries as joins or CTEs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Correlated subquery: runs once per customer row&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Equivalent join: far more efficient&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  9. Use appropriate JOIN types
&lt;/h3&gt;

&lt;p&gt;Inner joins are generally faster than outer joins because they return fewer rows and allow the optimiser more flexibility. Use &lt;code&gt;LEFT JOIN&lt;/code&gt; or &lt;code&gt;RIGHT JOIN&lt;/code&gt; only when you genuinely need to retain unmatched rows.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Aggregate before joining where possible
&lt;/h3&gt;

&lt;p&gt;If you need to join an aggregated result to another table, aggregate first and then join, rather than joining first and then aggregating. This reduces the number of rows the join has to process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Join first, aggregate after (more rows to process during join)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Aggregate first, then join (fewer rows in the join)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;customer_totals&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt;
  &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_spent&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;customer_totals&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;p&gt;CTEs and subqueries are not competing tools; they are complementary. Subqueries are precise and compact for simple, single-use logic. CTEs shine when a query has multiple steps, when the same intermediate result is needed more than once, or when the goal is to write something a colleague can read and understand without guessing. Query optimisation, meanwhile, is not about memorising a checklist; it is about understanding what the database is doing and removing unnecessary work at every stage. The combination of clean, readable structure and deliberate performance thinking is what separates functional SQL from genuinely good SQL.&lt;/p&gt;

</description>
      <category>database</category>
      <category>performance</category>
      <category>sql</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>SQL fundamentals: DDL, DML, filtering, and CASE WHEN</title>
      <dc:creator>Joan Wambui</dc:creator>
      <pubDate>Mon, 13 Apr 2026 16:06:42 +0000</pubDate>
      <link>https://dev.to/wambuijoan/sql-fundamentals-ddl-dml-filtering-and-case-when-2dfe</link>
      <guid>https://dev.to/wambuijoan/sql-fundamentals-ddl-dml-filtering-and-case-when-2dfe</guid>
      <description>&lt;h2&gt;
  
  
  What is SQL?
&lt;/h2&gt;

&lt;p&gt;SQL (Structured Query Language) is the standard language for interacting with relational databases. Whether you are storing customer records, tracking exam results, or analysing transactions, SQL is how you talk to the database.&lt;/p&gt;




&lt;h2&gt;
  
  
  The five command categories
&lt;/h2&gt;

&lt;p&gt;SQL commands are grouped into five categories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Its Function&lt;/th&gt;
&lt;th&gt;Commands&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DDL (Data Definition Language)&lt;/td&gt;
&lt;td&gt;Defines and modifies the database structure&lt;/td&gt;
&lt;td&gt;CREATE, ALTER, DROP, TRUNCATE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DML (Data Manipulation Language)&lt;/td&gt;
&lt;td&gt;Manages/manipulates data inside tables&lt;/td&gt;
&lt;td&gt;INSERT, UPDATE, DELETE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DQL (Data Query Language)&lt;/td&gt;
&lt;td&gt;Retrieves data from the tables&lt;/td&gt;
&lt;td&gt;SELECT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TCL (Transaction Control Language)&lt;/td&gt;
&lt;td&gt;Manages transactions&lt;/td&gt;
&lt;td&gt;COMMIT, ROLLBACK, SAVEPOINT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DCL (Data Control Language)&lt;/td&gt;
&lt;td&gt;Controls access and permissions&lt;/td&gt;
&lt;td&gt;GRANT, REVOKE&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This article focuses on DDL, DML, and DQL.&lt;/p&gt;




&lt;h2&gt;
  
  
  DDL (Data Definition Language)
&lt;/h2&gt;

&lt;p&gt;DDL creates the schemas, tables and columns. It defines the shape of your data before any data exists. It has four commands, as mentioned in the table: CREATE, ALTER, DROP and TRUNCATE.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE TABLE&lt;/code&gt; sets up a table with its columns, data types, and constraints. &lt;code&gt;ALTER TABLE&lt;/code&gt; modifies an existing table by adding columns, renaming them, or dropping ones that are no longer needed. &lt;code&gt;TRUNCATE TABLE&lt;/code&gt; removes the contents of the table; however, it retains the table structure. &lt;code&gt;DROP TABLE&lt;/code&gt; removes the entire table.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A lightbulb moment:&lt;/strong&gt; &lt;code&gt;CREATE TABLE&lt;/code&gt; wraps its column definitions in parentheses. &lt;code&gt;ALTER TABLE&lt;/code&gt; does not as the instruction follows directly after the table name.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Creation of a student table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;student_id&lt;/span&gt;    &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt;    &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_name&lt;/span&gt;     &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gender&lt;/span&gt;        &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;date_of_birth&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;class&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;city&lt;/span&gt;          &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Addition of a phone number column&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;students&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;phone_number&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  DML (Data Manipulation Language)
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;INSERT INTO&lt;/code&gt; adds rows to a table. You list the target columns, follow with &lt;code&gt;VALUES&lt;/code&gt;, and can insert multiple rows in one statement by separating them with commas. Always quote dates. Without quotes, PostgreSQL reads &lt;code&gt;2008-03-12&lt;/code&gt; as arithmetic (2008 − 3 − 12 = 1993).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO students (student_id, first_name, gender, date_of_birth, class, city)
VALUES
    (1, 'Amina', 'F', '2008-03-12', 'Form 3', 'Nairobi'),
    (2, 'Brian', 'M', '2007-07-25', 'Form 4', 'Mombasa');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
plaintext&lt;/p&gt;

&lt;p&gt;&lt;code&gt;UPDATE&lt;/code&gt; modifies existing rows. &lt;code&gt;DELETE&lt;/code&gt; removes them. Both require a &lt;code&gt;WHERE&lt;/code&gt; clause. Without it, every row in the table is affected, not just the one you intended.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Updating a student's current city of residence:
UPDATE students
SET city = 'Nairobi'
WHERE student_id = 5;

-- Deleting an exam result from a student whose id is 9:
DELETE FROM exam_results
WHERE result_id = 9;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
plaintext&lt;/p&gt;


&lt;h2&gt;
  
  
  WHERE clause
&lt;/h2&gt;

&lt;p&gt;SQL has a variety of operators that allow for various filtering conditions. The &lt;code&gt;WHERE&lt;/code&gt; clause, when used with these operators, allows you to target specific rows.&lt;/p&gt;

&lt;p&gt;Some operators include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;BETWEEN&lt;/code&gt; filters a range and is inclusive on both ends.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;IN&lt;/code&gt; checks against a list of values. It is cleaner than chaining multiple &lt;code&gt;OR&lt;/code&gt; conditions.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LIKE&lt;/code&gt; matches text patterns, where &lt;code&gt;%&lt;/code&gt; represents any sequence of characters.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WHERE marks BETWEEN 50 AND 80
WHERE city IN ('Nairobi', 'Mombasa', 'Kisumu')
WHERE class NOT IN ('Form 1', 'Form 2')
WHERE first_name LIKE 'A%' OR first_name LIKE 'E%'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;&lt;br&gt;
plaintext&lt;/p&gt;


&lt;h2&gt;
  
  
  CASE WHEN
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;CASE WHEN&lt;/code&gt; lets you apply conditional logic directly inside a query, generating a new column based on rules you define — without touching the underlying table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Exam results labelling:

SELECT *,
    CASE
        WHEN marks &amp;gt;= 80 THEN 'Distinction'
        WHEN marks &amp;gt;= 60 THEN 'Merit'
        WHEN marks &amp;gt;= 40 THEN 'Pass'
        ELSE 'Fail'
    END AS performance
FROM exam_results;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PostgreSQL evaluates conditions top to bottom and stops at the first match, so order matters. The &lt;code&gt;ELSE&lt;/code&gt; clause covers anything that falls outside your defined conditions. Without it, unmatched rows return &lt;code&gt;NULL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;CASE WHEN&lt;/code&gt;, the result exists only in the query output. The table structure is never changed.&lt;/p&gt;

&lt;p&gt;What makes &lt;code&gt;CASE WHEN&lt;/code&gt; powerful is that it lets you enhance your data at query time without needing extra columns in your schema. For reporting and analysis, that flexibility is very useful.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>database</category>
      <category>sql</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How I Published My Power BI Report and Embedded It on a Website</title>
      <dc:creator>Joan Wambui</dc:creator>
      <pubDate>Sun, 05 Apr 2026 22:07:04 +0000</pubDate>
      <link>https://dev.to/wambuijoan/how-i-published-my-power-bi-report-and-embedded-it-on-a-website-22bb</link>
      <guid>https://dev.to/wambuijoan/how-i-published-my-power-bi-report-and-embedded-it-on-a-website-22bb</guid>
      <description>&lt;p&gt;I recently completed a Power BI project, an electronics sales dashboard built from a raw CSV file. Cleaning the data, modeling the tables, writing DAX measures, all of that happened in Power BI Desktop.&lt;/p&gt;

&lt;p&gt;This article covers what happens next: getting that report out of your computer and onto the internet where it can actually be used.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Create a Workspace
&lt;/h2&gt;

&lt;p&gt;Before publishing anything, you need a workspace. In Power BI Service, a workspace is essentially a folder that holds your reports and datasets.&lt;br&gt;
Go to app.powerbi.com and sign in. On the left navigation panel, click Workspaces, then click “New workspace”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx62zj0gz7rdhqx6i4m7b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx62zj0gz7rdhqx6i4m7b.png" alt=" " width="769" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Publish from Power BI Desktop
&lt;/h2&gt;

&lt;p&gt;Go back to Power BI Desktop, press Ctrl + S to save your file, then go to the Home tab in the ribbon and click Publish on the right side.&lt;br&gt;&lt;br&gt;
A dialogue box appears asking where to publish. Select the workspace you just created and click Select.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ftzwv9cticukeeutu0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ftzwv9cticukeeutu0z.png" alt=" " width="690" height="166"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Generate the Embed Code
&lt;/h2&gt;

&lt;p&gt;Now that the report is live in Service, you need the embed code to place it on a website.&lt;br&gt;
With the report open in the browser, click File in the top menu. Hover over Embed report and select Publish to web (public). &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiba534l3ohyn5nyxygpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiba534l3ohyn5nyxygpe.png" alt=" " width="690" height="166"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also embed the report using the other option “Website or portal” copy the iframe  code&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Embed It in a Webpage
&lt;/h2&gt;

&lt;p&gt;Open any HTML file in VS Code or a text editor of choice. Paste the iframe code where you want the report to appear. Save the file and open it in a browser. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizl3054myokwurloeqj4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizl3054myokwurloeqj4.png" alt=" " width="800" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your report is now published and the link is ready to share.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>beginners</category>
      <category>microsoft</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained</title>
      <dc:creator>Joan Wambui</dc:creator>
      <pubDate>Tue, 31 Mar 2026 03:15:27 +0000</pubDate>
      <link>https://dev.to/wambuijoan/understanding-data-modeling-in-power-bi-joins-relationships-and-schemas-explained-42ed</link>
      <guid>https://dev.to/wambuijoan/understanding-data-modeling-in-power-bi-joins-relationships-and-schemas-explained-42ed</guid>
      <description>&lt;h2&gt;
  
  
  What Is Data Modeling?
&lt;/h2&gt;

&lt;p&gt;Data modeling refers to how one organises and connects their data so it can be analysed correctly. In Power BI, it defines how tables relate to each other, how filters move across visuals, and how your measures calculate results. &lt;/p&gt;

&lt;h2&gt;
  
  
  Joins
&lt;/h2&gt;

&lt;p&gt;Joins combine data from two tables based on a shared column. In Power BI, joins happen in Power Query Editor before data enters the model.&lt;br&gt;
&lt;strong&gt;INNER JOIN&lt;/strong&gt; - Returns only rows with a match in both tables. &lt;br&gt;
&lt;strong&gt;LEFT JOIN&lt;/strong&gt; - Returns all rows from the left table only.&lt;br&gt;
&lt;strong&gt;RIGHT JOIN&lt;/strong&gt; - Returns all rows from the right table only.&lt;br&gt;
&lt;strong&gt;FULL OUTER JOIN&lt;/strong&gt; - Returns everything from both tables. Useful for spotting data gaps.&lt;br&gt;
&lt;strong&gt;LEFT ANTI JOIN&lt;/strong&gt; - Returns only rows from the left table with no match on the right. &lt;br&gt;
&lt;strong&gt;RIGHT ANTI JOIN&lt;/strong&gt; - Returns only rows from the right table with no match on the left.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffv4zgp0gqu88tr3z8urh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffv4zgp0gqu88tr3z8urh.png" alt="Joins in Power BI" width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Power BI Relationships
&lt;/h2&gt;

&lt;p&gt;Relationships connect tables inside the model without merging them. They control how filters move between tables when you interact with visuals.&lt;br&gt;
Cardinality defines how rows relate across tables:&lt;br&gt;
• &lt;strong&gt;One-to-Many (1:M)&lt;/strong&gt; - One customer, many orders. The most common type.&lt;br&gt;
• &lt;strong&gt;Many-to-Many (M:M)&lt;/strong&gt; - Requires careful handling, best resolved with a bridge table.&lt;br&gt;
• &lt;strong&gt;One-to-One (1:1)&lt;/strong&gt; - Usually means the tables can be merged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Active vs Inactive relationship&lt;/strong&gt; - Only one active relationship is allowed between two tables. Inactive relationships are triggered using &lt;code&gt;USERELATIONSHIP()&lt;/code&gt; in DAX when needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-filter direction&lt;/strong&gt; - Single direction is the default and safest. Bidirectional filters flow both ways but can cause performance issues if overused.&lt;/p&gt;

&lt;h2&gt;
  
  
  Joins vs Relationships
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhezdncrv1p9196pt15m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhezdncrv1p9196pt15m.png" alt="Joins vs Relationships" width="776" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Fact Tables vs Dimension Tables
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fact tables&lt;/strong&gt; hold measurable data such as sales and transactions. They are long with many rows and link out to dimension tables via foreign keys.&lt;br&gt;
&lt;strong&gt;Dimension tables&lt;/strong&gt; hold descriptive data such as customer names, product categories, dates. They give context to the numbers in your fact table.&lt;br&gt;
In simple terms, your fact table is the center. Dimension tables surround it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Schemas
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Star Schema&lt;/strong&gt; - One fact table connected directly to multiple dimension tables. Fast, clean, and what Power BI is optimised for. This should be used by default.&lt;br&gt;
Snowflake Schema - Dimension tables broken into sub-dimensions. This is more complex and slower in Power BI. It is recommended for one to use only it when source data is already normalised.&lt;br&gt;
&lt;strong&gt;Flat Table&lt;/strong&gt; - Everything in one table. This is simple but it creates redundancy and performance problems at scale.&lt;br&gt;
&lt;strong&gt;Pro Tip:&lt;/strong&gt; &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A clean data model is what makes a Power BI report trustworthy. Start with a star schema, separate your facts from your dimensions, define your relationships carefully, and always build a proper Date table. The model is invisible to the end user, but it is what everything depends on.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>beginners</category>
      <category>database</category>
      <category>microsoft</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How Linux is Used in Real-World Data Engineering</title>
      <dc:creator>Joan Wambui</dc:creator>
      <pubDate>Mon, 30 Mar 2026 02:46:26 +0000</pubDate>
      <link>https://dev.to/wambuijoan/how-linux-is-used-in-real-world-data-engineering-451p</link>
      <guid>https://dev.to/wambuijoan/how-linux-is-used-in-real-world-data-engineering-451p</guid>
      <description>&lt;p&gt;Linux has been such a common word in the tech journey, a mountain of a word for beginners, but a simple and helpful foundation once you grasp it. I had heard it before but couldn't tell you what it actually was. Was it a programming language? A tool? The same thing as Bash or Git? For a while I used all four interchangeably. A quiet kind of confusion, the type you don't realise you have until something breaks and you don't know where to look.&lt;/p&gt;

&lt;p&gt;The moment it clicked wasn't dramatic. It was sitting at a terminal during my data engineering class, connected to a remote server, and realising the environment I was operating in was Linux. Not a language. Not a tool I had installed. The environment itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Is Linux?
&lt;/h2&gt;

&lt;p&gt;Linux is an operating environment, similar to Windows or macOS, that sits beneath your tools, files, terminal sessions, and pipelines. Bash is the language you speak inside it. Git is a version control tool that runs inside it. GitHub is the cloud platform where your Git repositories live. Four different things, four different layers, blurring together because you encounter them all at once through the same black terminal window.&lt;/p&gt;

&lt;p&gt;This matters because data engineering assumes Linux fluency without announcing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Linux Shows Up in Real Data Engineering Work
&lt;/h2&gt;

&lt;p&gt;When you provision a virtual machine on Azure or AWS, you land in a Linux environment. When your pipeline runs on a scheduler, it is running on a Linux server. When you work with Docker containers, they are built on Linux. It is rarely the thing being discussed, but it is always underneath the thing being discussed.&lt;/p&gt;

&lt;p&gt;In practice, data engineers use Linux to navigate directories where raw data lands and processed outputs go. They write Bash scripts to automate repetitive ingestion tasks (which include ingesting a file, validating its structure, moving it to a staging folder, logging the result) and schedule them with cron to run automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Navigate to where raw data lands&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /data/raw

&lt;span class="c"&gt;# Step 2: Run your ingestion script&lt;/span&gt;
bash ingest.sh

&lt;span class="c"&gt;# Step 3: Schedule it to run automatically every day at 6am&lt;/span&gt;
0 6 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; /scripts/ingest.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They monitor running jobs, check resource usage, and kill processes that hang. When something breaks on a production server, Linux is the environment you are debugging in. Knowing it is not optional.&lt;/p&gt;

&lt;p&gt;Linux will not be the most exciting thing you learn in data engineering. It does not have a sleek interface or a compelling pitch. But it is the ground where real data work happens, on servers, in the cloud, inside containers, underneath every tool you will eventually depend on. The clearer your understanding of what it is, the faster everything else makes sense.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>linux</category>
      <category>dataengineering</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
