<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vivek Kumar</title>
    <description>The latest articles on DEV Community by Vivek Kumar (@vivekdraxlr).</description>
    <link>https://dev.to/vivekdraxlr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3870326%2F6ff71248-2d18-4571-9d3e-9cd0e728f3a2.png</url>
      <title>DEV Community: Vivek Kumar</title>
      <link>https://dev.to/vivekdraxlr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vivekdraxlr"/>
    <language>en</language>
    <item>
      <title>KPI Tracking with SQL: A Practical Starter Kit for SaaS Developers</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Fri, 22 May 2026 04:50:54 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/kpi-tracking-with-sql-a-practical-starter-kit-for-saas-developers-49h3</link>
      <guid>https://dev.to/vivekdraxlr/kpi-tracking-with-sql-a-practical-starter-kit-for-saas-developers-49h3</guid>
      <description>&lt;p&gt;You've shipped the product. Users are signing up. Subscriptions are flowing in. Now your founder, your investors, or your own curiosity is asking: &lt;em&gt;how are we actually doing?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most teams reach for a BI tool or a spreadsheet at this point. But if your data lives in a SQL database—Postgres, MySQL, Redshift, BigQuery—you already have everything you need to answer the most important product and business questions. No third-party analytics event pipeline required.&lt;/p&gt;

&lt;p&gt;This guide gives you practical SQL queries for the KPIs that matter most in a SaaS product: revenue, engagement, retention, and growth. These aren't toy examples—they use realistic table names you'd find in a real product codebase, and they come with the gotchas that trip most developers up the first time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Your Assumed Schema
&lt;/h2&gt;

&lt;p&gt;To keep the queries grounded, here's the database schema we'll work against throughout:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;Key Columns&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;&lt;code&gt;users&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;email&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;, &lt;code&gt;plan_id&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;&lt;code&gt;subscriptions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;amount_cents&lt;/code&gt;, &lt;code&gt;billing_cycle&lt;/code&gt;, &lt;code&gt;started_at&lt;/code&gt;, &lt;code&gt;canceled_at&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;&lt;code&gt;payments&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;amount_cents&lt;/code&gt;, &lt;code&gt;paid_at&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;&lt;code&gt;events&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;event_name&lt;/code&gt;, &lt;code&gt;occurred_at&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;&lt;code&gt;trials&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;started_at&lt;/code&gt;, &lt;code&gt;converted_at&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Adapt column names to match your own schema, but the patterns will transfer directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Monthly Recurring Revenue (MRR)
&lt;/h2&gt;

&lt;p&gt;MRR is the single most-watched number in any subscription business. It answers: &lt;em&gt;how much predictable revenue do we have right now?&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;mrr&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;billing_cycle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'monthly'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have annual subscribers, normalize their value to monthly by dividing by 12:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="n"&gt;billing_cycle&lt;/span&gt;
      &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="s1"&gt;'monthly'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;amount_cents&lt;/span&gt;
      &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="s1"&gt;'annual'&lt;/span&gt;  &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;amount_cents&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;mrr&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Don't double-count. If a user can have multiple subscriptions (e.g. per-seat billing), make sure your &lt;code&gt;subscriptions&lt;/code&gt; table represents the correct unit. Summing at the user level first and then aggregating is safer when your data model is complex.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Churn Rate
&lt;/h2&gt;

&lt;p&gt;Churn tells you how fast you're losing customers. Monthly churn is typically calculated as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Churn Rate = Customers Lost This Month ÷ Customers at Start of Month&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;start_of_month&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;churned&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lost&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'canceled'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;canceled_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;canceled_at&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 month'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lost&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;monthly_churn_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;start_of_month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;churned&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Using &lt;code&gt;NULLIF(total, 0)&lt;/code&gt; prevents division-by-zero errors, which will otherwise silently crash your query or return &lt;code&gt;NULL&lt;/code&gt; without warning. A 2–3% monthly churn rate is a common benchmark to aim below; above 5% is a red flag for most B2B SaaS.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Daily Active Users (DAU) and the Stickiness Ratio
&lt;/h2&gt;

&lt;p&gt;DAU tells you engagement. But the ratio of DAU to MAU—your "stickiness ratio"—tells you whether users are building habits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- DAU: today&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dau&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- MAU: last 30 days&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;mau&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combine them to get the stickiness ratio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;dau&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;daily&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;mau&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;monthly&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;daily&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;stickiness_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;dau&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mau&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A stickiness ratio above 20% is generally considered healthy. Slack and Facebook sit north of 50%—most B2B tools are happy at 25–40%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Be precise about what counts as an "active" event. Including page loads inflates the number. Stick to intentional actions: creating a record, running a query, submitting a form.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Average Revenue Per User (ARPU)
&lt;/h2&gt;

&lt;p&gt;ARPU helps you understand whether your pricing is working and how different segments compare.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="mi"&gt;2&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;arpu&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Break it down by plan to see which tier drives the most value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plan_id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;avg_revenue&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. Trial-to-Paid Conversion Rate
&lt;/h2&gt;

&lt;p&gt;If you have a freemium or free trial model, conversion rate is often more valuable than new signup counts—it's what actually drives revenue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;trial_cohort&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;started&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;trials&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;converted&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;paid&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;trials&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;converted_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paid&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;conversion_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;trial_cohort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;converted&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Be careful comparing trials from &lt;em&gt;this&lt;/em&gt; month to conversions from &lt;em&gt;this&lt;/em&gt; month. Many trials take 14–30 days to convert. A better approach is to look at the conversion rate for trials that &lt;em&gt;started&lt;/em&gt; 30–60 days ago, giving them time to complete their journey.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Revenue Growth Rate (Month-over-Month)
&lt;/h2&gt;

&lt;p&gt;Revenue growth tells you if the business is accelerating, stalling, or shrinking. Track it month-over-month.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;monthly_revenue&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;paid_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;payments&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'succeeded'&lt;/span&gt;
  &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;prev_month_revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;mom_growth_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;monthly_revenue&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query uses a window function (&lt;code&gt;LAG&lt;/code&gt;) to pull the previous month's value and compute the percentage change. The result looks something like:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;month&lt;/th&gt;
&lt;th&gt;revenue&lt;/th&gt;
&lt;th&gt;prev_month_revenue&lt;/th&gt;
&lt;th&gt;mom_growth_pct&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;2026-05-01&lt;/td&gt;
&lt;td&gt;48,200&lt;/td&gt;
&lt;td&gt;41,500&lt;/td&gt;
&lt;td&gt;16.1%&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;2026-04-01&lt;/td&gt;
&lt;td&gt;41,500&lt;/td&gt;
&lt;td&gt;36,800&lt;/td&gt;
&lt;td&gt;12.8%&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;2026-03-01&lt;/td&gt;
&lt;td&gt;36,800&lt;/td&gt;
&lt;td&gt;34,100&lt;/td&gt;
&lt;td&gt;7.9%&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  7. Feature Adoption Rate
&lt;/h2&gt;

&lt;p&gt;If you're building a product, you need to know which features people actually use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;event_name&lt;/span&gt;                                                &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                                   &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;users_used&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
   &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_active_users&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;
    &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
       &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;                                                         &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;adoption_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;event_name&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'export_report'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'invite_teammate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'create_dashboard'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'connect_datasource'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;event_name&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;adoption_pct&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Common Mistakes to Avoid
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Forgetting time zones.&lt;/strong&gt; &lt;code&gt;CURRENT_DATE&lt;/code&gt; returns a date in your server's time zone. If your users are spread across time zones, you may be cutting off events. Store timestamps in UTC and always convert explicitly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Counting users vs. counting events.&lt;/strong&gt; Many queries should use &lt;code&gt;COUNT(DISTINCT user_id)&lt;/code&gt;, not &lt;code&gt;COUNT(*)&lt;/code&gt;. One user generating 100 events should not count as 100 active users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Including test accounts.&lt;/strong&gt; Seed data, internal team emails, and test subscriptions will skew every metric. Add a &lt;code&gt;WHERE email NOT LIKE '%@yourcompany.com%'&lt;/code&gt; filter or flag test accounts in your users table with an &lt;code&gt;is_test&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Pulling raw SQL into spreadsheets.&lt;/strong&gt; Once your KPI queries are working, don't copy-paste results into a spreadsheet every week. That workflow breaks. Connect a dashboard tool directly to your database to keep numbers fresh and reduce manual error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Ignoring failed payments.&lt;/strong&gt; When calculating MRR from &lt;code&gt;payments&lt;/code&gt;, always filter by &lt;code&gt;status = 'succeeded'&lt;/code&gt;. Failed payments are not revenue, and including them will inflate your numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;SQL is one of the most powerful tools you have for understanding your own product. You don't need a data warehouse or a dedicated analyst to get started—you need a database connection and the right queries.&lt;/p&gt;

&lt;p&gt;Start with these seven metrics: MRR, churn rate, DAU/MAU stickiness, ARPU, trial conversion, revenue growth, and feature adoption. Each one answers a specific question about whether your product is healthy and growing.&lt;/p&gt;

&lt;p&gt;Build them once, pin them to a dashboard you check weekly, and you'll make better product decisions with far less guesswork.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;How do you track KPIs at your company? Do you query your database directly, use a BI tool, or pull from a third-party analytics service? Drop your setup in the comments — I'd love to hear what works for your team.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>analytics</category>
      <category>webdev</category>
    </item>
    <item>
      <title>From English to SQL: How LLMs Actually Understand Your Database Schema</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Thu, 21 May 2026 05:28:35 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/from-english-to-sql-how-llms-actually-understand-your-database-schema-424o</link>
      <guid>https://dev.to/vivekdraxlr/from-english-to-sql-how-llms-actually-understand-your-database-schema-424o</guid>
      <description>&lt;p&gt;You type "show me total revenue by customer for last month" into a chat interface. A second later, out pops a perfectly formatted SQL query against your &lt;code&gt;orders&lt;/code&gt; and &lt;code&gt;customers&lt;/code&gt; tables. It feels like magic.&lt;/p&gt;

&lt;p&gt;But it's not. And once you understand the mechanics — specifically, how the model reads and interprets your database schema — you'll know exactly why text-to-SQL tools sometimes get it right, sometimes get it embarrassingly wrong, and what you can do to tip the odds in your favor.&lt;/p&gt;

&lt;p&gt;This post breaks down the internal mechanics of schema understanding in LLM-powered text-to-SQL systems, with practical advice for developers building or integrating these tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Problem: The LLM Doesn't Know Your Database
&lt;/h2&gt;

&lt;p&gt;Out of the box, an LLM has no idea that your SaaS product has a &lt;code&gt;subscriptions&lt;/code&gt; table, or that &lt;code&gt;mrr&lt;/code&gt; lives in &lt;code&gt;billing_events&lt;/code&gt;, or that you store &lt;code&gt;user_id&lt;/code&gt; in lowercase snake_case everywhere except the one legacy table where it's &lt;code&gt;UserID&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;That knowledge has to be injected into the model's prompt — every single time. The process of selecting &lt;em&gt;which&lt;/em&gt; schema information to include, and how to format it, is called &lt;strong&gt;schema linking&lt;/strong&gt;. It's the most critical (and most commonly fumbled) part of any text-to-SQL pipeline.&lt;/p&gt;

&lt;p&gt;A typical schema injection looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Passed to the LLM in the system or user prompt:&lt;/span&gt;

&lt;span class="k"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="n"&gt;Columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PK&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="n"&gt;Columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PK&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FK&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;amount_cents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
&lt;span class="n"&gt;Columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PK&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FK&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;plan_id&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;cancelled_at&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model reads this and builds an internal understanding of your data model — which tables exist, what columns they have, how they relate to each other, and what data types to expect.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Schema Parsing — What the Model Actually Sees
&lt;/h2&gt;

&lt;p&gt;When a user asks "which customers churned last month?", the LLM doesn't just search for the word "churned" in your schema. It reasons through what "churned" likely means given the available columns: probably a &lt;code&gt;cancelled_at&lt;/code&gt; timestamp that falls within the last 30 days, joined against &lt;code&gt;users&lt;/code&gt; via &lt;code&gt;subscriptions&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This reasoning — connecting semantic intent in the question to structural elements in the schema — is called &lt;strong&gt;schema linking&lt;/strong&gt;. It involves three sub-problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Table selection:&lt;/strong&gt; Which tables are relevant to this question? For large databases with hundreds of tables, the model must filter down to the handful that matter. Dumping your entire schema into the prompt doesn't scale — a 300-table database easily exceeds a model's effective context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Column selection:&lt;/strong&gt; Within those tables, which columns are needed? This is where naming matters enormously. &lt;code&gt;amt&lt;/code&gt; is ambiguous. &lt;code&gt;amount_cents&lt;/code&gt; is clear. &lt;code&gt;ts&lt;/code&gt; is useless. &lt;code&gt;created_at&lt;/code&gt; is useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Join inference:&lt;/strong&gt; How do the relevant tables connect? The model needs to reconstruct the relationship graph — ideally from explicit foreign key declarations in the schema you provide, rather than guessing from column name patterns alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Why Column Names and Table Names Matter More Than You Think
&lt;/h2&gt;

&lt;p&gt;Here's a test. Give an LLM these two schemas and ask "how many users signed up last week?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema A:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;
&lt;span class="n"&gt;Columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Schema B:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
&lt;span class="n"&gt;Columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Schema B wins every time. Natural language tokens in column names (&lt;code&gt;email&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;, &lt;code&gt;plan&lt;/code&gt;) map cleanly to words a model understands. Abbreviations break that mapping.&lt;/p&gt;

&lt;p&gt;Research from IBM and others has shown that schema-aware models — those given rich metadata including table descriptions, column definitions, and sample values — outperform models handed raw DDL by a significant margin.&lt;/p&gt;

&lt;p&gt;The practical takeaway: &lt;strong&gt;invest in good naming conventions, and pass them into your text-to-SQL pipeline explicitly.&lt;/strong&gt; If you're stuck with legacy short column names, add descriptions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: How to Structure Schema for Maximum LLM Comprehension
&lt;/h2&gt;

&lt;p&gt;Here's an example of a well-formatted schema prompt that a text-to-SQL system might use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a SQL expert. Use the following schema to answer the user's question.

### Schema

Table: users
Description: One row per registered user.
Columns:
  - id (integer, primary key)
  - email (text) — the user's login email
  - created_at (timestamp) — when the account was created
  - plan (text) — current plan: 'free', 'pro', 'enterprise'

Table: orders
Description: Purchases made by users.
Columns:
  - id (integer, primary key)
  - user_id (integer) — foreign key → users.id
  - amount_cents (integer) — order total in cents (divide by 100 for dollars)
  - status (text) — 'pending', 'completed', 'refunded'
  - created_at (timestamp)

### Question
Which users on the 'pro' plan have made more than 3 orders in the past 30 days?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what's included beyond just column names:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Table descriptions&lt;/strong&gt; that explain business purpose&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inline comments&lt;/strong&gt; on ambiguous columns (units, enumerations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit foreign key relationships&lt;/strong&gt; spelled out in plain language&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the difference between a mediocre text-to-SQL result and one you'd actually run in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: The Large Schema Problem — When You Can't Fit Everything
&lt;/h2&gt;

&lt;p&gt;Most real-world databases have far more tables than you'd ever want in a single prompt. Cramming 200 table definitions into the context window creates two problems: cost, and accuracy. Models are measurably worse at attending to schema details when surrounded by irrelevant tables.&lt;/p&gt;

&lt;p&gt;The solution is &lt;strong&gt;retrieval-augmented schema selection&lt;/strong&gt;. Before calling the LLM, a retrieval step filters the schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pseudocode for schema-aware RAG in a text-to-SQL pipeline
&lt;/span&gt;
&lt;span class="n"&gt;user_question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;show me monthly revenue by plan for the last 6 months&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Embed the user question
&lt;/span&gt;&lt;span class="n"&gt;question_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Search a vector index of table + column descriptions
&lt;/span&gt;&lt;span class="n"&gt;relevant_tables&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;vector_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;schema_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Build a focused schema prompt with only relevant tables
&lt;/span&gt;&lt;span class="n"&gt;schema_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevant_tables&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Call the LLM with a focused prompt
&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use this schema:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;schema_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Systems like RASL (Retrieval Augmented Schema Linking) from Amazon use this approach to handle databases with thousands of tables, achieving dramatically better accuracy than naive full-schema injection.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;the right schema in the prompt beats a larger model every time.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 5: Foreign Keys Are the Secret Weapon
&lt;/h2&gt;

&lt;p&gt;Joins are where text-to-SQL models fail most visibly. A model might correctly identify that you need both &lt;code&gt;users&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt;, then produce a broken query because it guessed the wrong join column.&lt;/p&gt;

&lt;p&gt;The fix is explicit: always include foreign key relationships in your schema prompt. Compare these two approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vague:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Table: orders
Columns: id, user_id, amount_cents, created_at
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Explicit:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Table: orders
Columns:
  - id (integer, primary key)
  - user_id (integer) — foreign key → users.id; never null
  - amount_cents (integer)
  - created_at (timestamp)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When foreign keys are declared, models reliably produce correct JOINs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Generated correctly when FK is explicit:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'completed'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total_revenue&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without the FK hint, you're relying on the model to guess that &lt;code&gt;user_id&lt;/code&gt; in &lt;code&gt;orders&lt;/code&gt; maps to &lt;code&gt;id&lt;/code&gt; in &lt;code&gt;users&lt;/code&gt;. It usually gets this right — but "usually" isn't good enough for production.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes Developers Make
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Passing raw DDL without descriptions.&lt;/strong&gt; Your CREATE TABLE statements are a start, but they're optimized for the database engine, not for an LLM. Add comments and descriptions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Don't just pass this:&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;ev&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;uid&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;typ&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;ts&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Also include:&lt;/span&gt;
&lt;span class="c1"&gt;-- Table: events (alias: ev)&lt;/span&gt;
&lt;span class="c1"&gt;-- Description: user activity events for product analytics&lt;/span&gt;
&lt;span class="c1"&gt;-- uid → users.id, typ = 'click' | 'page_view' | 'signup' | 'purchase'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Including every table for every query.&lt;/strong&gt; Token-bloat hurts accuracy. Filter schema by relevance before injecting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignoring enum values.&lt;/strong&gt; If a column like &lt;code&gt;status&lt;/code&gt; has a fixed set of values, list them. The model needs to know that &lt;code&gt;'completed'&lt;/code&gt; is a valid status, not &lt;code&gt;'done'&lt;/code&gt; or &lt;code&gt;'success'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No sample data.&lt;/strong&gt; Even a single example row per table dramatically improves accuracy for edge cases — especially date formats, unit conventions, and enum spellings.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;The gap between a text-to-SQL tool that impresses and one you'd trust in production almost always comes down to schema quality, not model quality. Here's the short version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use descriptive table and column names; avoid abbreviations&lt;/li&gt;
&lt;li&gt;Declare foreign key relationships explicitly in your schema prompt&lt;/li&gt;
&lt;li&gt;Include column descriptions for ambiguous fields (units, enums, nullable meaning)&lt;/li&gt;
&lt;li&gt;Filter schema by relevance using vector search for large databases&lt;/li&gt;
&lt;li&gt;Add sample values or enum lists for categorical columns&lt;/li&gt;
&lt;li&gt;Never assume the model will infer what you didn't tell it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The LLM is doing surprisingly sophisticated reasoning — it's matching your English words to your schema structure, inferring joins, and constructing syntactically valid SQL all in one pass. Give it the right schema context, and it'll give you queries you can actually use.&lt;/p&gt;




&lt;p&gt;Have you built a text-to-SQL system or integrated one into your product? I'd love to hear how you handle schema injection — drop a comment below with your approach, especially if you've tackled the large-schema problem in an interesting way.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Prompt AI Tools to Write Accurate SQL Queries (And Why Most Developers Get This Wrong)</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Wed, 20 May 2026 17:03:43 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/how-to-prompt-ai-tools-to-write-accurate-sql-queries-and-why-most-developers-get-this-wrong-55k4</link>
      <guid>https://dev.to/vivekdraxlr/how-to-prompt-ai-tools-to-write-accurate-sql-queries-and-why-most-developers-get-this-wrong-55k4</guid>
      <description>&lt;p&gt;If you've tried asking ChatGPT, Claude, or any AI SQL assistant to generate a query and gotten back something that looked plausible but was subtly wrong — you're not alone. The frustrating part is it often &lt;em&gt;runs&lt;/em&gt;. The database returns rows, the numbers look reasonable, and you ship it. Three days later, someone points out the totals are off by 20%.&lt;/p&gt;

&lt;p&gt;The problem isn't the AI. The problem is the prompt.&lt;/p&gt;

&lt;p&gt;Text-to-SQL works remarkably well when you give the model what it actually needs. According to AWS's benchmarking, GPT-4-class models achieve a &lt;strong&gt;94% first-try success rate&lt;/strong&gt; on ad-hoc analytics queries when the schema and foreign key constraints are properly provided. Without that context? You're closer to 60%. The difference is entirely in how you prompt.&lt;/p&gt;

&lt;p&gt;This guide covers the practical techniques — schema context, few-shot examples, business term definitions, and chain-of-thought decomposition — that separate accurate AI-generated SQL from the kind that silently lies to you.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AI Gets SQL Wrong (It's Not the Model's Fault)
&lt;/h2&gt;

&lt;p&gt;When you ask an AI "give me last month's revenue by plan tier," the model has to make a series of guesses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which table holds revenue? &lt;code&gt;orders&lt;/code&gt;? &lt;code&gt;subscriptions&lt;/code&gt;? &lt;code&gt;invoices&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;What column tracks the amount? &lt;code&gt;total&lt;/code&gt;? &lt;code&gt;amount_cents&lt;/code&gt;? &lt;code&gt;mrr&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;What does "last month" mean in your timezone?&lt;/li&gt;
&lt;li&gt;Is "revenue" recognized revenue, gross, or net of refunds?&lt;/li&gt;
&lt;li&gt;What column stores "plan tier"?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without your schema, the model invents plausible-sounding answers to all of these. It will write syntactically valid SQL that is semantically wrong for your specific database.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; From the model's perspective, the schema &lt;em&gt;is&lt;/em&gt; the problem space. You wouldn't ask a contractor to remodel your kitchen without giving them the floor plan.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Technique 1: Always Include Your Schema (But Not All of It)
&lt;/h2&gt;

&lt;p&gt;The most impactful single change you can make is providing your table definitions in the prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The naive approach&lt;/strong&gt; — dumping your entire database schema — backfires. Enterprise databases with 100+ tables drown the model in noise and push important tables out of its effective context window. The right move is to include only the tables relevant to the question.&lt;/p&gt;

&lt;p&gt;Here's a solid schema context block to include in your prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;You&lt;/span&gt; &lt;span class="k"&gt;are&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt; &lt;span class="n"&gt;expert&lt;/span&gt; &lt;span class="n"&gt;working&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;PostgreSQL&lt;/span&gt; &lt;span class="k"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;Relevant&lt;/span&gt; &lt;span class="n"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;          &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt;     &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;plan&lt;/span&gt;        &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;-- 'starter', 'pro', 'enterprise'&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;      &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;-- 'active', 'cancelled', 'trialing'&lt;/span&gt;
  &lt;span class="n"&gt;mrr_cents&lt;/span&gt;   &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;-- monthly recurring revenue in cents&lt;/span&gt;
  &lt;span class="n"&gt;started_at&lt;/span&gt;  &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;cancelled_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;          &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;email&lt;/span&gt;       &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;  &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;country&lt;/span&gt;     &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;Write&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="n"&gt;here&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the inline comments on columns like &lt;code&gt;plan&lt;/code&gt; and &lt;code&gt;status&lt;/code&gt;. These are crucial — they tell the model the exact set of values to filter on. Without them, the model might write &lt;code&gt;WHERE plan = 'professional'&lt;/code&gt; instead of &lt;code&gt;WHERE plan = 'pro'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical tip:&lt;/strong&gt; If you don't know which tables are relevant upfront, ask the model first: &lt;em&gt;"Given this question, which tables from the following list would you need?"&lt;/em&gt; Then provide only those.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technique 2: Define Your Business Terms
&lt;/h2&gt;

&lt;p&gt;Business vocabulary rarely maps cleanly to column names. "Revenue" might mean &lt;code&gt;mrr_cents&lt;/code&gt;. "Active users" might mean users who logged in within 30 days, not users with &lt;code&gt;status = 'active'&lt;/code&gt;. "Churn" could be calculated six different ways.&lt;/p&gt;

&lt;p&gt;Add a &lt;strong&gt;glossary block&lt;/strong&gt; to your prompt for any domain-specific terms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Business term definitions:
- "Active subscriber": a user with subscriptions.status = 'active'
- "MRR": SUM(mrr_cents) / 100.0, expressed in dollars
- "Last month": the full calendar month before the current month
  (e.g., if today is May 15 2026, last month = April 1–30 2026)
- "Churned": status changed from 'active' to 'cancelled' in the period
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the step most developers skip. And it's the reason queries that look right return numbers that don't match your finance team's spreadsheet.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technique 3: Use Few-Shot Examples
&lt;/h2&gt;

&lt;p&gt;Few-shot prompting — showing the model one or two example question/query pairs — dramatically improves accuracy on complex queries. It teaches the model your style, your naming conventions, and the kinds of JOINs your schema requires.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;Here&lt;/span&gt; &lt;span class="k"&gt;are&lt;/span&gt; &lt;span class="n"&gt;two&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;guide&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="k"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="c1"&gt;-- Question: How many new subscribers signed up in March 2026?&lt;/span&gt;
&lt;span class="c1"&gt;-- Query:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-04-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Question: What is total MRR by plan for active subscriptions?&lt;/span&gt;
&lt;span class="c1"&gt;-- Query:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mrr_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;mrr_dollars&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;mrr_dollars&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Now answer this question: [your new question]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The examples serve as a template. The model learns that you use &lt;code&gt;&amp;gt;=&lt;/code&gt; / &lt;code&gt;&amp;lt;&lt;/code&gt; for date ranges (not &lt;code&gt;BETWEEN&lt;/code&gt;), that you divide cents by 100.0, and that you prefer &lt;code&gt;ORDER BY&lt;/code&gt; descending. You'll get output that fits into your codebase without needing cleanup.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technique 4: Break Complex Questions into Steps (Chain-of-Thought)
&lt;/h2&gt;

&lt;p&gt;For multi-step analytical queries — cohort analysis, funnel calculations, retention — don't ask for the whole query at once. Ask the model to reason through it first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;I&lt;/span&gt; &lt;span class="n"&gt;need&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="n"&gt;calculate&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;day&lt;/span&gt; &lt;span class="n"&gt;retention&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;who&lt;/span&gt; &lt;span class="nb"&gt;signed&lt;/span&gt; &lt;span class="n"&gt;up&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;April&lt;/span&gt; &lt;span class="mi"&gt;2026&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Retention&lt;/span&gt; &lt;span class="n"&gt;means&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;they&lt;/span&gt; &lt;span class="n"&gt;had&lt;/span&gt; &lt;span class="k"&gt;at&lt;/span&gt; &lt;span class="n"&gt;least&lt;/span&gt; &lt;span class="n"&gt;one&lt;/span&gt; &lt;span class="n"&gt;active&lt;/span&gt; &lt;span class="k"&gt;session&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;day&lt;/span&gt; &lt;span class="k"&gt;window&lt;/span&gt;
&lt;span class="k"&gt;after&lt;/span&gt; &lt;span class="n"&gt;signup&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tracked&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="nv"&gt;`events`&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'session_start'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;

&lt;span class="k"&gt;Before&lt;/span&gt; &lt;span class="n"&gt;writing&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;SQL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;What&lt;/span&gt; &lt;span class="n"&gt;intermediate&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="k"&gt;are&lt;/span&gt; &lt;span class="n"&gt;needed&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;Which&lt;/span&gt; &lt;span class="n"&gt;tables&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;How&lt;/span&gt; &lt;span class="n"&gt;will&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;define&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;day&lt;/span&gt; &lt;span class="k"&gt;window&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;

&lt;span class="k"&gt;Then&lt;/span&gt; &lt;span class="k"&gt;write&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;final&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you ask the model to think through the steps explicitly, it catches its own mistakes before producing the final SQL. This mirrors what an experienced developer does mentally before writing a complex query. The chain-of-thought approach consistently outperforms direct generation on multi-table, multi-condition queries.&lt;/p&gt;

&lt;p&gt;Here's what the output might look like for that retention query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;april_signups&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-04-01'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s1"&gt;'2026-05-01'&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;retained&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;april_signups&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;
  &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'session_start'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_signups&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;retained_users&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;NUMERIC&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;
    &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;                                           &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;retention_pct&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;april_signups&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;retained&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A direct one-shot prompt rarely produces something this structurally sound. Decomposing the problem does.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technique 5: Ask for Explanation and Validation
&lt;/h2&gt;

&lt;p&gt;After receiving a query, always ask the model to explain it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Explain&lt;/span&gt; &lt;span class="n"&gt;what&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="n"&gt;does&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="k"&gt;Are&lt;/span&gt; &lt;span class="n"&gt;there&lt;/span&gt; &lt;span class="k"&gt;any&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="n"&gt;cases&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nulls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;division&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;zero&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;zone&lt;/span&gt; &lt;span class="n"&gt;assumptions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;could&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;incorrect&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This serves two purposes: it helps you catch logical errors before running the query in production, and it forces the model to re-examine its own output. Models often catch their own mistakes during this review pass.&lt;/p&gt;

&lt;p&gt;For example, the model might flag:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"This query uses &lt;code&gt;CURRENT_DATE&lt;/code&gt; which assumes UTC. If your database runs in a different timezone, you may want &lt;code&gt;CURRENT_DATE AT TIME ZONE 'America/New_York'&lt;/code&gt;."&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"If &lt;code&gt;mrr_cents&lt;/code&gt; is NULL for any rows, &lt;code&gt;SUM()&lt;/code&gt; will silently exclude them. You may want &lt;code&gt;COALESCE(mrr_cents, 0)&lt;/code&gt;."&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are exactly the gotchas that cause silent data quality issues.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes to Avoid
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Vague questions.&lt;/strong&gt; "Show me user activity" will produce a vague, probably useless query. "Show me the count of distinct users who triggered at least one &lt;code&gt;purchase&lt;/code&gt; event in the last 7 days, grouped by their &lt;code&gt;plan&lt;/code&gt;" will produce something accurate and useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No schema, but expecting column-level accuracy.&lt;/strong&gt; The model will guess, and its guesses will be syntactically valid. That's what makes them dangerous.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignoring dialect differences.&lt;/strong&gt; If you're on BigQuery, asking a generic model for date functions might get you &lt;code&gt;DATE_TRUNC&lt;/code&gt; when you need &lt;code&gt;DATE_TRUNC(date, MONTH)&lt;/code&gt;. Always tell the model your database flavor (PostgreSQL, MySQL, BigQuery, Snowflake, etc.).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trusting output that executes without errors.&lt;/strong&gt; A query that runs is not a query that's correct. Always sanity-check with a small sample (&lt;code&gt;LIMIT 100&lt;/code&gt;), compare a few rows to known-good data, and check totals against a source of truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not iterating.&lt;/strong&gt; The first query you get back is a first draft. Paste it back in with corrections: &lt;em&gt;"This is close, but I need the results grouped by week, not month, and I only want users in the 'pro' plan."&lt;/em&gt; Iteration gets you to accuracy faster than trying to write the perfect prompt on the first try.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;The gap between 60% and 94% AI SQL accuracy comes down to what you put into the prompt:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Include your schema&lt;/strong&gt; — just the relevant tables, with column comments for values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define business terms&lt;/strong&gt; — don't let the model guess what "revenue" or "active" means&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add 1–2 example queries&lt;/strong&gt; — establish your style and naming conventions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decompose complex questions&lt;/strong&gt; — ask for reasoning steps before the final query&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ask for explanation and edge-case review&lt;/strong&gt; — let the model catch its own mistakes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always specify your database flavor&lt;/strong&gt; — PostgreSQL, MySQL, BigQuery, etc.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AI-powered SQL generation is genuinely useful once you treat prompting as a craft, not an afterthought. The developers who get the most out of these tools aren't the ones using the fanciest models — they're the ones who've learned to give the model what it actually needs to succeed.&lt;/p&gt;




&lt;p&gt;What techniques have worked well for you when prompting AI for SQL? Have you found ways to handle large, complex schemas effectively? Drop your approach in the comments — I'd love to compare notes.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Multi-Tenant SQL Reporting: How to Show Each Customer Only Their Own Data</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Wed, 13 May 2026 18:43:48 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/multi-tenant-sql-reporting-how-to-show-each-customer-only-their-own-data-1762</link>
      <guid>https://dev.to/vivekdraxlr/multi-tenant-sql-reporting-how-to-show-each-customer-only-their-own-data-1762</guid>
      <description>&lt;p&gt;You're three months into your SaaS. Customers want dashboards. The first one is easy — Acme Corp wants to see &lt;em&gt;their&lt;/em&gt; orders, &lt;em&gt;their&lt;/em&gt; signups, &lt;em&gt;their&lt;/em&gt; revenue. You write a quick &lt;code&gt;WHERE tenant_id = 42&lt;/code&gt; and ship it.&lt;/p&gt;

&lt;p&gt;Six months later you have 400 tenants, 30 reports, an AI assistant that writes SQL on the fly, and an engineer who one Tuesday morning forgets the &lt;code&gt;WHERE&lt;/code&gt; clause on a single query. Now Acme Corp can see Globex's data. That's a Monday-morning headline you don't want.&lt;/p&gt;

&lt;p&gt;This article is about how to architect multi-tenant SQL reporting so a single forgotten clause never becomes a breach. We'll cover tenant schema design, Postgres Row-Level Security, how to wire it to JWTs, indexing for performance, and the gotchas that bite people in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three multi-tenant patterns (and why one wins for reporting)
&lt;/h2&gt;

&lt;p&gt;Before any SQL, pick your isolation model. There are three classic options:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Isolation&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;Database-per-tenant&lt;/td&gt;
&lt;td&gt;Strongest&lt;/td&gt;
&lt;td&gt;Highest (ops, migrations, connections)&lt;/td&gt;
&lt;td&gt;Enterprise, regulated industries&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Schema-per-tenant&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;High (schemas multiply fast)&lt;/td&gt;
&lt;td&gt;Mid-market with &amp;lt;1000 tenants&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Shared schema with tenant_id&lt;/td&gt;
&lt;td&gt;Logical&lt;/td&gt;
&lt;td&gt;Lowest&lt;/td&gt;
&lt;td&gt;Most B2B SaaS&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For 90% of SaaS apps, &lt;strong&gt;shared schema with a &lt;code&gt;tenant_id&lt;/code&gt; column&lt;/strong&gt; is the right call. It's cheap, it scales, and Postgres gives you the tools to make it safe. Everything below assumes this pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Tag every row with a tenant_id
&lt;/h2&gt;

&lt;p&gt;Every table that stores customer data needs a tenant discriminator. No exceptions. Even your &lt;code&gt;events&lt;/code&gt;, &lt;code&gt;audit_logs&lt;/code&gt;, and &lt;code&gt;ai_query_history&lt;/code&gt; tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;           &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;tenant_id&lt;/span&gt;    &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;tenants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;  &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;amount_cents&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;       &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;   &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;orders_tenant_created_idx&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to notice: &lt;code&gt;tenant_id&lt;/code&gt; is &lt;code&gt;NOT NULL&lt;/code&gt; (a null tenant means an orphan row that bypasses your filters), and the index leads with &lt;code&gt;tenant_id&lt;/code&gt;. Every reporting query will filter by tenant, so this becomes your bread-and-butter index.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Enable Row-Level Security as a safety net
&lt;/h2&gt;

&lt;p&gt;Application-level filtering is fine until someone forgets. Postgres Row-Level Security (RLS) enforces the filter at the database, so even a raw &lt;code&gt;SELECT * FROM orders&lt;/code&gt; returns only the rows the current tenant is allowed to see.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- 1. Turn on RLS&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;ENABLE&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- 2. Define the policy&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;tenant_isolation&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
  &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.tenant_id'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.tenant_id'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;BIGINT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;USING&lt;/code&gt; clause filters rows you can read; the &lt;code&gt;WITH CHECK&lt;/code&gt; clause prevents you from inserting or updating rows into another tenant. Together they cover both directions.&lt;/p&gt;

&lt;p&gt;Now every connection has to set &lt;code&gt;app.tenant_id&lt;/code&gt; before issuing queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;LOCAL&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'42'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- automatically scoped to tenant 42&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SET LOCAL&lt;/code&gt; ties the setting to the current transaction, so it can't leak across requests on a pooled connection — important.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Wire the tenant context to your auth layer
&lt;/h2&gt;

&lt;p&gt;The setting has to come from a trusted source. The standard pattern is to extract it from a verified JWT:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pseudocode — runs on every request inside a transaction
&lt;/span&gt;&lt;span class="n"&gt;claims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;public_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SET LOCAL app.tenant_id = %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="c1"&gt;# All queries inside this transaction are now tenant-scoped
&lt;/span&gt;    &lt;span class="nf"&gt;run_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two non-negotiables:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Decode the JWT server-side with a verified signature.&lt;/strong&gt; Never trust a &lt;code&gt;tenant_id&lt;/code&gt; coming from a query parameter or request body.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;SET LOCAL&lt;/code&gt;, not &lt;code&gt;SET&lt;/code&gt;.&lt;/strong&gt; Connection pools recycle connections; &lt;code&gt;SET&lt;/code&gt; persists, &lt;code&gt;SET LOCAL&lt;/code&gt; doesn't.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 4: Reporting queries that "just work"
&lt;/h2&gt;

&lt;p&gt;With RLS on, your reporting queries get simpler — you don't write &lt;code&gt;WHERE tenant_id = ?&lt;/code&gt; in every query. Compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Before RLS — you must remember every time&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- After RLS — same query, scoped automatically&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount_cents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters most for &lt;strong&gt;AI-generated SQL&lt;/strong&gt; and &lt;strong&gt;user-defined reports&lt;/strong&gt;. If an AI assistant writes a query against your warehouse, you don't want safety to depend on whether it remembered the tenant filter. RLS makes the database itself the last line of defense.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Handle cross-tenant admin queries explicitly
&lt;/h2&gt;

&lt;p&gt;Eventually you'll want internal queries that span tenants — billing rollups, churn cohorts, support tooling. Don't disable RLS; create a separate admin role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;app_admin&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;FORCE&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;admin_all_access&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
  &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_admin&lt;/span&gt;
  &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;FORCE ROW LEVEL SECURITY&lt;/code&gt; makes RLS apply even to the table owner, which prevents your migrations user from accidentally reading everyone's data. The &lt;code&gt;app_admin&lt;/code&gt; role has its own explicit policy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Indexing for tenant-scoped queries
&lt;/h2&gt;

&lt;p&gt;A common surprise: after adding RLS, queries that used to be fast get slow. The fix is almost always indexes that lead with &lt;code&gt;tenant_id&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Good — supports the most common reporting pattern&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;orders_tenant_status_created_idx&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Also useful for tenant-scoped joins&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;events_tenant_user_idx&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rule of thumb: any index you'd create on a single-tenant table, prefix with &lt;code&gt;tenant_id&lt;/code&gt; for a multi-tenant one. Postgres can then satisfy both the RLS predicate and your &lt;code&gt;ORDER BY&lt;/code&gt;/filter from the same index.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common gotchas
&lt;/h2&gt;

&lt;p&gt;Here's what bites people in production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Forgetting to set &lt;code&gt;app.tenant_id&lt;/code&gt;.&lt;/strong&gt; If RLS is on and the setting isn't set, queries return zero rows — not an error. Wrap your transaction setup so missing context throws loudly in development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using &lt;code&gt;SET&lt;/code&gt; instead of &lt;code&gt;SET LOCAL&lt;/code&gt;.&lt;/strong&gt; On a pooled connection, the next request inherits the previous tenant's context. This is a real, observed bug — and a nasty one because the symptom is "wrong customer's data shown."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RLS on the parent table but not on partitions.&lt;/strong&gt; Partitioned tables need RLS enabled on each partition, or queries skip the policy entirely. Easy to miss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-tenant joins.&lt;/strong&gt; A query like &lt;code&gt;SELECT * FROM orders o JOIN users u ON o.user_id = u.id&lt;/code&gt; works fine if both tables have RLS. But if &lt;code&gt;users&lt;/code&gt; doesn't, you can leak data through the join. Enable RLS on every tenant-scoped table — not just the ones you think are sensitive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bypassing RLS for views.&lt;/strong&gt; Views run with the privileges of their creator by default. Use &lt;code&gt;CREATE VIEW … WITH (security_invoker = true)&lt;/code&gt; (Postgres 15+) so RLS evaluates against the calling user, not the view owner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No tenant_id on derived tables.&lt;/strong&gt; Materialized views and reporting tables get rebuilt nightly and someone forgets the tenant column. Now your "fast dashboard" returns everyone's data. Always carry tenant_id through.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;p&gt;For multi-tenant SQL reporting, shared schema with &lt;code&gt;tenant_id&lt;/code&gt; is the pragmatic default. Row-Level Security turns that filter from a convention you have to remember into a constraint the database enforces. Pair it with JWT-driven session context, indexes that lead with &lt;code&gt;tenant_id&lt;/code&gt;, and an explicit admin role for cross-tenant queries.&lt;/p&gt;

&lt;p&gt;The payoff is that your reporting layer — whether you build it yourself, embed a third-party dashboard, or let an AI assistant write queries on top of your schema — becomes safe by default. The database itself refuses to leak data, no matter what query lands on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Over to you
&lt;/h2&gt;

&lt;p&gt;How are you handling tenant isolation in your reporting layer today — RLS, application filters, schema-per-tenant, or something else? Have you hit a multi-tenant gotcha that wasn't on this list? Drop it in the comments — I'd love to compare notes.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Don't Trust AI-Generated SQL Blindly: A Developer's Validation Checklist</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Tue, 12 May 2026 09:44:13 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/dont-trust-ai-generated-sql-blindly-a-developers-validation-checklist-5f9g</link>
      <guid>https://dev.to/vivekdraxlr/dont-trust-ai-generated-sql-blindly-a-developers-validation-checklist-5f9g</guid>
      <description>&lt;p&gt;AI coding assistants can knock out a SQL query in seconds. You describe what you want in plain English, and out comes a &lt;code&gt;SELECT&lt;/code&gt; statement that looks completely reasonable. So you copy it, paste it into your app or dashboard query runner, and hit execute.&lt;/p&gt;

&lt;p&gt;This is where things go wrong.&lt;/p&gt;

&lt;p&gt;Unlike a syntax error that crashes loudly, a &lt;em&gt;semantically wrong&lt;/em&gt; SQL query runs successfully and returns a result. The result looks plausible. You ship the dashboard. A week later, someone notices the revenue numbers are 15% off because the AI joined on the wrong foreign key. The silent failure is the dangerous one.&lt;/p&gt;

&lt;p&gt;This post is a practical checklist for validating AI-generated SQL before it touches your production database — whether you're building a reporting feature, an embedded dashboard, or using a text-to-SQL tool internally.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AI Gets SQL Wrong
&lt;/h2&gt;

&lt;p&gt;Large language models generate SQL by predicting likely token sequences based on training data. They don't &lt;em&gt;execute&lt;/em&gt; the query or check your actual schema — they hallucinate a plausible-looking result. The four failure patterns that show up most often are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Schema hallucination&lt;/strong&gt; — the model invents column names that don't exist, or references the right column from the wrong table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Faulty joins&lt;/strong&gt; — missing &lt;code&gt;ON&lt;/code&gt; clauses (producing accidental Cartesian products), wrong join keys, or incorrect join types (&lt;code&gt;LEFT&lt;/code&gt; vs &lt;code&gt;INNER&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Aggregation mistakes&lt;/strong&gt; — applying &lt;code&gt;COUNT&lt;/code&gt; or &lt;code&gt;SUM&lt;/code&gt; to the wrong column, forgetting &lt;code&gt;GROUP BY&lt;/code&gt; columns, or confusing &lt;code&gt;WHERE&lt;/code&gt; (row-level) with &lt;code&gt;HAVING&lt;/code&gt; (group-level).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Missing filters&lt;/strong&gt; — dropping a &lt;code&gt;WHERE&lt;/code&gt; clause entirely, or missing a tenant/user scoping condition that you absolutely need for correctness and security.&lt;/p&gt;

&lt;p&gt;A 2026 benchmark study found that even top models only achieve ~78% execution accuracy on zero-shot text-to-SQL tasks. That means roughly 1 in 5 queries is wrong — often in ways that won't throw an error.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Validation Checklist
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ 1. Check Every Column Against Your Actual Schema
&lt;/h3&gt;

&lt;p&gt;Before anything else, look up every column name in the generated query against your real schema. Don't assume — AI models frequently confuse similar column names like &lt;code&gt;user_id&lt;/code&gt; vs &lt;code&gt;account_id&lt;/code&gt;, or reference &lt;code&gt;created_at&lt;/code&gt; on a table that uses &lt;code&gt;created_date&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- AI generated this&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'completed'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Your schema actually uses:&lt;/span&gt;
&lt;span class="c1"&gt;-- users.id (not users.user_id)&lt;/span&gt;
&lt;span class="c1"&gt;-- orders.customer_id (not orders.user_id)&lt;/span&gt;
&lt;span class="c1"&gt;-- orders.amount (not orders.total_amount)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result if run as-is: either a syntax error (best case) or a silent Cartesian product (worst case). The fix is one minute of schema cross-referencing.&lt;/p&gt;




&lt;h3&gt;
  
  
  ✅ 2. Trace Every JOIN Condition
&lt;/h3&gt;

&lt;p&gt;For each &lt;code&gt;JOIN&lt;/code&gt;, ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the &lt;code&gt;ON&lt;/code&gt; condition using the actual primary key → foreign key relationship?&lt;/li&gt;
&lt;li&gt;Is the join type (&lt;code&gt;INNER&lt;/code&gt;, &lt;code&gt;LEFT&lt;/code&gt;, &lt;code&gt;RIGHT&lt;/code&gt;) what you actually want?&lt;/li&gt;
&lt;li&gt;Could this produce duplicate rows?
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Potentially dangerous: AI used INNER JOIN, dropping users without orders&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Correct if you want ALL users, including those with zero orders:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;INNER JOIN&lt;/code&gt; version silently excludes users with no orders. If you're building a "user activity" report, that's a significant data gap.&lt;/p&gt;




&lt;h3&gt;
  
  
  ✅ 3. Verify Aggregation Logic
&lt;/h3&gt;

&lt;p&gt;Check that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;GROUP BY&lt;/code&gt; includes all non-aggregated columns in &lt;code&gt;SELECT&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SUM&lt;/code&gt; and &lt;code&gt;COUNT&lt;/code&gt; are applied to the right columns&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;HAVING&lt;/code&gt; is used for filtering aggregated results (not &lt;code&gt;WHERE&lt;/code&gt;)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Bug: filtering on an aggregate in WHERE (will error in most databases)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Correct:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AI tools make this exact mistake surprisingly often, especially when you phrase the question as "show me customers who spent more than $1000."&lt;/p&gt;




&lt;h3&gt;
  
  
  ✅ 4. Check for Missing Tenant/User Scoping
&lt;/h3&gt;

&lt;p&gt;If you're building a multi-tenant SaaS app or any system where different users should only see their own data, &lt;strong&gt;this is the most critical check&lt;/strong&gt;. AI tools have no idea your &lt;code&gt;events&lt;/code&gt; table contains data from 500 different customers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- AI generates this when asked "show me top pages by view count"&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;page_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;views&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;page_events&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;page_url&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;views&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- What you actually need in a multi-tenant app:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;page_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;views&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;page_events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;current_account_id&lt;/span&gt;  &lt;span class="c1"&gt;-- ← THIS is what the AI forgot&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;page_url&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;views&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the AI version against production would expose every customer's data to every other customer. Always add tenant scoping filters as a mandatory review step.&lt;/p&gt;




&lt;h3&gt;
  
  
  ✅ 5. Run Against a Staging Database First
&lt;/h3&gt;

&lt;p&gt;Never run an untested AI-generated query directly on production — especially anything that writes data. For &lt;code&gt;SELECT&lt;/code&gt; queries, use a staging copy of your database and compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Row counts (are they in the expected ballpark?)&lt;/li&gt;
&lt;li&gt;Known values (does the result match what you'd expect for a test account?)&lt;/li&gt;
&lt;li&gt;Edge cases (what happens with NULL values? What about accounts with no data?)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Spot-check: does this query return the right count for a known account?&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- you know this customer has 7 orders&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the result is &lt;code&gt;7&lt;/code&gt;, you're on the right track. If it's &lt;code&gt;0&lt;/code&gt; or &lt;code&gt;700&lt;/code&gt;, something is wrong with the query logic.&lt;/p&gt;




&lt;h3&gt;
  
  
  ✅ 6. Watch Out for NULL Handling
&lt;/h3&gt;

&lt;p&gt;AI-generated queries often miss NULL edge cases. The &lt;code&gt;=&lt;/code&gt; operator doesn't work with NULL — it silently excludes rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- This will miss rows where discount_code IS NULL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;discount_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'PROMO10'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Correct if you want all orders without that specific code, including nulls:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;discount_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'PROMO10'&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;discount_code&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is especially common in &lt;code&gt;WHERE&lt;/code&gt; filters on optional fields like coupon codes, referral sources, or plan tiers.&lt;/p&gt;




&lt;h3&gt;
  
  
  ✅ 7. For Write Operations: Always Dry-Run First
&lt;/h3&gt;

&lt;p&gt;If the AI generates an &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;INSERT&lt;/code&gt;, or &lt;code&gt;DELETE&lt;/code&gt;, wrap it in a transaction and inspect the affected rows before committing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'cancelled'&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;trial_ends_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'trialing'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Check what would be affected before committing:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;subscriptions&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;trial_ends_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'trialing'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ROLLBACK&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- Don't commit until you're sure&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a preview of the blast radius before any data changes are permanent.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Simple Pre-Run Mental Model
&lt;/h2&gt;

&lt;p&gt;Before executing any AI-generated query, ask yourself:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Question&lt;/th&gt;
      &lt;th&gt;What to check&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Do all columns exist?&lt;/td&gt;
      &lt;td&gt;Cross-reference against schema&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Are all JOINs correct?&lt;/td&gt;
      &lt;td&gt;Verify keys and join types&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Is aggregation right?&lt;/td&gt;
      &lt;td&gt;Check GROUP BY, HAVING vs WHERE&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Is tenant data scoped?&lt;/td&gt;
      &lt;td&gt;Look for account_id / user_id filters&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Are NULLs handled?&lt;/td&gt;
      &lt;td&gt;Check optional column filters&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Is this a write operation?&lt;/td&gt;
      &lt;td&gt;Use BEGIN/ROLLBACK to preview&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Common Mistakes to Watch For
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trusting output that "looks right"&lt;/strong&gt; — plausible-looking results are the most dangerous kind of wrong&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skipping staging for read-only queries&lt;/strong&gt; — even SELECTs can cause problems (performance, data leakage) if they're wrong&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not providing schema context to the AI&lt;/strong&gt; — the better your prompt (including table definitions and sample values), the more accurate the output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Running AI queries against production without row limits&lt;/strong&gt; — add &lt;code&gt;LIMIT 100&lt;/code&gt; while testing to avoid accidentally returning or processing millions of rows&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;AI tools are genuinely useful for writing SQL faster — especially for repetitive patterns and boilerplate joins. But they're drafting assistants, not a replacement for your understanding of the data model. Every AI-generated query deserves at least a 30-second review against the checklist above.&lt;/p&gt;

&lt;p&gt;The silent failure — a query that runs cleanly but returns wrong data — is what makes SQL validation non-negotiable. A 15% revenue discrepancy discovered a week after shipping is a bad day. A 30-second schema check is not.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Do you have a validation workflow you follow for AI-generated SQL?&lt;/strong&gt; Share it in the comments — I'd love to hear how other teams handle this, especially in multi-tenant or high-stakes reporting contexts.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Beyond Basics: Mastering SQL's LEAD, LAG, RANK, DENSE_RANK, and NTILE</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Mon, 11 May 2026 13:58:12 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/beyond-basics-mastering-sqls-lead-lag-rank-denserank-and-ntile-3i8j</link>
      <guid>https://dev.to/vivekdraxlr/beyond-basics-mastering-sqls-lead-lag-rank-denserank-and-ntile-3i8j</guid>
      <description>&lt;p&gt;You've learned that window functions exist. You've run a &lt;code&gt;ROW_NUMBER()&lt;/code&gt; or two. But there's a whole tier of power hiding just one step further — functions that let you compare rows against their neighbors, assign competition-style rankings, and slice datasets into meaningful percentile buckets. No self-joins. No correlated subqueries. No tears.&lt;/p&gt;

&lt;p&gt;This article goes deep on five advanced window functions: &lt;code&gt;LEAD&lt;/code&gt;, &lt;code&gt;LAG&lt;/code&gt;, &lt;code&gt;RANK&lt;/code&gt;, &lt;code&gt;DENSE_RANK&lt;/code&gt;, and &lt;code&gt;NTILE&lt;/code&gt;. Each one solves a specific class of problem you'll hit constantly in analytics, reporting, and data engineering. We'll use realistic scenarios — e-commerce orders, employee salaries, and product sales — so you can see exactly where these functions shine.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prerequisite:&lt;/strong&gt; This article assumes you already know what the &lt;code&gt;OVER()&lt;/code&gt; clause and &lt;code&gt;PARTITION BY&lt;/code&gt; are. If not, check out the intro to SQL window functions first.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The LEAD and LAG Functions: Looking Forward and Backward
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;LAG&lt;/code&gt; retrieves a value from a &lt;em&gt;previous&lt;/em&gt; row. &lt;code&gt;LEAD&lt;/code&gt; retrieves a value from a &lt;em&gt;future&lt;/em&gt; row. Both are essential for time-series comparisons where you need to know "how did this value change from last period?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Syntax:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
&lt;span class="n"&gt;LEAD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;offset&lt;/code&gt; — how many rows back (LAG) or forward (LEAD) to look. Defaults to 1.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;default_value&lt;/code&gt; — what to return when there's no row at that offset (instead of NULL).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example 1: Month-over-Month Revenue Change
&lt;/h3&gt;

&lt;p&gt;You have a &lt;code&gt;monthly_revenue&lt;/code&gt; table tracking each month's sales:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;monthly_revenue&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;month&lt;/span&gt;       &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;product_id&lt;/span&gt;  &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;revenue&lt;/span&gt;     &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To calculate the change from the previous month for each product:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;prev_month_revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;revenue&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue_change&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;monthly_revenue&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (sample rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;month&lt;/th&gt;
&lt;th&gt;product_id&lt;/th&gt;
&lt;th&gt;revenue&lt;/th&gt;
&lt;th&gt;prev_month_revenue&lt;/th&gt;
&lt;th&gt;revenue_change&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;2026-01-01&lt;/td&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;12000.00&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;12000.00&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;2026-02-01&lt;/td&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;15400.00&lt;/td&gt;
&lt;td&gt;12000.00&lt;/td&gt;
&lt;td&gt;3400.00&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;2026-03-01&lt;/td&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;13800.00&lt;/td&gt;
&lt;td&gt;15400.00&lt;/td&gt;
&lt;td&gt;-1600.00&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice the &lt;code&gt;default_value&lt;/code&gt; of &lt;code&gt;0&lt;/code&gt; prevents the first row from showing &lt;code&gt;NULL&lt;/code&gt;. That's a small touch that makes downstream reporting much cleaner.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 2: Detecting Customer Churn Risk with LEAD
&lt;/h3&gt;

&lt;p&gt;You want to flag customers who placed an order but then didn't order again within 90 days:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;LEAD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;next_order_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;CASE&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;LEAD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;
         &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
      &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;LEAD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;
         &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;
    &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'At Risk'&lt;/span&gt;
    &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'Active'&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;churn_status&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the kind of query that would previously require a self-join on &lt;code&gt;orders o1 JOIN orders o2 ON o1.customer_id = o2.customer_id AND o2.order_date &amp;gt; o1.order_date&lt;/code&gt; — messy, slow, and hard to read.&lt;/p&gt;




&lt;h2&gt;
  
  
  RANK vs DENSE_RANK vs ROW_NUMBER: Choosing the Right Ranking Function
&lt;/h2&gt;

&lt;p&gt;These three functions all assign row numbers, but they handle &lt;em&gt;ties&lt;/em&gt; very differently. Picking the wrong one is a silent bug that poisons your reports.&lt;/p&gt;

&lt;p&gt;Here's the core difference on a single dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;employee_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;row_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;         &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rnk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;DENSE_RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dense_rnk&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (Engineering department):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;employee_name&lt;/th&gt;
&lt;th&gt;salary&lt;/th&gt;
&lt;th&gt;row_num&lt;/th&gt;
&lt;th&gt;rnk&lt;/th&gt;
&lt;th&gt;dense_rnk&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;120000&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Bob&lt;/td&gt;
&lt;td&gt;105000&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Carol&lt;/td&gt;
&lt;td&gt;105000&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Dan&lt;/td&gt;
&lt;td&gt;98000&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ROW_NUMBER&lt;/code&gt; always gives unique numbers — ties are broken arbitrarily. Use this for pagination.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RANK&lt;/code&gt; skips numbers after ties — Bob and Carol are both rank 2, so Dan is rank 4 (not 3). Use this for competition-style rankings ("top 3 scores").&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DENSE_RANK&lt;/code&gt; never skips — Bob and Carol are both rank 2, and Dan is rank 3. Use this when gaps in ranking would be confusing or misleading.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example 3: Top-Earning Employee Per Department
&lt;/h3&gt;

&lt;p&gt;A very common analytics requirement — "show me the top earner in each department":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;ranked_employees&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;employee_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DENSE_RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;
      &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;salary_rank&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;employee_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ranked_employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary_rank&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why &lt;code&gt;DENSE_RANK&lt;/code&gt; instead of &lt;code&gt;RANK&lt;/code&gt; here? Because if two people tie for top salary, you want &lt;em&gt;both&lt;/em&gt; to appear. &lt;code&gt;RANK = 1&lt;/code&gt; and &lt;code&gt;DENSE_RANK = 1&lt;/code&gt; both return tied rows, but with &lt;code&gt;DENSE_RANK&lt;/code&gt; you avoid the counterintuitive gap at rank 2.&lt;/p&gt;




&lt;h2&gt;
  
  
  NTILE: Slicing Data into Buckets
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;NTILE(n)&lt;/code&gt; divides ordered rows into &lt;code&gt;n&lt;/code&gt; roughly equal groups and assigns each row a bucket number (1 through n). This is the window function for percentile analysis, quartiles, and cohort segmentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Syntax:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;NTILE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 4: Segmenting Customers by Purchase Value
&lt;/h3&gt;

&lt;p&gt;Imagine you want to label customers as Bronze, Silver, Gold, or Platinum based on their total annual spend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;customer_spend&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;annual_spend&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;YEAR&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2025&lt;/span&gt;
  &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;customer_tiers&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;annual_spend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;NTILE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;annual_spend&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;spend_quartile&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer_spend&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;annual_spend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="n"&gt;spend_quartile&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Bronze'&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Silver'&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Gold'&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Platinum'&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;tier&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customer_tiers&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;annual_spend&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (sample):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;annual_spend&lt;/th&gt;
&lt;th&gt;tier&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;C-4821&lt;/td&gt;
&lt;td&gt;48320.00&lt;/td&gt;
&lt;td&gt;Platinum&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;C-3019&lt;/td&gt;
&lt;td&gt;31005.00&lt;/td&gt;
&lt;td&gt;Gold&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;C-7733&lt;/td&gt;
&lt;td&gt;12400.00&lt;/td&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;C-2210&lt;/td&gt;
&lt;td&gt;1890.00&lt;/td&gt;
&lt;td&gt;Bronze&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One important caveat: if the total number of rows isn't evenly divisible by &lt;code&gt;n&lt;/code&gt;, the higher-numbered buckets get one fewer row. This is mathematically correct but can surprise you if you expect perfectly even groups.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 5: Identifying Performance Outliers with NTILE
&lt;/h3&gt;

&lt;p&gt;In a data pipeline, you want to flag the slowest 25% of queries for investigation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;query_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;execution_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;NTILE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;execution_ms&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;slowness_quartile&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;query_logs&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;logged_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'7 days'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Filter &lt;code&gt;WHERE slowness_quartile = 1&lt;/code&gt; and you have your slowest 25% — no hardcoded thresholds, no percentile math in application code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes and Gotchas
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Missing ORDER BY inside OVER()&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;LEAD&lt;/code&gt;, &lt;code&gt;LAG&lt;/code&gt;, &lt;code&gt;RANK&lt;/code&gt;, and &lt;code&gt;DENSE_RANK&lt;/code&gt; are meaningless without an &lt;code&gt;ORDER BY&lt;/code&gt; inside &lt;code&gt;OVER()&lt;/code&gt;. The database will either error out or return unpredictable results. Always specify it explicitly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Confusing the outer ORDER BY with the window ORDER BY&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The &lt;code&gt;ORDER BY&lt;/code&gt; inside &lt;code&gt;OVER(...)&lt;/code&gt; determines the &lt;em&gt;window ordering&lt;/em&gt; — which row is "previous" or "next." The &lt;code&gt;ORDER BY&lt;/code&gt; at the end of the query determines the &lt;em&gt;display order&lt;/em&gt;. These are independent. Your results might look sorted even without an outer &lt;code&gt;ORDER BY&lt;/code&gt;, but never rely on that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. RANK vs DENSE_RANK for filtering&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Using &lt;code&gt;WHERE rank = 2&lt;/code&gt; after &lt;code&gt;RANK()&lt;/code&gt; might return zero rows if rank 2 was skipped due to a tie. &lt;code&gt;DENSE_RANK&lt;/code&gt; avoids this. When filtering by rank in a CTE, think carefully about which behavior you want.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. NULLs in LAG/LEAD&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The first row for each partition has no previous row, so &lt;code&gt;LAG&lt;/code&gt; returns &lt;code&gt;NULL&lt;/code&gt; by default. Always provide a sensible default value (the third argument) if downstream calculations can't handle &lt;code&gt;NULL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. NTILE and uneven groups&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;NTILE(3)&lt;/code&gt; on 10 rows gives groups of sizes 4, 3, 3 — not 3.33 each. Document this behavior if you're presenting "even segments" to stakeholders.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;All five of these functions share the same &lt;code&gt;OVER()&lt;/code&gt; structure you already know — what changes is &lt;em&gt;what they compute&lt;/em&gt;. In summary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LAG / LEAD&lt;/strong&gt; — access values from neighboring rows, ideal for time-series deltas and gap detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RANK&lt;/strong&gt; — competition-style ranking with gaps after ties, best for "top N" filtering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DENSE_RANK&lt;/strong&gt; — ranking without gaps after ties, best when missing rank numbers would be confusing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ROW_NUMBER&lt;/strong&gt; — always unique, best for pagination and deduplication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NTILE(n)&lt;/strong&gt; — bucket rows into n groups, best for percentiles and cohort segmentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real superpower here is combining them. Use &lt;code&gt;LAG&lt;/code&gt; to calculate MoM change, then &lt;code&gt;NTILE&lt;/code&gt; to segment products into growth quartiles, then &lt;code&gt;DENSE_RANK&lt;/code&gt; to rank within each quartile — all in a single query using CTEs.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Try rewriting a query you currently use that relies on a self-join for row comparison. Chances are &lt;code&gt;LAG&lt;/code&gt; or &lt;code&gt;LEAD&lt;/code&gt; collapses it into something half the length. Share what you built in the comments — I'd love to see real-world use cases you've solved with these functions!&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Beyond Basic Indexes: Mastering Partial, Composite, and Covering Indexes in SQL</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Mon, 11 May 2026 04:21:41 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/beyond-basic-indexes-mastering-partial-composite-and-covering-indexes-in-sql-2led</link>
      <guid>https://dev.to/vivekdraxlr/beyond-basic-indexes-mastering-partial-composite-and-covering-indexes-in-sql-2led</guid>
      <description>&lt;p&gt;Most developers know that adding an index speeds up queries. But slapping a basic index on every column you filter by is a recipe for bloated storage, slower writes, and no real performance gain. The real power comes from &lt;em&gt;choosing the right type of index for the job&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we'll go beyond the basics and explore three advanced indexing strategies: &lt;strong&gt;partial indexes&lt;/strong&gt;, &lt;strong&gt;composite indexes&lt;/strong&gt;, and &lt;strong&gt;covering indexes&lt;/strong&gt;. Each solves a different class of performance problem, and knowing when to reach for each one separates good database work from great database work.&lt;/p&gt;

&lt;p&gt;We'll use PostgreSQL throughout, but these concepts apply (with some syntax variation) to MySQL, SQL Server, and other major databases.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup: An E-Commerce Orders Table
&lt;/h2&gt;

&lt;p&gt;Let's ground everything in a realistic scenario. Imagine an &lt;code&gt;orders&lt;/code&gt; table with tens of millions of rows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;          &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="nb"&gt;BIGINT&lt;/span&gt;        &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;      &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- 'pending', 'shipped', 'delivered', 'cancelled'&lt;/span&gt;
  &lt;span class="n"&gt;total&lt;/span&gt;       &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;  &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;   &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;region&lt;/span&gt;      &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over time, the table grows to 50 million rows. Most orders are &lt;code&gt;'delivered'&lt;/code&gt; or &lt;code&gt;'cancelled'&lt;/code&gt; — a tiny fraction (maybe 2%) are &lt;code&gt;'pending'&lt;/code&gt; or &lt;code&gt;'shipped'&lt;/code&gt;. This distribution is the key to understanding why generic indexes often fall short.&lt;/p&gt;




&lt;h2&gt;
  
  
  Partial Indexes: Index Only What You Query
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;partial index&lt;/strong&gt; is built over a &lt;em&gt;subset&lt;/em&gt; of rows, defined by a &lt;code&gt;WHERE&lt;/code&gt; condition. It's one of the most underused features in PostgreSQL.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Your support dashboard queries pending and shipped orders constantly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'shipped'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A plain index on &lt;code&gt;status&lt;/code&gt; would index all 50 million rows, even though 98% of them will never match this query. That's 50 million index entries serving 1 million actual lookups.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: A Partial Index
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_active&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'shipped'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now PostgreSQL only indexes the ~1 million "active" rows. The index is 98% smaller, fits more easily in memory, and stays faster even as the full table grows — because delivered/cancelled orders never bloat it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Plan Comparison
&lt;/h3&gt;

&lt;p&gt;Before (sequential scan on 50M rows):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;Seq&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;1245000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1100000&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;Filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ANY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'{pending,shipped}'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After (index scan on 1M-entry partial index):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;idx_orders_active&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;43&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;9834&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1100000&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When to Use Partial Indexes
&lt;/h3&gt;

&lt;p&gt;Partial indexes shine when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A column has highly skewed distribution (most queries target a small minority of rows)&lt;/li&gt;
&lt;li&gt;You filter on "active" or "open" records and the closed ones pile up over time&lt;/li&gt;
&lt;li&gt;You want a unique constraint only for non-null values: &lt;code&gt;CREATE UNIQUE INDEX ON users (email) WHERE email IS NOT NULL;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Composite Indexes: Column Order Is Everything
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;composite index&lt;/strong&gt; (also called a multi-column index) covers more than one column. Done right, it eliminates multiple sequential scans. Done wrong, it wastes space and never gets used.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Left-to-Right Rule
&lt;/h3&gt;

&lt;p&gt;PostgreSQL can use a composite index only when the query filters on a &lt;em&gt;prefix&lt;/em&gt; of the indexed columns. Given:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_customer_status_date&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Query predicate&lt;/th&gt;
      &lt;th&gt;Uses index?&lt;/th&gt;
      &lt;th&gt;Why&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;WHERE customer_id = 42&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;✅ Yes&lt;/td&gt;
      &lt;td&gt;Matches leading column&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;WHERE customer_id = 42 AND status = 'pending'&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;✅ Yes&lt;/td&gt;
      &lt;td&gt;Matches first two columns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;WHERE customer_id = 42 AND status = 'pending' AND created_at &amp;gt; '2026-01-01'&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;✅ Yes (full use)&lt;/td&gt;
      &lt;td&gt;Matches all three columns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;WHERE status = 'pending'&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;❌ No&lt;/td&gt;
      &lt;td&gt;Skips leading column&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;WHERE created_at &amp;gt; '2026-01-01'&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;❌ No&lt;/td&gt;
      &lt;td&gt;Skips first two columns&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Column Ordering Strategy
&lt;/h3&gt;

&lt;p&gt;Follow this rule of thumb for column order in a composite index:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Equality columns first&lt;/strong&gt; — columns used with &lt;code&gt;=&lt;/code&gt; or &lt;code&gt;IN&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Range columns next&lt;/strong&gt; — columns used with &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;BETWEEN&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sort columns last&lt;/strong&gt; — columns used in &lt;code&gt;ORDER BY&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So for this query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ideal index is exactly what we created: &lt;code&gt;(customer_id, status, created_at DESC)&lt;/code&gt;. PostgreSQL can narrow by &lt;code&gt;customer_id&lt;/code&gt; (equality), then narrow further by &lt;code&gt;status&lt;/code&gt; (equality), then scan only the recent range of &lt;code&gt;created_at&lt;/code&gt; — all within the index structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Avoid Index Proliferation
&lt;/h3&gt;

&lt;p&gt;One well-designed composite index often replaces two or three single-column indexes. If you have &lt;code&gt;idx_customer_id&lt;/code&gt; and &lt;code&gt;idx_status&lt;/code&gt; separately, queries needing both columns force a "bitmap AND" merge — two index lookups joined in memory. A single composite index eliminates that overhead.&lt;/p&gt;




&lt;h2&gt;
  
  
  Covering Indexes: Make the Index the Answer
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;covering index&lt;/strong&gt; is one that contains &lt;em&gt;every column a query needs&lt;/em&gt; — so PostgreSQL never has to touch the actual table rows. This is called an &lt;strong&gt;index-only scan&lt;/strong&gt;, and it's the fastest possible read path.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem: Heap Fetches
&lt;/h3&gt;

&lt;p&gt;Normally, after an index lookup, PostgreSQL must fetch the actual row from the table heap to retrieve non-indexed columns. On large tables with many concurrent writes, those heap fetches cause random I/O that kills performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  The INCLUDE Clause
&lt;/h3&gt;

&lt;p&gt;PostgreSQL 11+ introduced the &lt;code&gt;INCLUDE&lt;/code&gt; clause to add "payload" columns to an index without making them part of the sort key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_customer_covering&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;INCLUDE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now this query runs entirely off the index — no heap access at all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'shipped'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  INCLUDE vs Adding to the Key
&lt;/h3&gt;

&lt;p&gt;Why not just add &lt;code&gt;total&lt;/code&gt; and &lt;code&gt;created_at&lt;/code&gt; to the index key itself? Because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Non-key columns aren't sorted&lt;/strong&gt;, so they can't help filter or sort — they just live as payload in the leaf pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller B-tree height&lt;/strong&gt; — fewer levels to traverse since non-key columns only appear in leaf nodes, not interior nodes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoids over-indexing&lt;/strong&gt; — the key remains selective; &lt;code&gt;INCLUDE&lt;/code&gt; columns just come along for the ride&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Seeing It in Action
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BUFFERS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'shipped'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without &lt;code&gt;INCLUDE&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;idx_orders_customer_status&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;892&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;847&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;031&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;821&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;847&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;Buffers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;shared&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;423&lt;/span&gt; &lt;span class="k"&gt;read&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;634&lt;/span&gt;   &lt;span class="err"&gt;←&lt;/span&gt; &lt;span class="mi"&gt;634&lt;/span&gt; &lt;span class="n"&gt;heap&lt;/span&gt; &lt;span class="n"&gt;blocks&lt;/span&gt; &lt;span class="k"&gt;read&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;INCLUDE (total, created_at)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Index&lt;/span&gt; &lt;span class="k"&gt;Only&lt;/span&gt; &lt;span class="n"&gt;Scan&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;idx_orders_customer_covering&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;112&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;847&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;025&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;893&lt;/span&gt; &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;847&lt;/span&gt; &lt;span class="n"&gt;loops&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;Buffers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;shared&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;52&lt;/span&gt; &lt;span class="k"&gt;read&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;      &lt;span class="err"&gt;←&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="n"&gt;heap&lt;/span&gt; &lt;span class="n"&gt;blocks&lt;/span&gt; &lt;span class="k"&gt;read&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a 5× reduction in execution time and zero heap I/O.&lt;/p&gt;




&lt;h2&gt;
  
  
  Combining All Three: The Power Move
&lt;/h2&gt;

&lt;p&gt;You can combine all three strategies. Here's a partial covering composite index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_active_by_region&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;INCLUDE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'shipped'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This index:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only indexes active orders (partial — keeps it small)&lt;/li&gt;
&lt;li&gt;Supports filtering by region and sorting by date (composite — ordered key)&lt;/li&gt;
&lt;li&gt;Returns &lt;code&gt;customer_id&lt;/code&gt; and &lt;code&gt;total&lt;/code&gt; without a heap fetch (covering — zero table I/O)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A query like this will hit this index perfectly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'EU-West'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'shipped'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Common Gotchas
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Over-indexing writes more than you read.&lt;/strong&gt; Every index is updated on every &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, and &lt;code&gt;DELETE&lt;/code&gt;. An extra 5 indexes on a high-write table can make inserts 2–3× slower. Audit with &lt;code&gt;pg_stat_user_indexes&lt;/code&gt; to find unused indexes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;indexrelname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx_scan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx_tup_read&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_user_indexes&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;schemaname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'public'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;idx_scan&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Partial index predicates must match exactly.&lt;/strong&gt; PostgreSQL uses a partial index only when the query's &lt;code&gt;WHERE&lt;/code&gt; clause implies (or matches) the index predicate. &lt;code&gt;WHERE status = 'pending'&lt;/code&gt; won't use an index defined as &lt;code&gt;WHERE status IN ('pending', 'shipped')&lt;/code&gt; — you need &lt;code&gt;WHERE status IN ('pending', 'shipped')&lt;/code&gt; or the planner must be able to prove the subset relationship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;INCLUDE&lt;/code&gt; columns don't help WHERE or ORDER BY.&lt;/strong&gt; They are non-key payload only — PostgreSQL can't use them to narrow or sort. If you filter or sort by a column, it must be in the key part of the index, not in &lt;code&gt;INCLUDE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build indexes concurrently on live tables.&lt;/strong&gt; Creating an index on a large table locks out writes. Use &lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt; to build the index without blocking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;idx_orders_active_by_region&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;INCLUDE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'shipped'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Partial indexes&lt;/strong&gt; target a narrow, frequently queried subset of rows. They stay small and fast even as the full table balloons in size.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composite indexes&lt;/strong&gt; cover multi-column queries efficiently — but column order matters. Equality predicates first, range predicates second, sort columns last.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Covering indexes&lt;/strong&gt; (with &lt;code&gt;INCLUDE&lt;/code&gt;) allow index-only scans, eliminating costly random I/O to the table heap.&lt;/li&gt;
&lt;li&gt;Combining all three gives you a precision instrument: small, ordered, and self-contained.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best index strategy always starts with understanding your query patterns. Profile with &lt;code&gt;EXPLAIN (ANALYZE, BUFFERS)&lt;/code&gt;, identify where heap reads or sequential scans dominate, then choose the index type that surgically removes that bottleneck.&lt;/p&gt;




&lt;p&gt;Have you tuned a slow query with one of these techniques? Drop a comment below — especially if you've hit a surprising edge case. And if you're new to indexing fundamentals, check out the earlier post in this series on &lt;a href="https://dev.to"&gt;basic SQL indexes&lt;/a&gt; before diving deeper.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>performance</category>
    </item>
    <item>
      <title>SQL Views Explained: Write Once, Query Forever</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Thu, 07 May 2026 03:01:49 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/sql-views-explained-write-once-query-forever-khc</link>
      <guid>https://dev.to/vivekdraxlr/sql-views-explained-write-once-query-forever-khc</guid>
      <description>&lt;p&gt;You've written the same 12-line JOIN query for the third time this week. It lives in your reporting script, your admin dashboard, and your data export job. Every time the business logic changes, you update it in three places — and inevitably miss one.&lt;/p&gt;

&lt;p&gt;SQL views were built to fix exactly this. A view is a named, stored query that you can treat like a table. Define your logic once, then reference it from anywhere. This article walks through everything you need to know to start using views effectively: what they are, how to create them, real-world scenarios where they shine, and the gotchas that trip people up.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a SQL View?
&lt;/h2&gt;

&lt;p&gt;A view is a virtual table based on the result of a &lt;code&gt;SELECT&lt;/code&gt; statement. It doesn't store data itself — it stores the &lt;em&gt;query&lt;/em&gt;. Every time you &lt;code&gt;SELECT&lt;/code&gt; from a view, the database runs that underlying query and returns fresh results.&lt;/p&gt;

&lt;p&gt;Here's the simplest possible view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;active_customers&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now instead of repeating that filter everywhere, you just do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;active_customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The database sees through the view and runs the full underlying query — you get current data, every time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Creating, Replacing, and Dropping Views
&lt;/h2&gt;

&lt;h3&gt;
  
  
  CREATE VIEW
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;monthly_revenue&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'completed'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once this exists, any analyst on your team can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;monthly_revenue&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No need to remember the aggregation logic — it's baked into the view.&lt;/p&gt;

&lt;h3&gt;
  
  
  CREATE OR REPLACE VIEW
&lt;/h3&gt;

&lt;p&gt;To update a view's definition without dropping and recreating it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;monthly_revenue&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_order_value&lt;/span&gt;  &lt;span class="c1"&gt;-- new column added&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'completed'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  DROP VIEW
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;monthly_revenue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;IF EXISTS&lt;/code&gt; to avoid errors if the view doesn't exist yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Simplifying Complex Reports
&lt;/h3&gt;

&lt;p&gt;Imagine you have an e-commerce schema: &lt;code&gt;orders&lt;/code&gt;, &lt;code&gt;order_items&lt;/code&gt;, &lt;code&gt;products&lt;/code&gt;, &lt;code&gt;customers&lt;/code&gt;. A typical report query might join all four tables and apply several filters. Put that in a view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;order_summary&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;            &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unit_price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;item_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;  &lt;span class="k"&gt;c&lt;/span&gt;  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;order_items&lt;/span&gt; &lt;span class="n"&gt;oi&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;oi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your dashboard queries look clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Top 10 customers by spending this year&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lifetime_value&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;order_summary&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;lifetime_value&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Hiding Sensitive Columns
&lt;/h3&gt;

&lt;p&gt;You want your analytics team to query the &lt;code&gt;employees&lt;/code&gt; table but not see salaries or social security numbers. Create a view that exposes only safe columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;employees_public&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;job_title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;hire_date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- salary, ssn, and bank_account columns are never exposed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Grant access to the view, not the base table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;employees_public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;analytics_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is one of the most practical uses of views in production systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Maintaining Backward Compatibility
&lt;/h3&gt;

&lt;p&gt;Your application queries a table called &lt;code&gt;user_profiles&lt;/code&gt;. You need to refactor the schema — split it into &lt;code&gt;users&lt;/code&gt; and &lt;code&gt;profile_details&lt;/code&gt;. Create a view with the old name that reconstructs the original shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;user_profiles&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avatar_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;location&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;profile_details&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Old queries keep working. You can migrate downstream consumers at your own pace.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Pre-Filtering for Multi-Tenant Apps
&lt;/h3&gt;

&lt;p&gt;In a SaaS app, data is often segmented by &lt;code&gt;tenant_id&lt;/code&gt;. Views can bake in row-level filtering:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;tenant_orders&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.current_tenant'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(PostgreSQL supports this pattern using session-level settings.)&lt;/p&gt;




&lt;h2&gt;
  
  
  Updatable Views
&lt;/h2&gt;

&lt;p&gt;In some situations you can run &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, and &lt;code&gt;DELETE&lt;/code&gt; directly on a view. This works when the view is simple enough that the database can unambiguously map the change back to a single base table row.&lt;/p&gt;

&lt;p&gt;A view is generally updatable if it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Selects from a &lt;strong&gt;single base table&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Does &lt;strong&gt;not&lt;/strong&gt; use &lt;code&gt;DISTINCT&lt;/code&gt;, &lt;code&gt;GROUP BY&lt;/code&gt;, &lt;code&gt;HAVING&lt;/code&gt;, &lt;code&gt;UNION&lt;/code&gt;, or aggregate functions&lt;/li&gt;
&lt;li&gt;Does &lt;strong&gt;not&lt;/strong&gt; include computed columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example of an updatable view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;active_products&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stock_quantity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;is_active&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- This works because it maps cleanly to one row in `products`&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;active_products&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  WITH CHECK OPTION
&lt;/h3&gt;

&lt;p&gt;Add &lt;code&gt;WITH CHECK OPTION&lt;/code&gt; to prevent updates that would push rows &lt;em&gt;out of&lt;/em&gt; the view's filter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;active_products&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stock_quantity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;is_active&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="k"&gt;OPTION&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- This would fail — it would make the row invisible to the view&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;active_products&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;is_active&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;FALSE&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- ERROR: new row violates check option for view "active_products"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Views vs. CTEs: When to Use Which
&lt;/h2&gt;

&lt;p&gt;Both views and CTEs (covered in a &lt;a href="///dev.to%20articles/2026-04-10-sql-cte-mastering-common-table-expressions.md"&gt;previous article&lt;/a&gt;) let you name and reuse query logic. The key difference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Feature&lt;/th&gt;
      &lt;th&gt;View&lt;/th&gt;
      &lt;th&gt;CTE&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Persistence&lt;/td&gt;
      &lt;td&gt;Stored in database permanently&lt;/td&gt;
      &lt;td&gt;Exists only during one query&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Reusability&lt;/td&gt;
      &lt;td&gt;Available to all queries, all users&lt;/td&gt;
      &lt;td&gt;Local to the single SQL statement&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Access control&lt;/td&gt;
      &lt;td&gt;Can be granted/revoked independently&lt;/td&gt;
      &lt;td&gt;Not applicable&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Best for&lt;/td&gt;
      &lt;td&gt;Shared, frequently used logic&lt;/td&gt;
      &lt;td&gt;Ad-hoc, one-off query breakdowns&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Recursive queries&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; If you're writing the same subquery in multiple queries or multiple applications, make it a view. If you're just breaking up a complex single query for readability, use a CTE.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes and Gotchas
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Assuming Views Cache Results
&lt;/h3&gt;

&lt;p&gt;Views do &lt;strong&gt;not&lt;/strong&gt; store data. Every query against a view re-executes the underlying SQL. If that underlying query is slow, your view will be slow too. Views give you reusability, not performance — for that, look at &lt;strong&gt;materialized views&lt;/strong&gt; (a topic for another day).&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Nesting Views Too Deeply
&lt;/h3&gt;

&lt;p&gt;It's tempting to build views on top of views on top of views. Resist this. Deep view nesting makes debugging painful and can confuse the query planner. If your view references another view which references another view, the optimizer has to unpack all of them before building an execution plan. Keep hierarchies shallow (two levels max).&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Putting Too Much Logic Into One View
&lt;/h3&gt;

&lt;p&gt;Views encourage DRY code, but that doesn't mean one monster view should serve every use case. A view that joins 8 tables for every query will be slow for callers who only need 2 of those tables. Create targeted views for specific, well-understood use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Forgetting That Schema Changes Can Break Views
&lt;/h3&gt;

&lt;p&gt;If you drop a column from a base table that a view depends on, the view breaks — often silently until someone queries it. Most databases won't warn you at &lt;code&gt;ALTER TABLE&lt;/code&gt; time. Build a habit of checking view dependencies before modifying base tables.&lt;/p&gt;

&lt;p&gt;In PostgreSQL, you can check what views depend on a table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;viewname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;definition&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_views&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;definition&lt;/span&gt; &lt;span class="k"&gt;ILIKE&lt;/span&gt; &lt;span class="s1"&gt;'%your_table_name%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Trying to Pass Parameters Into Views
&lt;/h3&gt;

&lt;p&gt;Views are static — they can't accept parameters the way stored procedures can. If you find yourself creating dozens of near-identical views (&lt;code&gt;orders_2024&lt;/code&gt;, &lt;code&gt;orders_2025&lt;/code&gt;, &lt;code&gt;orders_jan&lt;/code&gt;, &lt;code&gt;orders_feb&lt;/code&gt;), that's a sign you want a parameterized query, a function, or a filtered query against a single general-purpose view instead.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;SQL views are a powerful tool for keeping your database logic clean, consistent, and secure. In summary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A view is a &lt;strong&gt;named, stored SELECT query&lt;/strong&gt; — not stored data&lt;/li&gt;
&lt;li&gt;Use them to &lt;strong&gt;centralize shared query logic&lt;/strong&gt; so changes propagate everywhere at once&lt;/li&gt;
&lt;li&gt;Use them to &lt;strong&gt;restrict access&lt;/strong&gt; by exposing only safe columns to certain roles&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;CREATE OR REPLACE&lt;/code&gt; to update view definitions without downtime&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;WITH CHECK OPTION&lt;/code&gt; to enforce constraints on updatable views&lt;/li&gt;
&lt;li&gt;Prefer &lt;strong&gt;CTEs for one-off breakdowns&lt;/strong&gt;, &lt;strong&gt;views for recurring shared logic&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Watch out for deep nesting, bloated view definitions, and silent schema-change failures&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;Take one query you write repeatedly — a filter, a join, a report — and convert it into a view today. Then ask: who else on your team could benefit from having access to it?&lt;/p&gt;

&lt;p&gt;Have you run into situations where views saved you from repetition, or caused unexpected performance headaches? Drop a comment below — I'd love to hear how you're using them in production.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of an ongoing series on SQL databases. Previous articles covered &lt;a href="https://dev.to/"&gt;CTEs&lt;/a&gt;, &lt;a href="https://dev.to/"&gt;window functions&lt;/a&gt;, &lt;a href="https://dev.to/"&gt;indexes&lt;/a&gt;, and more. Follow along for new posts every few days.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>beginners</category>
    </item>
    <item>
      <title>SQL String Functions: Clean, Transform, and Tame Messy Data Like a Pro</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Wed, 06 May 2026 04:45:52 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/sql-string-functions-clean-transform-and-tame-messy-data-like-a-pro-52h6</link>
      <guid>https://dev.to/vivekdraxlr/sql-string-functions-clean-transform-and-tame-messy-data-like-a-pro-52h6</guid>
      <description>&lt;p&gt;Real-world data is messy. Users type their names in ALL CAPS, phone numbers come in five different formats, email addresses have trailing spaces, and city names are spelled three different ways. Before you can analyze or display that data meaningfully, you need to clean and reshape it — and SQL string functions are one of the most powerful tools in your toolkit for doing exactly that.&lt;/p&gt;

&lt;p&gt;In this guide, we'll go beyond the basics. You'll see how string functions work &lt;em&gt;together&lt;/em&gt; to solve real data problems — the kind that show up constantly in reporting pipelines, backend APIs, and data migrations.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Sample Database
&lt;/h2&gt;

&lt;p&gt;We'll use a realistic &lt;code&gt;customers&lt;/code&gt; table throughout this article:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;   &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;first_name&lt;/span&gt;    &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;last_name&lt;/span&gt;     &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;email&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;phone&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;city&lt;/span&gt;          &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;country_code&lt;/span&gt;  &lt;span class="nb"&gt;CHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's the kind of data you might find in it after a messy data import:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;customer_id&lt;/th&gt;
      &lt;th&gt;first_name&lt;/th&gt;
      &lt;th&gt;last_name&lt;/th&gt;
      &lt;th&gt;email&lt;/th&gt;
      &lt;th&gt;phone&lt;/th&gt;
      &lt;th&gt;city&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;ALICE&lt;/td&gt;
&lt;td&gt;  Johnson  &lt;/td&gt;
&lt;td&gt;alice@example.com  &lt;/td&gt;
&lt;td&gt;(555) 123-4567&lt;/td&gt;
&lt;td&gt;new york&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;bob&lt;/td&gt;
&lt;td&gt;Smith&lt;/td&gt;
&lt;td&gt;BOB@EXAMPLE.COM&lt;/td&gt;
&lt;td&gt;555.987.6543&lt;/td&gt;
&lt;td&gt;Los Angeles &lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;María&lt;/td&gt;
&lt;td&gt;García&lt;/td&gt;
&lt;td&gt; maria@example.com&lt;/td&gt;
&lt;td&gt;5552345678&lt;/td&gt;
&lt;td&gt;CHICAGO&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;James&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;james@example.com&lt;/td&gt;
&lt;td&gt;+1-555-876-5432&lt;/td&gt;
&lt;td&gt;Houston&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Classic import chaos. Let's fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  TRIM: Killing the Invisible Enemies
&lt;/h2&gt;

&lt;p&gt;Leading and trailing whitespace is the cockroach of data problems — invisible, persistent, and it breaks things at the worst moments. &lt;code&gt;TRIM&lt;/code&gt; removes it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Remove spaces from both sides (default behavior)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;clean_last_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;clean_last_name
---------------
Johnson
Smith
García
(null)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also target only one side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;LTRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;no_leading_space&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;RTRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;no_trailing_space&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; &lt;code&gt;TRIM&lt;/code&gt; only removes spaces by default. If your data has tabs (&lt;code&gt;\t&lt;/code&gt;) or newlines (&lt;code&gt;\n&lt;/code&gt;) — common in CSV imports — you need &lt;code&gt;REGEXP_REPLACE&lt;/code&gt; (covered below).&lt;/p&gt;




&lt;h2&gt;
  
  
  UPPER and LOWER: Standardizing Case
&lt;/h2&gt;

&lt;p&gt;Mixed-case data makes comparisons a nightmare. &lt;code&gt;alice@example.com&lt;/code&gt; and &lt;code&gt;ALICE@example.com&lt;/code&gt; will fail an equality check even though they're the same address.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Normalize all emails to lowercase&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;normalized_email&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;normalized_email&lt;/span&gt;
&lt;span class="c1"&gt;------------+-------------------&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt;           &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;alice&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;           &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;bob&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;           &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;maria&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt;           &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;james&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For display names, you often want title case. SQL doesn't have a built-in &lt;code&gt;INITCAP&lt;/code&gt; function in MySQL, but PostgreSQL does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL only&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;INITCAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;display_first&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;INITCAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;display_last&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;display_first&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;display_last&lt;/span&gt;
&lt;span class="c1"&gt;--------------+-------------&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Johnson&lt;/span&gt;
&lt;span class="n"&gt;Bob&lt;/span&gt;           &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Smith&lt;/span&gt;
&lt;span class="n"&gt;Mar&lt;/span&gt;&lt;span class="err"&gt;í&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Garc&lt;/span&gt;&lt;span class="err"&gt;í&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;
&lt;span class="n"&gt;James&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In MySQL, you'd build your own with &lt;code&gt;CONCAT(UPPER(LEFT(name,1)), LOWER(SUBSTRING(name,2)))&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  CONCAT: Building Strings From Parts
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;CONCAT&lt;/code&gt; joins strings together. Use it for display names, slugs, full addresses — anywhere you need to assemble a value from multiple columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Build a full display name&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;CONCAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;INITCAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;))),&lt;/span&gt;
    &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;INITCAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;full_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;full_name&lt;/span&gt;
&lt;span class="c1"&gt;-----------&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt; &lt;span class="n"&gt;Johnson&lt;/span&gt;
&lt;span class="n"&gt;Bob&lt;/span&gt; &lt;span class="n"&gt;Smith&lt;/span&gt;
&lt;span class="n"&gt;Mar&lt;/span&gt;&lt;span class="err"&gt;í&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;Garc&lt;/span&gt;&lt;span class="err"&gt;í&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;||&lt;/code&gt; operator is the ANSI-standard alternative to &lt;code&gt;CONCAT()&lt;/code&gt; and works in PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;full_name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; In MySQL, &lt;code&gt;CONCAT(value, NULL)&lt;/code&gt; returns &lt;code&gt;NULL&lt;/code&gt; — a single null column poisons the whole concatenation. Use &lt;code&gt;CONCAT_WS&lt;/code&gt; (concat with separator) or &lt;code&gt;COALESCE&lt;/code&gt; to guard against this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Safe version that handles NULLs&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;CONCAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;full_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  REPLACE: Fixing Formatted Junk
&lt;/h2&gt;

&lt;p&gt;Our phone numbers are a mess — parentheses, dots, dashes, country codes. &lt;code&gt;REPLACE&lt;/code&gt; lets you strip unwanted characters one at a time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Strip common phone number formatting characters&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;phone&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="k"&gt;REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'('&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;')'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="s1"&gt;'-'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="s1"&gt;'.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;digits_only&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;original&lt;/th&gt;
&lt;th&gt;digits_only&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;(555) 123-4567&lt;/td&gt;
&lt;td&gt;5551234567&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;555.987.6543&lt;/td&gt;
&lt;td&gt;5559876543&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;5552345678&lt;/td&gt;
&lt;td&gt;5552345678&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;+1-555-876-5432&lt;/td&gt;
&lt;td&gt;+15558765432&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For the last row, you'd still need to handle the &lt;code&gt;+1&lt;/code&gt; country prefix — which is where &lt;code&gt;REGEXP_REPLACE&lt;/code&gt; shines.&lt;/p&gt;




&lt;h2&gt;
  
  
  REGEXP_REPLACE: The Power Tool
&lt;/h2&gt;

&lt;p&gt;When &lt;code&gt;REPLACE&lt;/code&gt; isn't enough, regular expressions let you target patterns instead of literal characters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL: strip ALL non-digit characters from phone numbers&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;REGEXP_REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'[^0-9]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'g'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;digits_only&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;phone&lt;/span&gt;            &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;digits_only&lt;/span&gt;
&lt;span class="c1"&gt;-----------------+------------&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;555&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4567&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5551234567&lt;/span&gt;
&lt;span class="mi"&gt;555&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;987&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6543&lt;/span&gt;     &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5559876543&lt;/span&gt;
&lt;span class="mi"&gt;5552345678&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5552345678&lt;/span&gt;
&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;555&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;876&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;15558765432&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In MySQL 8+, use &lt;code&gt;REGEXP_REPLACE(phone, '[^0-9]', '')&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt; Regex is powerful but slow on large tables. Don't run &lt;code&gt;REGEXP_REPLACE&lt;/code&gt; on a column inside a &lt;code&gt;WHERE&lt;/code&gt; clause if you can help it — you'll kill index usage and force a full scan. Clean your data in a migration, not on every query.&lt;/p&gt;




&lt;h2&gt;
  
  
  SUBSTRING: Extracting Exactly What You Need
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;SUBSTRING&lt;/code&gt; (or &lt;code&gt;SUBSTR&lt;/code&gt;) extracts part of a string by position.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Extract domain from email addresses&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUBSTRING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;POSITION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'@'&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;domain&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;email&lt;/span&gt;                   &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;domain&lt;/span&gt;
&lt;span class="c1"&gt;------------------------+------------&lt;/span&gt;
&lt;span class="n"&gt;alice&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;
&lt;span class="n"&gt;bob&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;
&lt;span class="n"&gt;maria&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;
&lt;span class="n"&gt;james&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A more practical example — extracting year from a date stored as text (not ideal, but it happens):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Pull the year from 'YYYY-MM-DD' stored as text&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;order_ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUBSTRING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_year&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_ref&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'____-__-__%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SUBSTRING(string, start, length)&lt;/code&gt; is the MySQL/SQL-standard syntax; PostgreSQL also supports &lt;code&gt;SUBSTRING(string FROM start FOR length)&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  LENGTH and POSITION: Measuring and Locating
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;LENGTH&lt;/code&gt; tells you how many characters a string has. &lt;code&gt;POSITION&lt;/code&gt; (or &lt;code&gt;CHARINDEX&lt;/code&gt; in SQL Server) tells you where a substring appears.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Find customers with suspiciously short email addresses&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LENGTH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;email_length&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;LENGTH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Find where the '@' sits in each email&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;POSITION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'@'&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;at_position&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together, &lt;code&gt;POSITION&lt;/code&gt; and &lt;code&gt;SUBSTRING&lt;/code&gt; let you split strings on a delimiter without regex — useful for parsing values like &lt;code&gt;"Smith, John"&lt;/code&gt; or &lt;code&gt;"Houston, TX"&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Split "city, state" into two columns&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUBSTRING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;POSITION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;','&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;location&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SUBSTRING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;POSITION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;','&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;location&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;offices&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Putting It All Together: A Real Cleanup Query
&lt;/h2&gt;

&lt;p&gt;Here's a single query that normalizes the entire &lt;code&gt;customers&lt;/code&gt; table for a reporting export:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;INITCAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;                         &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;INITCAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt;            &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                                       &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;REGEXP_REPLACE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'[^0-9]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'g'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                 &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;phone_digits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;INITCAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;                               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;country_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                                      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;country_code&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;customer_id&lt;/th&gt;
      &lt;th&gt;first_name&lt;/th&gt;
      &lt;th&gt;last_name&lt;/th&gt;
      &lt;th&gt;email&lt;/th&gt;
      &lt;th&gt;phone_digits&lt;/th&gt;
      &lt;th&gt;city&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;Johnson&lt;/td&gt;
&lt;td&gt;alice@example.com&lt;/td&gt;
&lt;td&gt;5551234567&lt;/td&gt;
&lt;td&gt;New York&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Bob&lt;/td&gt;
&lt;td&gt;Smith&lt;/td&gt;
&lt;td&gt;bob@example.com&lt;/td&gt;
&lt;td&gt;5559876543&lt;/td&gt;
&lt;td&gt;Los Angeles&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;María&lt;/td&gt;
&lt;td&gt;García&lt;/td&gt;
&lt;td&gt;maria@example.com&lt;/td&gt;
&lt;td&gt;5552345678&lt;/td&gt;
&lt;td&gt;Chicago&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;James&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;james@example.com&lt;/td&gt;
&lt;td&gt;15558765432&lt;/td&gt;
&lt;td&gt;Houston&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Common Mistakes to Avoid
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Assuming NULLs are empty strings.&lt;/strong&gt; &lt;code&gt;LENGTH(NULL)&lt;/code&gt; returns &lt;code&gt;NULL&lt;/code&gt;, not 0. Always wrap nullable columns in &lt;code&gt;COALESCE(column, '')&lt;/code&gt; before passing them to string functions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chaining REPLACE instead of using REGEXP_REPLACE.&lt;/strong&gt; Five nested &lt;code&gt;REPLACE&lt;/code&gt; calls are harder to read and maintain than a single regex pattern. If you find yourself nesting more than two, switch to regex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Running string transformations in WHERE clauses on large tables.&lt;/strong&gt; &lt;code&gt;WHERE LOWER(email) = 'alice@example.com'&lt;/code&gt; prevents the database from using an index on &lt;code&gt;email&lt;/code&gt;. Either store emails already lowercased, or create a functional index: &lt;code&gt;CREATE INDEX idx_email_lower ON customers (LOWER(email))&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Forgetting character vs. byte length.&lt;/strong&gt; In MySQL, &lt;code&gt;LENGTH()&lt;/code&gt; returns byte count — a multi-byte UTF-8 character (like &lt;code&gt;é&lt;/code&gt; or &lt;code&gt;中&lt;/code&gt;) counts as 2–3 bytes. Use &lt;code&gt;CHAR_LENGTH()&lt;/code&gt; when you care about the number of visible characters.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;SQL string functions aren't just cosmetic — they're essential data engineering tools. A few patterns worth remembering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clean at storage time, not query time&lt;/strong&gt; — normalize data when it enters your database, not on every &lt;code&gt;SELECT&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer functions&lt;/strong&gt; — &lt;code&gt;INITCAP(LOWER(TRIM(name)))&lt;/code&gt; is perfectly valid and readable SQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;COALESCE&lt;/code&gt; defensively&lt;/strong&gt; — any function receiving &lt;code&gt;NULL&lt;/code&gt; will return &lt;code&gt;NULL&lt;/code&gt;, so guard early&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prefer &lt;code&gt;REGEXP_REPLACE&lt;/code&gt; for pattern-based cleanup&lt;/strong&gt; — it's more maintainable than chained &lt;code&gt;REPLACE&lt;/code&gt; calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch performance&lt;/strong&gt; — function calls in &lt;code&gt;WHERE&lt;/code&gt; clauses can kill index efficiency on large tables&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;String functions pair beautifully with SQL's &lt;code&gt;UPDATE&lt;/code&gt; statement — you can run a one-time data cleanup across your whole table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt;
  &lt;span class="n"&gt;email&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;INITCAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;))),&lt;/span&gt;
  &lt;span class="n"&gt;city&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;INITCAP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Have you run into a particularly gnarly data cleaning challenge in SQL? Drop it in the comments — I'd love to see how you solved it, or we can work through it together.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>SQL Date &amp; Time Functions: A Practical Guide for Real-World Queries</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Mon, 04 May 2026 05:53:12 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/sql-date-time-functions-a-practical-guide-for-real-world-queries-5hef</link>
      <guid>https://dev.to/vivekdraxlr/sql-date-time-functions-a-practical-guide-for-real-world-queries-5hef</guid>
      <description>&lt;p&gt;Working with dates and times is one of those things that sounds simple until you're staring at a timestamp column wondering why your "last 30 days" query returns unexpected results. Date and time functions are essential for analytics, reporting, and any application that tracks events over time — yet the subtle differences between database engines trip up even experienced developers.&lt;/p&gt;

&lt;p&gt;In this guide, we'll cover the most useful SQL date/time functions across &lt;strong&gt;PostgreSQL&lt;/strong&gt; and &lt;strong&gt;MySQL&lt;/strong&gt; — two of the most widely used databases — with practical, real-world examples. By the end, you'll know how to confidently filter by time ranges, calculate durations, group data by period, and avoid the most common pitfalls.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Date/Time Functions Matter
&lt;/h2&gt;

&lt;p&gt;Think about how many queries involve time: "Show me orders placed in the last 7 days," "Calculate how long customers have been subscribed," "Group revenue by month." Nearly every meaningful business question has a temporal dimension.&lt;/p&gt;

&lt;p&gt;The challenge is that SQL databases don't all speak the same date dialect. PostgreSQL has &lt;code&gt;DATE_TRUNC&lt;/code&gt; and &lt;code&gt;AGE&lt;/code&gt;. MySQL has &lt;code&gt;DATE_FORMAT&lt;/code&gt; and &lt;code&gt;TIMESTAMPDIFF&lt;/code&gt;. Standard SQL gives you &lt;code&gt;EXTRACT&lt;/code&gt; and &lt;code&gt;CURRENT_TIMESTAMP&lt;/code&gt;. Knowing what's available — and what differs — saves you hours of debugging.&lt;/p&gt;

&lt;p&gt;We'll use a consistent example throughout: an e-commerce database with &lt;code&gt;orders&lt;/code&gt; and &lt;code&gt;customers&lt;/code&gt; tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Our example schema&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;   &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;          &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;joined_at&lt;/span&gt;     &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;order_id&lt;/span&gt;      &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;   &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;placed_at&lt;/span&gt;     &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;shipped_at&lt;/span&gt;    &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;total_amount&lt;/span&gt;  &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  1. Getting the Current Date and Time
&lt;/h2&gt;

&lt;p&gt;The most basic need: "what time is it right now?" Every major database supports these, but syntax varies slightly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                  &lt;span class="c1"&gt;-- current timestamp with time zone&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;-- same as NOW(), SQL standard&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;-- date only (no time)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIME&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;-- time only&lt;/span&gt;

&lt;span class="c1"&gt;-- MySQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                  &lt;span class="c1"&gt;-- current datetime&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;CURDATE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;              &lt;span class="c1"&gt;-- date only&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;CURTIME&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;              &lt;span class="c1"&gt;-- time only&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;UTC_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;        &lt;span class="c1"&gt;-- current UTC datetime (very useful!)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world use:&lt;/strong&gt; Find all orders placed today.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_amount&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- MySQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_amount&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CURDATE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Casting a timestamp to &lt;code&gt;date&lt;/code&gt; (or using &lt;code&gt;DATE()&lt;/code&gt;) forces the database to discard the time component before comparing — otherwise &lt;code&gt;placed_at = CURRENT_DATE&lt;/code&gt; would fail because a timestamp like &lt;code&gt;2026-04-28 14:33:00&lt;/code&gt; is not equal to the date &lt;code&gt;2026-04-28&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  2. Extracting Parts of a Date
&lt;/h2&gt;

&lt;p&gt;Sometimes you need just the year, month, or hour from a timestamp. &lt;code&gt;EXTRACT&lt;/code&gt; is the SQL-standard way to do this, supported by both PostgreSQL and MySQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Standard SQL — works in both PostgreSQL and MySQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;YEAR&lt;/span&gt;  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MONTH&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DOW&lt;/span&gt;   &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;day_of_week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- 0=Sunday in PostgreSQL&lt;/span&gt;
  &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HOUR&lt;/span&gt;  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_hour&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world use:&lt;/strong&gt; Group orders by month to see monthly revenue trends.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;YEAR&lt;/span&gt;  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;yr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MONTH&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;mo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;yr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mo&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;yr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;MySQL shorthand functions:&lt;/strong&gt; MySQL also offers dedicated extraction functions that are a bit more readable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- MySQL-specific shortcuts&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="nb"&gt;YEAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;MONTH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;HOUR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_hour&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;DAYNAME&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;weekday_name&lt;/span&gt;   &lt;span class="c1"&gt;-- e.g. "Tuesday"&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. Date Arithmetic with INTERVAL
&lt;/h2&gt;

&lt;p&gt;Adding or subtracting time from a date is a common need: "all orders from the last 30 days," "subscriptions expiring in the next 7 days," etc.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL: use INTERVAL keyword&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'30 days'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- MySQL: use DATE_SUB() or INTERVAL notation&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Also valid in MySQL:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;DATE_SUB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;INTERVAL supports a wide range of units:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL examples&lt;/span&gt;
&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 year'&lt;/span&gt;
&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'3 months'&lt;/span&gt;
&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'2 hours 30 minutes'&lt;/span&gt;
&lt;span class="s1"&gt;'2026-01-01'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'90 days'&lt;/span&gt;

&lt;span class="c1"&gt;-- MySQL examples&lt;/span&gt;
&lt;span class="n"&gt;DATE_ADD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="nb"&gt;YEAR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DATE_ADD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;MONTH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DATE_ADD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shipped_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;-- estimated delivery date&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world use:&lt;/strong&gt; Flag orders that shipped more than 14 days ago but never had a follow-up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shipped_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;shipped_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;shipped_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'14 days'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  4. Calculating Differences Between Dates
&lt;/h2&gt;

&lt;p&gt;How many days between two dates? How long has a customer been with us? These questions require date difference functions.&lt;/p&gt;

&lt;h3&gt;
  
  
  DATEDIFF (MySQL)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- MySQL: DATEDIFF returns difference in days&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;shipped_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;DATEDIFF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shipped_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;days_to_ship&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;shipped_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  TIMESTAMPDIFF (MySQL)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;TIMESTAMPDIFF&lt;/code&gt; is more flexible — you choose the unit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- MySQL: specify the unit you want&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joined_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;TIMESTAMPDIFF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;YEAR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joined_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;years_as_customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;TIMESTAMPDIFF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MONTH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joined_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;months_as_customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;TIMESTAMPDIFF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joined_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;days_as_customer&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AGE (PostgreSQL)
&lt;/h3&gt;

&lt;p&gt;PostgreSQL has the elegant &lt;code&gt;AGE()&lt;/code&gt; function that returns a human-readable interval:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;joined_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;AGE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;joined_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;customer_age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- e.g. "2 years 3 mons 12 days"&lt;/span&gt;
  &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;DAY&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;AGE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;joined_at&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;           &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;days_as_customer&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world use:&lt;/strong&gt; Find customers who joined more than 1 year ago but have placed no order in the last 90 days (candidates for a win-back campaign).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joined_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_order_date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joined_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 year'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;joined_at&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'90 days'&lt;/span&gt;
    &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. Truncating Dates with DATE_TRUNC (PostgreSQL)
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;DATE_TRUNC&lt;/code&gt; is a PostgreSQL superpower for grouping data by time periods. It rounds a timestamp &lt;em&gt;down&lt;/em&gt; to the start of the given period:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="c1"&gt;-- Result: 2026-04-01 00:00:00+00&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="c1"&gt;-- Result: 2026-04-27 00:00:00+00 (start of current week, Monday)&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'year'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="c1"&gt;-- Result: 2026-01-01 00:00:00+00&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supported precision values include: &lt;code&gt;microseconds&lt;/code&gt;, &lt;code&gt;milliseconds&lt;/code&gt;, &lt;code&gt;second&lt;/code&gt;, &lt;code&gt;minute&lt;/code&gt;, &lt;code&gt;hour&lt;/code&gt;, &lt;code&gt;day&lt;/code&gt;, &lt;code&gt;week&lt;/code&gt;, &lt;code&gt;month&lt;/code&gt;, &lt;code&gt;quarter&lt;/code&gt;, &lt;code&gt;year&lt;/code&gt;, &lt;code&gt;decade&lt;/code&gt;, &lt;code&gt;century&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world use:&lt;/strong&gt; Weekly revenue report with clean week boundaries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;week_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_order_value&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'12 weeks'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;week_start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In MySQL, you can approximate this with &lt;code&gt;DATE_FORMAT&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- MySQL: group by week using DATE_FORMAT&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;DATE_FORMAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'%Y-%u'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;year_week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- e.g. "2026-17"&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                          &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                 &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;DATE_SUB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="n"&gt;WEEK&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;DATE_FORMAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'%Y-%u'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;year_week&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  6. Formatting Dates for Display
&lt;/h2&gt;

&lt;p&gt;Raw timestamps aren't user-friendly. Format them for reports and APIs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL: TO_CHAR&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;TO_CHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Month DD, YYYY'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;display_date&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- "April 28, 2026"&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;TO_CHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'YYYY-MM-DD HH24:MI'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;display_dt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;-- "2026-04-28 14:33"&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;TO_CHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Day'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;weekday&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;-- "Tuesday"&lt;/span&gt;

&lt;span class="c1"&gt;-- MySQL: DATE_FORMAT&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;DATE_FORMAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'%M %d, %Y'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;display_date&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- "April 28, 2026"&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;DATE_FORMAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'%Y-%m-%d %H:%i'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;display_dt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;-- "2026-04-28 14:33"&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;DATE_FORMAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'%W'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;weekday&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;-- "Tuesday"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Common Mistakes and Gotchas
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Applying a function to an indexed column kills performance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you have an index on &lt;code&gt;placed_at&lt;/code&gt;, this query can't use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ❌ Wrapping the column in a function prevents index usage&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CURDATE&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;          &lt;span class="c1"&gt;-- MySQL&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'day'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;   &lt;span class="c1"&gt;-- PostgreSQL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead, use a range comparison that leaves the column untouched:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ✅ Range filter — the index on placed_at can be used&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-04-28 00:00:00'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;  &lt;span class="s1"&gt;'2026-04-29 00:00:00'&lt;/span&gt;

&lt;span class="c1"&gt;-- Or dynamically:&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;  &lt;span class="k"&gt;CURRENT_DATE&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Ignoring time zones&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;NOW()&lt;/code&gt; in PostgreSQL returns &lt;code&gt;timestamptz&lt;/code&gt; (with time zone). If your app stores timestamps in UTC but your &lt;code&gt;NOW()&lt;/code&gt; uses a local session timezone, you can get wildly wrong results. Always be explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- PostgreSQL: force UTC&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AT&lt;/span&gt; &lt;span class="nb"&gt;TIME&lt;/span&gt; &lt;span class="k"&gt;ZONE&lt;/span&gt; &lt;span class="s1"&gt;'UTC'&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'24 hours'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In MySQL, use &lt;code&gt;UTC_TIMESTAMP()&lt;/code&gt; instead of &lt;code&gt;NOW()&lt;/code&gt; when you want UTC regardless of session settings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. NULL shipped_at breaks duration calculations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;shipped_at&lt;/code&gt; is NULL (order not yet shipped), a DATEDIFF or subtraction involving it returns NULL — not an error, just silent NULL propagation. Always handle this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- MySQL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATEDIFF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shipped_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;placed_at&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;'Not yet shipped'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;fulfillment_days&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. DATEDIFF argument order differs by database&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MySQL: &lt;code&gt;DATEDIFF(end_date, start_date)&lt;/code&gt; → positive if end is after start&lt;/li&gt;
&lt;li&gt;SQL Server: &lt;code&gt;DATEDIFF(unit, start_date, end_date)&lt;/code&gt; → note the unit comes first!&lt;/li&gt;
&lt;li&gt;PostgreSQL: use subtraction directly: &lt;code&gt;end_date - start_date&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This trips up developers switching between databases constantly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary: Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Working with dates in SQL is a skill that pays dividends across virtually every query you'll write for analytics or reporting. The core ideas to remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;NOW()&lt;/code&gt; / &lt;code&gt;CURRENT_TIMESTAMP&lt;/code&gt; for the current moment; &lt;code&gt;CURRENT_DATE&lt;/code&gt; for today's date&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EXTRACT()&lt;/code&gt; pulls out specific date parts (year, month, day of week) — it's standard SQL&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;INTERVAL&lt;/code&gt; for date arithmetic — keep the indexed column bare on the left side of your &lt;code&gt;WHERE&lt;/code&gt; clause&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DATEDIFF&lt;/code&gt; / &lt;code&gt;TIMESTAMPDIFF&lt;/code&gt; (MySQL) and &lt;code&gt;AGE()&lt;/code&gt; / simple subtraction (PostgreSQL) handle duration calculations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DATE_TRUNC&lt;/code&gt; (PostgreSQL) is your best friend for grouping by week/month/quarter&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DATE_FORMAT&lt;/code&gt; (MySQL) and &lt;code&gt;TO_CHAR&lt;/code&gt; (PostgreSQL) format dates for display&lt;/li&gt;
&lt;li&gt;Always consider NULLs and time zones — they're the source of most date-related bugs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's Your Favorite Date Function?
&lt;/h2&gt;

&lt;p&gt;Date functions are deceptively powerful. Once you're comfortable with &lt;code&gt;DATE_TRUNC&lt;/code&gt; for time-series aggregation or &lt;code&gt;TIMESTAMPDIFF&lt;/code&gt; for cohort analysis, your SQL queries go from useful to genuinely insightful.&lt;/p&gt;

&lt;p&gt;Drop a comment below — what's the most useful date/time query pattern you use regularly? I'd love to see what real-world problems you're solving! And if you found this helpful, share it with a colleague who's wrestling with a "last 30 days" filter. 🗓️&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>mysql</category>
    </item>
    <item>
      <title>SQL GROUP BY &amp; HAVING: The Beginner's Guide to Summarizing Data Like a Pro</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Mon, 27 Apr 2026 05:23:51 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/sql-group-by-having-the-beginners-guide-to-summarizing-data-like-a-pro-3eb7</link>
      <guid>https://dev.to/vivekdraxlr/sql-group-by-having-the-beginners-guide-to-summarizing-data-like-a-pro-3eb7</guid>
      <description>&lt;p&gt;If you've ever needed to answer questions like &lt;em&gt;"How many orders did each customer place?"&lt;/em&gt; or &lt;em&gt;"Which product categories generated more than $10,000 in revenue?"&lt;/em&gt;, you've needed &lt;code&gt;GROUP BY&lt;/code&gt; and &lt;code&gt;HAVING&lt;/code&gt;. These two clauses are the backbone of summarizing and analyzing relational data in SQL.&lt;/p&gt;

&lt;p&gt;In this guide, we'll go from zero to confident with both clauses. You'll understand what they do, how they differ from each other (and from &lt;code&gt;WHERE&lt;/code&gt;), and walk away with real working examples you can adapt immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is GROUP BY?
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;GROUP BY&lt;/code&gt; clause collapses multiple rows that share the same value in one or more columns into a single summary row. You almost always pair it with an &lt;strong&gt;aggregate function&lt;/strong&gt; — &lt;code&gt;COUNT()&lt;/code&gt;, &lt;code&gt;SUM()&lt;/code&gt;, &lt;code&gt;AVG()&lt;/code&gt;, &lt;code&gt;MIN()&lt;/code&gt;, or &lt;code&gt;MAX()&lt;/code&gt; — to compute something meaningful about each group.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Syntax:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AGGREGATE_FUNCTION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other_column&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  A Simple Example
&lt;/h3&gt;

&lt;p&gt;Imagine an &lt;code&gt;orders&lt;/code&gt; table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;order_id&lt;/th&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;amount&lt;/th&gt;
&lt;th&gt;status&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;50.00&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;120.00&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;30.00&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;200.00&lt;/td&gt;
&lt;td&gt;pending&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;75.00&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;How many orders has each customer placed?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;order_count&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SQL gathered all rows for customer 101 and counted them — giving you one result row per unique &lt;code&gt;customer_id&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  GROUP BY With Multiple Aggregate Functions
&lt;/h2&gt;

&lt;p&gt;You're not limited to one aggregate per query. Here's how to get the total and average order amount per customer in one shot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_order_value&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;order_count&lt;/th&gt;
&lt;th&gt;total_spent&lt;/th&gt;
&lt;th&gt;avg_order_value&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;80.00&lt;/td&gt;
&lt;td&gt;40.00&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;195.00&lt;/td&gt;
&lt;td&gt;97.50&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;200.00&lt;/td&gt;
&lt;td&gt;200.00&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  GROUP BY Multiple Columns
&lt;/h2&gt;

&lt;p&gt;You can group by more than one column to create finer-grained buckets. Let's say you want to see order counts broken down by both &lt;code&gt;customer_id&lt;/code&gt; and &lt;code&gt;status&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;status&lt;/th&gt;
&lt;th&gt;order_count&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;completed&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;pending&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each unique combination of &lt;code&gt;customer_id&lt;/code&gt; + &lt;code&gt;status&lt;/code&gt; becomes its own group.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introducing HAVING: Filtering After Grouping
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;GROUP BY&lt;/code&gt; creates groups. &lt;code&gt;HAVING&lt;/code&gt; lets you &lt;strong&gt;filter those groups&lt;/strong&gt; based on the result of an aggregate function.&lt;/p&gt;

&lt;p&gt;Think of &lt;code&gt;HAVING&lt;/code&gt; as a &lt;code&gt;WHERE&lt;/code&gt; clause that runs &lt;em&gt;after&lt;/em&gt; grouping — it can reference aggregate results that don't exist until after grouping happens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Syntax:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AGGREGATE_FUNCTION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other_column&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;column_name&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="n"&gt;AGGREGATE_FUNCTION&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;other_column&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example: Find High-Value Customers
&lt;/h3&gt;

&lt;p&gt;Which customers have spent more than $100 in total?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;total_spent&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;195.00&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;200.00&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Customer 101 (who spent $80 total) is filtered out because their aggregate didn't meet the condition.&lt;/p&gt;




&lt;h2&gt;
  
  
  WHERE vs. HAVING: The Critical Difference
&lt;/h2&gt;

&lt;p&gt;This trips up many beginners. Here's a clear mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WHERE&lt;/strong&gt; filters &lt;strong&gt;individual rows&lt;/strong&gt; &lt;em&gt;before&lt;/em&gt; grouping&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HAVING&lt;/strong&gt; filters &lt;strong&gt;groups&lt;/strong&gt; &lt;em&gt;after&lt;/em&gt; grouping&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They can work together in the same query. Here's an example using both:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find customers with more than one &lt;em&gt;completed&lt;/em&gt; order:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;completed_order_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'completed'&lt;/span&gt;        &lt;span class="c1"&gt;-- Filter rows BEFORE grouping&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;              &lt;span class="c1"&gt;-- Filter groups AFTER grouping&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;completed_order_count&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;WHERE status = 'completed'&lt;/code&gt; removes the pending order for customer 103 before any grouping occurs. Then &lt;code&gt;HAVING COUNT(*) &amp;gt; 1&lt;/code&gt; removes customer groups with only one order.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Execution Order of a SQL Query
&lt;/h2&gt;

&lt;p&gt;Understanding why &lt;code&gt;WHERE&lt;/code&gt; and &lt;code&gt;HAVING&lt;/code&gt; behave differently becomes much clearer when you know the logical order SQL processes a query:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;FROM&lt;/strong&gt; — identify the table(s)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WHERE&lt;/strong&gt; — filter individual rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GROUP BY&lt;/strong&gt; — collapse rows into groups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HAVING&lt;/strong&gt; — filter groups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SELECT&lt;/strong&gt; — compute output columns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ORDER BY&lt;/strong&gt; — sort&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LIMIT/OFFSET&lt;/strong&gt; — paginate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code&gt;WHERE&lt;/code&gt; runs at step 2, before any grouping. &lt;code&gt;HAVING&lt;/code&gt; runs at step 4, after groups exist. This is why you &lt;em&gt;cannot&lt;/em&gt; use an aggregate function inside a &lt;code&gt;WHERE&lt;/code&gt; clause — the aggregates simply don't exist yet at that stage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes to Avoid
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Selecting a non-grouped, non-aggregated column
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ❌ This will fail (or return unpredictable results in MySQL)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;amount&lt;/code&gt; is neither in &lt;code&gt;GROUP BY&lt;/code&gt; nor wrapped in an aggregate function. The database doesn't know &lt;em&gt;which&lt;/em&gt; amount to show for the group. Fix it: either add &lt;code&gt;amount&lt;/code&gt; to &lt;code&gt;GROUP BY&lt;/code&gt;, or aggregate it (e.g., &lt;code&gt;SUM(amount)&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Using WHERE to filter aggregates
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ❌ This will throw an error&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SUM(amount)&lt;/code&gt; doesn't exist at the &lt;code&gt;WHERE&lt;/code&gt; stage. Use &lt;code&gt;HAVING&lt;/code&gt; instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ✅ Correct&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Forgetting that GROUP BY treats NULLs as a single group
&lt;/h3&gt;

&lt;p&gt;If your grouped column has &lt;code&gt;NULL&lt;/code&gt; values in multiple rows, all those &lt;code&gt;NULL&lt;/code&gt; rows will be collapsed into one group. This is usually fine — just be aware that a &lt;code&gt;NULL&lt;/code&gt; group will appear in your results unless you filter it out with &lt;code&gt;WHERE column IS NOT NULL&lt;/code&gt; before grouping.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Confusing COUNT(*) and COUNT(column)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;COUNT(*)&lt;/code&gt; counts all rows in the group, including those with &lt;code&gt;NULL&lt;/code&gt; values&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;COUNT(column)&lt;/code&gt; counts only rows where that column is &lt;strong&gt;not&lt;/strong&gt; &lt;code&gt;NULL&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;non_null_amounts&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;amount&lt;/code&gt; is never &lt;code&gt;NULL&lt;/code&gt; in your table, these will be equal. But in tables with optional fields, the difference matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Real-World Example: Sales Report by Category
&lt;/h2&gt;

&lt;p&gt;Let's put it all together with a slightly richer scenario. Here's a &lt;code&gt;products&lt;/code&gt; table and a &lt;code&gt;sales&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Find product categories with more than 5 sales,&lt;/span&gt;
&lt;span class="c1"&gt;-- showing total revenue and average sale price,&lt;/span&gt;
&lt;span class="c1"&gt;-- sorted by total revenue descending&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_sales&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_price&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;
&lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total_revenue&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Joins &lt;code&gt;sales&lt;/code&gt; with &lt;code&gt;products&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Filters to sales from 2024 onward (&lt;code&gt;WHERE&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Groups by product category&lt;/li&gt;
&lt;li&gt;Keeps only categories with more than 5 sales (&lt;code&gt;HAVING&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Sorts by revenue, highest first&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;GROUP BY&lt;/code&gt; collapses rows that share a value into summary groups; always pair it with an aggregate function&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;HAVING&lt;/code&gt; filters groups based on aggregate results — it's the post-grouping version of &lt;code&gt;WHERE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WHERE&lt;/code&gt; and &lt;code&gt;HAVING&lt;/code&gt; can coexist in the same query and serve complementary roles&lt;/li&gt;
&lt;li&gt;Every column in your &lt;code&gt;SELECT&lt;/code&gt; list must be either in &lt;code&gt;GROUP BY&lt;/code&gt; or inside an aggregate function&lt;/li&gt;
&lt;li&gt;SQL processes &lt;code&gt;WHERE&lt;/code&gt; before grouping and &lt;code&gt;HAVING&lt;/code&gt; after grouping — this is why you can't use aggregate functions in a &lt;code&gt;WHERE&lt;/code&gt; clause&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;GROUP BY&lt;/code&gt; and &lt;code&gt;HAVING&lt;/code&gt; open the door to powerful data analysis. Once you're comfortable here, great next steps include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Window Functions&lt;/strong&gt; — aggregate over groups without collapsing rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CTEs&lt;/strong&gt; — break complex GROUP BY queries into readable, step-by-step logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexes on grouped columns&lt;/strong&gt; — for performance when your tables get large&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Have you run into a tricky &lt;code&gt;GROUP BY&lt;/code&gt; or &lt;code&gt;HAVING&lt;/code&gt; situation at work? Drop it in the comments — I'd love to see real-world examples from the community!&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>SQL Window Functions Explained: Stop Collapsing Your Data with GROUP BY</title>
      <dc:creator>Vivek Kumar</dc:creator>
      <pubDate>Mon, 27 Apr 2026 05:23:42 +0000</pubDate>
      <link>https://dev.to/vivekdraxlr/sql-window-functions-explained-stop-collapsing-your-data-with-group-by-3ghe</link>
      <guid>https://dev.to/vivekdraxlr/sql-window-functions-explained-stop-collapsing-your-data-with-group-by-3ghe</guid>
      <description>&lt;p&gt;You're writing a query to find each employee's salary alongside the average salary for their department. Your first instinct might be to reach for &lt;code&gt;GROUP BY&lt;/code&gt; — but then you'd lose the individual employee rows. So you write a subquery, or a join against an aggregated CTE, and suddenly a simple question becomes a 20-line monster.&lt;/p&gt;

&lt;p&gt;Window functions exist precisely to solve this. They let you perform calculations &lt;em&gt;across a set of related rows&lt;/em&gt; without collapsing your result set. The row stays intact; the calculation rides alongside it. Once you understand them, you'll wonder how you ever lived without them.&lt;/p&gt;

&lt;p&gt;This guide covers the fundamentals: what a window is, how &lt;code&gt;OVER&lt;/code&gt;, &lt;code&gt;PARTITION BY&lt;/code&gt;, and &lt;code&gt;ORDER BY&lt;/code&gt; work inside it, and the most useful window functions you'll reach for every day.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Window Function?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;window function&lt;/strong&gt; performs a calculation over a "window" — a defined set of rows related to the current row. Unlike &lt;code&gt;GROUP BY&lt;/code&gt;, which aggregates many rows into one, a window function adds a new computed column to &lt;em&gt;each&lt;/em&gt; row.&lt;/p&gt;

&lt;p&gt;The syntax always follows this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;ROWS&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;RANGE&lt;/span&gt; &lt;span class="n"&gt;frame_clause&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;OVER&lt;/code&gt; clause is what tells SQL "this is a window function." Without it, &lt;code&gt;SUM()&lt;/code&gt; is a regular aggregate; with it, &lt;code&gt;SUM() OVER (...)&lt;/code&gt; is a window function.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup: A Sample Schema
&lt;/h2&gt;

&lt;p&gt;We'll use two tables throughout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- employees table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;         &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;       &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;salary&lt;/span&gt;     &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;hire_date&lt;/span&gt;  &lt;span class="nb"&gt;DATE&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="s1"&gt;'Alice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="s1"&gt;'Engineering'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;95000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2019-03-15'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="s1"&gt;'Bob'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="s1"&gt;'Engineering'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;88000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2020-07-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="s1"&gt;'Carol'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="s1"&gt;'Engineering'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;102000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'2018-11-20'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="s1"&gt;'Dave'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="s1"&gt;'Marketing'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="mi"&gt;72000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2021-01-10'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="s1"&gt;'Eve'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="s1"&gt;'Marketing'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="mi"&gt;78000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2020-05-22'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="s1"&gt;'Frank'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="s1"&gt;'Marketing'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="mi"&gt;68000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2022-03-30'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="s1"&gt;'Grace'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="s1"&gt;'HR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="mi"&gt;61000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2021-09-14'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="s1"&gt;'Heidi'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="s1"&gt;'HR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="mi"&gt;65000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'2019-06-01'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  PARTITION BY: The Heart of Windowing
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;PARTITION BY&lt;/code&gt; divides your rows into groups — like &lt;code&gt;GROUP BY&lt;/code&gt;, but the groups are only used to &lt;em&gt;scope&lt;/em&gt; the calculation, not reduce the rows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 1: Department average salary alongside each employee&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dept_avg_salary&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;department&lt;/th&gt;
&lt;th&gt;salary&lt;/th&gt;
&lt;th&gt;dept_avg_salary&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;Carol&lt;/td&gt;
&lt;td&gt;Engineering&lt;/td&gt;
&lt;td&gt;102000&lt;/td&gt;
&lt;td&gt;95000.00&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;Engineering&lt;/td&gt;
&lt;td&gt;95000&lt;/td&gt;
&lt;td&gt;95000.00&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Bob&lt;/td&gt;
&lt;td&gt;Engineering&lt;/td&gt;
&lt;td&gt;88000&lt;/td&gt;
&lt;td&gt;95000.00&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Eve&lt;/td&gt;
&lt;td&gt;Marketing&lt;/td&gt;
&lt;td&gt;78000&lt;/td&gt;
&lt;td&gt;72666.67&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Dave&lt;/td&gt;
&lt;td&gt;Marketing&lt;/td&gt;
&lt;td&gt;72000&lt;/td&gt;
&lt;td&gt;72666.67&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Frank&lt;/td&gt;
&lt;td&gt;Marketing&lt;/td&gt;
&lt;td&gt;68000&lt;/td&gt;
&lt;td&gt;72666.67&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Heidi&lt;/td&gt;
&lt;td&gt;HR&lt;/td&gt;
&lt;td&gt;65000&lt;/td&gt;
&lt;td&gt;63000.00&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Grace&lt;/td&gt;
&lt;td&gt;HR&lt;/td&gt;
&lt;td&gt;61000&lt;/td&gt;
&lt;td&gt;63000.00&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every row is preserved. Each employee now has their department's average salary right next to their own. With a traditional &lt;code&gt;GROUP BY&lt;/code&gt;, you'd need a join or subquery to get this result.&lt;/p&gt;




&lt;h2&gt;
  
  
  ORDER BY Inside OVER: Running Calculations
&lt;/h2&gt;

&lt;p&gt;When you add &lt;code&gt;ORDER BY&lt;/code&gt; inside &lt;code&gt;OVER&lt;/code&gt;, SQL calculates the function &lt;em&gt;cumulatively&lt;/em&gt; as it processes rows in order. This is perfect for running totals or running averages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 2: Running total of salary by hire date within each department&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;hire_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;hire_date&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;running_total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hire_date&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result for Engineering:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;department&lt;/th&gt;
&lt;th&gt;hire_date&lt;/th&gt;
&lt;th&gt;salary&lt;/th&gt;
&lt;th&gt;running_total&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;Carol&lt;/td&gt;
&lt;td&gt;Engineering&lt;/td&gt;
&lt;td&gt;2018-11-20&lt;/td&gt;
&lt;td&gt;102000&lt;/td&gt;
&lt;td&gt;102000&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;Engineering&lt;/td&gt;
&lt;td&gt;2019-03-15&lt;/td&gt;
&lt;td&gt;95000&lt;/td&gt;
&lt;td&gt;197000&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Bob&lt;/td&gt;
&lt;td&gt;Engineering&lt;/td&gt;
&lt;td&gt;2020-07-01&lt;/td&gt;
&lt;td&gt;88000&lt;/td&gt;
&lt;td&gt;285000&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;running_total&lt;/code&gt; grows as we move through hire dates — that's the &lt;code&gt;ORDER BY hire_date&lt;/code&gt; doing its work.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ranking Functions: ROW_NUMBER, RANK, DENSE_RANK
&lt;/h2&gt;

&lt;p&gt;Three of the most useful window functions are ranking functions. They all assign a number to each row, but they handle ties differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 3: Ranking employees by salary within their department&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;row_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;       &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;DENSE_RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dense_rank&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see the difference between RANK and DENSE_RANK, let's imagine two Marketing employees with the same salary. Here's what happens:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;salary&lt;/th&gt;
&lt;th&gt;ROW_NUMBER&lt;/th&gt;
&lt;th&gt;RANK&lt;/th&gt;
&lt;th&gt;DENSE_RANK&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;Eve&lt;/td&gt;
&lt;td&gt;78000&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Dave&lt;/td&gt;
&lt;td&gt;72000&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Frank&lt;/td&gt;
&lt;td&gt;68000&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The differences become clear with ties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ROW_NUMBER()&lt;/code&gt;&lt;/strong&gt; — Always assigns a unique number, even to ties. Ties get arbitrary ordering (non-deterministic).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;RANK()&lt;/code&gt;&lt;/strong&gt; — Ties share the same rank, and the next rank &lt;em&gt;skips&lt;/em&gt; numbers. Two employees tied at rank 2 means the next rank is 4.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;DENSE_RANK()&lt;/code&gt;&lt;/strong&gt; — Ties share the same rank, but the next rank is always consecutive. Two tied at rank 2 means the next is rank 3.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use &lt;code&gt;ROW_NUMBER()&lt;/code&gt; when you need exactly one row per partition (e.g., "get the single most recent order per customer"). Use &lt;code&gt;DENSE_RANK()&lt;/code&gt; when the gap in &lt;code&gt;RANK()&lt;/code&gt; would be confusing to end users.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Practical Use Case: Top-N Per Group
&lt;/h2&gt;

&lt;p&gt;One of the most common interview questions and real-world problems: &lt;em&gt;"Give me the top 2 earners in each department."&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;ranked_employees&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;
      &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;salary_rank&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ranked_employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary_rank&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;department&lt;/th&gt;
&lt;th&gt;salary&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;Carol&lt;/td&gt;
&lt;td&gt;Engineering&lt;/td&gt;
&lt;td&gt;102000&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;Engineering&lt;/td&gt;
&lt;td&gt;95000&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Eve&lt;/td&gt;
&lt;td&gt;Marketing&lt;/td&gt;
&lt;td&gt;78000&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Dave&lt;/td&gt;
&lt;td&gt;Marketing&lt;/td&gt;
&lt;td&gt;72000&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Heidi&lt;/td&gt;
&lt;td&gt;HR&lt;/td&gt;
&lt;td&gt;65000&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Grace&lt;/td&gt;
&lt;td&gt;HR&lt;/td&gt;
&lt;td&gt;61000&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This pattern — window function in a CTE, filter in the outer query — is a workhorse of SQL analytics.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes and Gotchas
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Filtering on a window function directly in WHERE&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This doesn't work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ❌ This will error&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rn&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;rn&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Window functions are evaluated &lt;em&gt;after&lt;/em&gt; &lt;code&gt;WHERE&lt;/code&gt;, so you can't filter on them in the same query level. Always wrap them in a CTE or subquery first, then filter — as shown in the top-N example above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Forgetting that ORDER BY inside OVER changes the frame&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Adding &lt;code&gt;ORDER BY&lt;/code&gt; to your &lt;code&gt;OVER&lt;/code&gt; clause implicitly changes the window &lt;em&gt;frame&lt;/em&gt; from "all rows in the partition" to "all rows up to and including the current row." This is usually what you want for running totals, but it can surprise you when you use &lt;code&gt;SUM()&lt;/code&gt; or &lt;code&gt;AVG()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- These two behave differently!&lt;/span&gt;
&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                    &lt;span class="c1"&gt;-- whole partition&lt;/span&gt;
&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;hire_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;-- cumulative avg&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want the whole-partition average but with a specific sort for row numbering, you may need to use separate window function expressions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Mixing window functions and GROUP BY&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Window functions and &lt;code&gt;GROUP BY&lt;/code&gt; can coexist in the same query, but the window function operates on the &lt;em&gt;already-grouped&lt;/em&gt; result set. This can lead to unexpected results if you're not careful about what rows exist after grouping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Performance on large tables without proper indexes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Window functions with &lt;code&gt;PARTITION BY&lt;/code&gt; and &lt;code&gt;ORDER BY&lt;/code&gt; can be expensive. Make sure the columns you're partitioning and ordering by are indexed, especially on large tables. Use &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; to check the query plan.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Window functions are one of the most powerful tools in SQL for analytics and reporting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;OVER()&lt;/code&gt; clause is what makes a function a window function — it defines the rows in scope for each calculation.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PARTITION BY&lt;/code&gt; divides rows into independent groups, like &lt;code&gt;GROUP BY&lt;/code&gt; but without collapsing rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ORDER BY&lt;/code&gt; inside &lt;code&gt;OVER&lt;/code&gt; enables running/cumulative calculations.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ROW_NUMBER()&lt;/code&gt;, &lt;code&gt;RANK()&lt;/code&gt;, and &lt;code&gt;DENSE_RANK()&lt;/code&gt; solve ranking and top-N problems elegantly.&lt;/li&gt;
&lt;li&gt;You cannot filter on window function results in &lt;code&gt;WHERE&lt;/code&gt; — use a CTE or subquery instead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next step after mastering these basics is exploring &lt;code&gt;LAG()&lt;/code&gt; and &lt;code&gt;LEAD()&lt;/code&gt; for comparing a row to previous/next rows, and frame clauses (&lt;code&gt;ROWS BETWEEN&lt;/code&gt;) for precise control over the window range. Those are the power tools on top of this foundation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Over to You
&lt;/h2&gt;

&lt;p&gt;Window functions clicked for me the moment I stopped thinking of them as "fancy aggregates" and started thinking of them as "a calculation that knows its neighbors." What was the moment it clicked for you?&lt;/p&gt;

&lt;p&gt;Drop your favorite window function use case in the comments — I'd love to see the creative ways people apply these in production. And if you're just getting started, try rewriting one of your correlated subqueries as a window function. You might never go back.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
