<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Luis Monasterios</title>
    <description>The latest articles on DEV Community by Luis Monasterios (@lulij).</description>
    <link>https://dev.to/lulij</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3991404%2F1fc94d05-c11f-4ef4-a7d6-a363e299e470.jpg</url>
      <title>DEV Community: Luis Monasterios</title>
      <link>https://dev.to/lulij</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lulij"/>
    <language>en</language>
    <item>
      <title>Data Analysis SQL: Asking the right questions and using the right tools</title>
      <dc:creator>Luis Monasterios</dc:creator>
      <pubDate>Sun, 28 Jun 2026 20:18:04 +0000</pubDate>
      <link>https://dev.to/lulij/data-analysis-sql-asking-the-right-questions-and-using-the-right-tools-2a5o</link>
      <guid>https://dev.to/lulij/data-analysis-sql-asking-the-right-questions-and-using-the-right-tools-2a5o</guid>
      <description>&lt;p&gt;If you've ever practiced SQL interview questions, you have almost certainly crossed paths with this classic prompt: &lt;em&gt;“Find the second highest salary in the company.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When the problem lands on the table, your first instinct is usually to smile, open your editor, and type this in four seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; 
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;OFFSET&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And just like that, &lt;strong&gt;you are probably out of the interview.&lt;/strong&gt; Not because the syntax is broken—that runs perfectly in PostgreSQL—but because you just committed the cardinal sin of data engineering: &lt;strong&gt;you dove right into the code before understanding the business.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Today, I want to break down this problem (inspired by the &lt;em&gt;Second Highest Salary&lt;/em&gt; challenge on DataLemur). The goal isn't to teach you basic SQL syntax, but to show you how an analyst actually thinks when handed an ambiguous requirement.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 1: Ask the Awkward Questions
&lt;/h2&gt;

&lt;p&gt;Picture this: you aren't sitting on a coding platform; the CFO just walked into your office on a Tuesday at 9:00 AM asking for this number. If you run the &lt;code&gt;OFFSET 1&lt;/code&gt; query, you will immediately step on three landmines:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Tie Trap:&lt;/strong&gt; Suppose the CEO makes &lt;strong&gt;$12,000&lt;/strong&gt;, and both the CTO and the Principal Architect make &lt;strong&gt;$10,000&lt;/strong&gt;. If you sort descending, Row 1 is $12k, and Row 2 is $10k. Perfect. But what if the CTO &lt;em&gt;also&lt;/em&gt; makes $12,000? Your query returns $12,000 as the "second salary." &lt;strong&gt;That is wrong.&lt;/strong&gt; That is the &lt;em&gt;first&lt;/em&gt; salary tier, repeated. The CFO wants the second monetary tier, not the second individual on the payroll.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Phantom Hierarchy:&lt;/strong&gt; The prompt says &lt;em&gt;"employees"&lt;/em&gt;. Do Managers count as employees in this database, or do they sit on a separate pay scale? Are Directors in this table? If you don’t ask, you risk handing the executive a Vice President's salary when they actually wanted data on the general staff. &lt;em&gt;(For our exercise, we will assume everyone coexists in one table).&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Invisible Edge Case:&lt;/strong&gt; What if the company is a newly founded startup with &lt;strong&gt;only one employee&lt;/strong&gt;? Your &lt;code&gt;OFFSET 1&lt;/code&gt; query will return an empty table (&lt;code&gt;0 rows&lt;/code&gt;). In software engineering, a database returning "nothing" causes the backend consuming your API to throw a lovely &lt;code&gt;NullPointerException&lt;/code&gt;. The SQL standard dictates that if a requested metric does not exist, the database should return &lt;code&gt;NULL&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once we reconcile the prompt with real-world logic, the requirement translates to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Give me the second-highest unique value from the salary column; if that tier does not exist, return &lt;code&gt;NULL&lt;/code&gt;.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Phase 2: Ditch the Subquery, Embrace Window Functions
&lt;/h2&gt;

&lt;p&gt;The classic textbook solution to fix the tie issue is a nested subquery:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;second_highest_salary&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works, but &lt;strong&gt;it isn't scalable engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If your boss comes back tomorrow and says, &lt;em&gt;"Great job... now give me the 5th highest salary,"&lt;/em&gt; this approach devolves into an unreadable Russian nesting doll of subqueries.&lt;/p&gt;

&lt;p&gt;This is where we call upon &lt;strong&gt;Window Functions&lt;/strong&gt;. However, SQL gives us three different ways to rank data, and we have to pick the right tool for the job. Given the salary set &lt;code&gt;[12000, 12000, 9000]&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ROW_NUMBER()&lt;/code&gt; assigns: &lt;code&gt;1, 2, 3&lt;/code&gt; &lt;em&gt;(Discards the 9000 as the 3rd rank. Wrong).&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RANK()&lt;/code&gt; assigns: &lt;code&gt;1, 1, 3&lt;/code&gt; &lt;em&gt;(Skips rank 2 entirely because of the tie. Wrong).&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DENSE_RANK()&lt;/code&gt; assigns: &lt;code&gt;1, 1, 2&lt;/code&gt; &lt;em&gt;(Ties share rank 1, and the next unique value gets rank 2. Bingo).&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Phase 3: Build the Solution in Blocks (The Power of CTEs)
&lt;/h2&gt;

&lt;p&gt;Instead of writing a monolithic 10-line query where everything happens at once, write it using &lt;strong&gt;CTEs (&lt;em&gt;Common Table Expressions&lt;/em&gt;)&lt;/strong&gt;. CTEs don’t make your query run faster; they do something much more important: &lt;strong&gt;they make your logic auditable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s build our first logical block to map the entire company:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;ranked_salaries&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; 
        &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;DENSE_RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;salary_rank&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we run just this CTE in our heads, we get a clean virtual table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;salary&lt;/th&gt;
&lt;th&gt;salary_rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;12000&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12000&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9500&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8000&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now, we write the main query to consume this block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;ranked_salaries&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; 
        &lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;DENSE_RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;salary&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;salary_rank&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;employees&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;second_highest_salary&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ranked_salaries&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;salary_rank&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Technical "Magic Trick"
&lt;/h3&gt;

&lt;p&gt;Look closely at that final &lt;code&gt;SELECT&lt;/code&gt;. I wrapped the column in &lt;code&gt;MAX(salary)&lt;/code&gt;, even though every row caught by &lt;code&gt;WHERE salary_rank = 2&lt;/code&gt; holds the exact same number. Why?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To defeat Landmine #3 (the single-employee company).&lt;/strong&gt; If the database only holds one person, &lt;code&gt;WHERE salary_rank = 2&lt;/code&gt; finds zero rows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If I had written &lt;code&gt;SELECT salary...&lt;/code&gt;, SQL outputs an empty table.&lt;/li&gt;
&lt;li&gt;By writing &lt;code&gt;SELECT MAX(salary)...&lt;/code&gt;, we force the engine to calculate the maximum of an empty set. By mathematical definition, &lt;strong&gt;the maximum of nothing in SQL is `NULL&lt;/strong&gt;`.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We satisfied all three business requirements with a single line of defensive code.&lt;/p&gt;




&lt;p&gt;When solving challenges on platforms like DataLemur, the compiler simply spits out a green &lt;em&gt;"Accepted"&lt;/em&gt;. But in the real world, the engine validating your code is an exhausted user trying to break your dashboard at 3:00 AM.&lt;/p&gt;

&lt;p&gt;Writing Senior-level SQL isn't about memorizing obscure syntax; it’s about operating under the assumption that &lt;strong&gt;your data will always be messier than the ticket promises.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>sql</category>
      <category>career</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
