<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Danwycliff Ndwiga</title>
    <description>The latest articles on DEV Community by Danwycliff Ndwiga (@danwycliff).</description>
    <link>https://dev.to/danwycliff</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2197070%2F13efa626-db1a-4333-8ca2-34b4060af0b5.jpg</url>
      <title>DEV Community: Danwycliff Ndwiga</title>
      <link>https://dev.to/danwycliff</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/danwycliff"/>
    <language>en</language>
    <item>
      <title>Deleting Duplicates in SQL</title>
      <dc:creator>Danwycliff Ndwiga</dc:creator>
      <pubDate>Fri, 01 Nov 2024 12:53:17 +0000</pubDate>
      <link>https://dev.to/danwycliff/deleting-duplicates-in-sql-44a0</link>
      <guid>https://dev.to/danwycliff/deleting-duplicates-in-sql-44a0</guid>
      <description>&lt;p&gt;In SQL handlingn duplicate records is essential for maintaining data accuracy, optimizing query performance, and ensuring consistent results.This article explores some practicle techniques to identify and delete duplicate rows using SQL queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Delete Duplicates Using a Unique Identifier
&lt;/h2&gt;

&lt;p&gt;Consider the above code block&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM cars
WHERE id IN (
    SELECT MAX(id)
    FROM cars
    GROUP BY model, brand
    HAVING COUNT(1) &amp;gt; 1
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above code selects the records from cars and in the above code we want to delete the record that are duplicate in the model and brand we use the id to identify the MAX(we can only get one max) and we delete the only max record&lt;/p&gt;

&lt;h2&gt;
  
  
  Delete Duplicates Using a Self-Join
&lt;/h2&gt;

&lt;p&gt;In this approach, we use a self-join to identify and delete duplicate rows in the cars table, keeping only the row with the smallest id for each unique combination of model and brand&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM cars
WHERE id IN (
    SELECT c2.id
    FROM cars c1
    JOIN cars c2 ON c1.model = c2.model
                 AND c1.brand = c2.brand
    WHERE c1.id &amp;lt; c2.id
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Delete Duplicates Using a Window Function
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM cars
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY model, brand ORDER BY id) AS rn
        FROM cars
    ) AS x
    WHERE x.rn &amp;gt; 1
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above case in the inner subquery, the ROW_NUMBER() function assigns a unique row number to each row within each group of duplicates (defined by model and brand)&lt;/p&gt;

&lt;h2&gt;
  
  
  Using MIN function
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;delete from cars
where id not in ( select min(id)
                  from cars
                  group by model, brand);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The inner subquery SELECT MIN(id) FROM cars GROUP BY model, brand finds the lowest id for each unique combination of model and brand. This ensures that only one record for each car model and brand pair is retained&lt;br&gt;
The DELETE FROM cars WHERE id NOT IN (...) statement removes records with IDs that aren't the minimum for their model and brand group. Essentially, this keeps only the oldest record and removes duplicates.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Understanding and Using Window Functions in SQL</title>
      <dc:creator>Danwycliff Ndwiga</dc:creator>
      <pubDate>Thu, 31 Oct 2024 13:28:34 +0000</pubDate>
      <link>https://dev.to/danwycliff/understanding-and-using-window-functions-in-sql-3mop</link>
      <guid>https://dev.to/danwycliff/understanding-and-using-window-functions-in-sql-3mop</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Window functions are SQL functions that perform calculations across a set of rows related to the current row (window) or partition without grouping rows into a single output&lt;br&gt;
While aggregate functions like (SUM,AVG) group rows to produce a single reuslt window functions keep all rows in the result set and apply calcualtions over a specific range&lt;/p&gt;
&lt;h2&gt;
  
  
  Syntax
&lt;/h2&gt;

&lt;p&gt;Here is a syntax of the sql window function&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;window_function&amp;gt;() OVER (
    [PARTITION BY &amp;lt;partition_column&amp;gt;]
    [ORDER BY &amp;lt;order_column&amp;gt;]
    [&amp;lt;frame_clause&amp;gt;]
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Partition by clause divides the data into partitions or subsets based on one or more columns&lt;br&gt;
The order by orders the rows often based on the time column or the numeric value&lt;br&gt;
The frame clause defines the window frame specifying the exact range of rows to include in each calculation relative to the current row&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of window functions
&lt;/h2&gt;

&lt;p&gt;There are four main type of window functions&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Aggregate Window Functions: These include functions like SUM(),AVG(),COUNT(),MIN(), and MAX() used with the OVER clause&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ranking Window Functions: These are functions like ROW_NUMBER(), RANK(),DENSE_RANK() and NTILE() they are used in order ranking&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Value Window Functions: Functions like LAG(),LEAD(),FIRST_VALUE() and LAST_VALUE() which access value from other rows within the window funtions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Analytic Window Funtions: Funtions like CUME_DIST() and PERCENT_RANK() they are used to provide statistical insights&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>sql</category>
      <category>windowfunction</category>
      <category>database</category>
      <category>beginners</category>
    </item>
    <item>
      <title>SQL "SELECT INTO" vs "INSERT INTO SELECT" statements.</title>
      <dc:creator>Danwycliff Ndwiga</dc:creator>
      <pubDate>Wed, 30 Oct 2024 18:32:24 +0000</pubDate>
      <link>https://dev.to/danwycliff/sql-select-into-vs-insert-into-select-statements-345a</link>
      <guid>https://dev.to/danwycliff/sql-select-into-vs-insert-into-select-statements-345a</guid>
      <description>&lt;p&gt;The "SELECT INTO" statement copies data from one table into a new table.&lt;br&gt;
the syntax of the statement is as follows below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT *
INTO newtable [IN externaldb]
FROM oldtable
WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we can also only copy some columns into the new table&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column1, column2, column3, ...
INTO newtable [IN externaldb]
FROM oldtable
WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;INSERT INTO SELECT&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The INSERT INTO SELECT statement copies data from one table and inserts it into another table.&lt;/p&gt;

&lt;p&gt;The INSERT INTO SELECT statement requires that the data types in source and target tables match.&lt;/p&gt;

&lt;p&gt;Note: The existing records in the target table are unaffected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO table2
SELECT * FROM table1
WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;we can also Copy only some columns from one table into another table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO table2 (column1, column2, column3, ...)
SELECT column1, column2, column3, ...
FROM table1
WHERE condition;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SELECT INTO creates a new table while INSERT INTO SELECT requires an existing table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SELECT INTO is for creating backup or temporary tables while INSERT INTO SELECT is used to transfer data between existing tables.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>data</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
