DEV Community

Cover image for Deleting Duplicates in SQL
Danwycliff Ndwiga
Danwycliff Ndwiga

Posted on

Deleting Duplicates in SQL

In SQL handlingn duplicate records is essential for maintaining data accuracy, optimizing query performance, and ensuring consistent results.This article explores some practicle techniques to identify and delete duplicate rows using SQL queries.

Delete Duplicates Using a Unique Identifier

Consider the above code block

DELETE FROM cars
WHERE id IN (
    SELECT MAX(id)
    FROM cars
    GROUP BY model, brand
    HAVING COUNT(1) > 1
);
Enter fullscreen mode Exit fullscreen mode

The above code selects the records from cars and in the above code we want to delete the record that are duplicate in the model and brand we use the id to identify the MAX(we can only get one max) and we delete the only max record

Delete Duplicates Using a Self-Join

In this approach, we use a self-join to identify and delete duplicate rows in the cars table, keeping only the row with the smallest id for each unique combination of model and brand

DELETE FROM cars
WHERE id IN (
    SELECT c2.id
    FROM cars c1
    JOIN cars c2 ON c1.model = c2.model
                 AND c1.brand = c2.brand
    WHERE c1.id < c2.id
);
Enter fullscreen mode Exit fullscreen mode

Delete Duplicates Using a Window Function

DELETE FROM cars
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY model, brand ORDER BY id) AS rn
        FROM cars
    ) AS x
    WHERE x.rn > 1
);
Enter fullscreen mode Exit fullscreen mode

In the above case in the inner subquery, the ROW_NUMBER() function assigns a unique row number to each row within each group of duplicates (defined by model and brand)

Using MIN function

delete from cars
where id not in ( select min(id)
                  from cars
                  group by model, brand);
Enter fullscreen mode Exit fullscreen mode

The inner subquery SELECT MIN(id) FROM cars GROUP BY model, brand finds the lowest id for each unique combination of model and brand. This ensures that only one record for each car model and brand pair is retained
The DELETE FROM cars WHERE id NOT IN (...) statement removes records with IDs that aren't the minimum for their model and brand group. Essentially, this keeps only the oldest record and removes duplicates.

Image of Datadog

The Future of AI, LLMs, and Observability on Google Cloud

Datadog sat down with Google’s Director of AI to discuss the current and future states of AI, ML, and LLMs on Google Cloud. Discover 7 key insights for technical leaders, covering everything from upskilling teams to observability best practices

Learn More

Top comments (0)

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay