DEV Community

Cover image for SQL Duplicates
Andy Allison
Andy Allison

Posted on

SQL Duplicates

Disclaimer

This is a purely self serving article. I've had to do this a number of times and end up having to re-learn it all every time I've done it. SOOOOO this time I will drop some basics here as a reference for my failing memory but if it helps anybody else even better. It's not indented to be instructional or in anyway best practice so if you have issues or know of more appropriate ways please feel fee to drop things in the comments etc.

The Problem

The key problem being addressed is a number of duplicate records that have found their way into our datasets. These duplicates are matched on the master record id and the description used in the child example

id parentId description
1 P1 this is ok
2 P1 I'm here twice
3 P1 I'm here twice

Breaking it down

So the first thing I did was a query that will actualy help me find the rows I was looking for and the proof there are actual duplicates.

select "parentId",
       id,
       description,
       row_number()
       over (partition by "parentId", description order by "parentId", overview) as RN
from "Detail"
Enter fullscreen mode Exit fullscreen mode

This produces results like these

id parentId description RN
1 P1 this is ok 1
2 P1 I'm here twice 1
3 P1 I'm here twice 2

So we now have a row id/number based on the parentId and the description matching.

We then use this in the delete query to kill al the row numbers over 1

delete
from "Detail"
where id in (
    select id
    from (
             select "parentId",
                    id,
                    description,
                    row_number()
                    over (partition by "parentId", description order by "parentId", overview) as RN
             from "Detail"
         ) as F
    where RN > 1
);
Enter fullscreen mode Exit fullscreen mode

This is probably a convoluted way to achieve this and I'm sure there will be more efficient way but it has worked for me in this occasion so I hope it helps others.

Running the previous query then results in

id parentId description RN
1 P1 this is ok 1
2 P1 I'm here twice 1

Top comments (0)