DEV Community

Slava Rozhnev
Slava Rozhnev

Posted on

Simple query for remove duplicates from table

In this small post I want to share the simple way to remove duplicates from table. The query works in MySQL, MariaDB and PostgreSQL databases. If you interested in such query for other RDBMS, please write me in comments.
Let's start. Assume we have simple table with two columns: id - is primary key and v simple integer value:

create table t (
    id int primary key,
    v int
);

insert into t(id, v) values 
(1, 1),(2, 1),(3, 2),(4, 2),(5, 1),(6, 1),(7, 2),(8, 3),(9, 2),(10, 4),(11, 3);
Enter fullscreen mode Exit fullscreen mode

The code above create the table and insert couple of values. As you can see id have unique values, v have several duplicates:

+====+===+
| id | v |
+====+===+
| 1  | 1 |
| 2  | 1 |
| 3  | 2 |
| 4  | 2 |
| 5  | 1 |
| 6  | 1 |
| 7  | 2 |
| 8  | 3 |
| 9  | 2 |
| 10 | 4 |
| 11 | 3 |
+----+---+
Enter fullscreen mode Exit fullscreen mode

Our mission is remove rows this duplicates in column v and rest unique values with minimal id value.
How we can find the duplicates? We can use simple LEFT JOIN on field v with additional condition for prevent joins rows itself:

select * 
from t
left join t t1 on t.v = t1.v and t.id > t1.id;
Enter fullscreen mode Exit fullscreen mode

The query give us next result:

+====+===+========+========+
| id | v | id     | v      |
+====+===+========+========+
| 1  | 1 | (null) | (null) |
| 2  | 1 | 1      | 1      |
| 3  | 2 | (null) | (null) |
| 4  | 2 | 3      | 2      |
| 5  | 1 | 1      | 1      |
| 5  | 1 | 2      | 1      |
| 6  | 1 | 1      | 1      |
| 6  | 1 | 2      | 1      |
| 6  | 1 | 5      | 1      |
| 7  | 2 | 3      | 2      |
| 7  | 2 | 4      | 2      |
| 8  | 3 | (null) | (null) |
| 9  | 2 | 3      | 2      |
| 9  | 2 | 4      | 2      |
| 9  | 2 | 7      | 2      |
| 10 | 4 | (null) | (null) |
| 11 | 3 | 8      | 3      |
+----+---+--------+--------+
Enter fullscreen mode Exit fullscreen mode

We can see the unique rows with minimal id have (null) values in last columns. So we need to remove rest. We can done this in next simple query:

delete t.*
from t
left join t t1 on t.v = t1.v and t.id > t1.id
where t1.id is not null;
Enter fullscreen mode Exit fullscreen mode

Just run in on SQLize.online and see the result

P.S. After this publication my colleague @Akina suggested a shorter version:

delete t.*
from t
join t t1 on t.v = t1.v and t.id > t1.id;
Enter fullscreen mode Exit fullscreen mode

Top comments (0)