DEV Community

Slava Rozhnev
Slava Rozhnev

Posted on

2

Simple query for remove duplicates from table

In this small post I want to share the simple way to remove duplicates from table. The query works in MySQL, MariaDB and PostgreSQL databases. If you interested in such query for other RDBMS, please write me in comments.
Let's start. Assume we have simple table with two columns: id - is primary key and v simple integer value:

create table t (
    id int primary key,
    v int
);

insert into t(id, v) values 
(1, 1),(2, 1),(3, 2),(4, 2),(5, 1),(6, 1),(7, 2),(8, 3),(9, 2),(10, 4),(11, 3);
Enter fullscreen mode Exit fullscreen mode

The code above create the table and insert couple of values. As you can see id have unique values, v have several duplicates:

+====+===+
| id | v |
+====+===+
| 1  | 1 |
| 2  | 1 |
| 3  | 2 |
| 4  | 2 |
| 5  | 1 |
| 6  | 1 |
| 7  | 2 |
| 8  | 3 |
| 9  | 2 |
| 10 | 4 |
| 11 | 3 |
+----+---+
Enter fullscreen mode Exit fullscreen mode

Our mission is remove rows this duplicates in column v and rest unique values with minimal id value.
How we can find the duplicates? We can use simple LEFT JOIN on field v with additional condition for prevent joins rows itself:

select * 
from t
left join t t1 on t.v = t1.v and t.id > t1.id;
Enter fullscreen mode Exit fullscreen mode

The query give us next result:

+====+===+========+========+
| id | v | id     | v      |
+====+===+========+========+
| 1  | 1 | (null) | (null) |
| 2  | 1 | 1      | 1      |
| 3  | 2 | (null) | (null) |
| 4  | 2 | 3      | 2      |
| 5  | 1 | 1      | 1      |
| 5  | 1 | 2      | 1      |
| 6  | 1 | 1      | 1      |
| 6  | 1 | 2      | 1      |
| 6  | 1 | 5      | 1      |
| 7  | 2 | 3      | 2      |
| 7  | 2 | 4      | 2      |
| 8  | 3 | (null) | (null) |
| 9  | 2 | 3      | 2      |
| 9  | 2 | 4      | 2      |
| 9  | 2 | 7      | 2      |
| 10 | 4 | (null) | (null) |
| 11 | 3 | 8      | 3      |
+----+---+--------+--------+
Enter fullscreen mode Exit fullscreen mode

We can see the unique rows with minimal id have (null) values in last columns. So we need to remove rest. We can done this in next simple query:

delete t.*
from t
left join t t1 on t.v = t1.v and t.id > t1.id
where t1.id is not null;
Enter fullscreen mode Exit fullscreen mode

Just run in on SQLize.online and see the result

P.S. After this publication my colleague @Akina suggested a shorter version:

delete t.*
from t
join t t1 on t.v = t1.v and t.id > t1.id;
Enter fullscreen mode Exit fullscreen mode

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more

Top comments (0)

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay