DEV Community

Slava Rozhnev
Slava Rozhnev

Posted on

2

Simple query for remove duplicates from table

In this small post I want to share the simple way to remove duplicates from table. The query works in MySQL, MariaDB and PostgreSQL databases. If you interested in such query for other RDBMS, please write me in comments.
Let's start. Assume we have simple table with two columns: id - is primary key and v simple integer value:

create table t (
    id int primary key,
    v int
);

insert into t(id, v) values 
(1, 1),(2, 1),(3, 2),(4, 2),(5, 1),(6, 1),(7, 2),(8, 3),(9, 2),(10, 4),(11, 3);
Enter fullscreen mode Exit fullscreen mode

The code above create the table and insert couple of values. As you can see id have unique values, v have several duplicates:

+====+===+
| id | v |
+====+===+
| 1  | 1 |
| 2  | 1 |
| 3  | 2 |
| 4  | 2 |
| 5  | 1 |
| 6  | 1 |
| 7  | 2 |
| 8  | 3 |
| 9  | 2 |
| 10 | 4 |
| 11 | 3 |
+----+---+
Enter fullscreen mode Exit fullscreen mode

Our mission is remove rows this duplicates in column v and rest unique values with minimal id value.
How we can find the duplicates? We can use simple LEFT JOIN on field v with additional condition for prevent joins rows itself:

select * 
from t
left join t t1 on t.v = t1.v and t.id > t1.id;
Enter fullscreen mode Exit fullscreen mode

The query give us next result:

+====+===+========+========+
| id | v | id     | v      |
+====+===+========+========+
| 1  | 1 | (null) | (null) |
| 2  | 1 | 1      | 1      |
| 3  | 2 | (null) | (null) |
| 4  | 2 | 3      | 2      |
| 5  | 1 | 1      | 1      |
| 5  | 1 | 2      | 1      |
| 6  | 1 | 1      | 1      |
| 6  | 1 | 2      | 1      |
| 6  | 1 | 5      | 1      |
| 7  | 2 | 3      | 2      |
| 7  | 2 | 4      | 2      |
| 8  | 3 | (null) | (null) |
| 9  | 2 | 3      | 2      |
| 9  | 2 | 4      | 2      |
| 9  | 2 | 7      | 2      |
| 10 | 4 | (null) | (null) |
| 11 | 3 | 8      | 3      |
+----+---+--------+--------+
Enter fullscreen mode Exit fullscreen mode

We can see the unique rows with minimal id have (null) values in last columns. So we need to remove rest. We can done this in next simple query:

delete t.*
from t
left join t t1 on t.v = t1.v and t.id > t1.id
where t1.id is not null;
Enter fullscreen mode Exit fullscreen mode

Just run in on SQLize.online and see the result

P.S. After this publication my colleague @Akina suggested a shorter version:

delete t.*
from t
join t t1 on t.v = t1.v and t.id > t1.id;
Enter fullscreen mode Exit fullscreen mode

Do your career a big favor. Join DEV. (The website you're on right now)

It takes one minute, it's free, and is worth it for your career.

Get started

Community matters

Top comments (0)

A Workflow Copilot. Tailored to You.

Pieces.app image

Our desktop app, with its intelligent copilot, streamlines coding by generating snippets, extracting code from screenshots, and accelerating problem-solving.

Read the docs

👋 Kindness is contagious

Dive into an ocean of knowledge with this thought-provoking post, revered deeply within the supportive DEV Community. Developers of all levels are welcome to join and enhance our collective intelligence.

Saying a simple "thank you" can brighten someone's day. Share your gratitude in the comments below!

On DEV, sharing ideas eases our path and fortifies our community connections. Found this helpful? Sending a quick thanks to the author can be profoundly valued.

Okay