DEV Community

Pranav Bakare
Pranav Bakare

Posted on

Removing duplicate records in ORACLE SQL

Removing duplicate records from the emp table using various methods, including creating the table, inserting sample data, backing up the table, and employing different SQL techniques. This will cover all solutions discussed previously.

Step 1: Create the Employee Table

First, create the emp table to store employee information.

CREATE TABLE emp (
emp_id NUMBER PRIMARY KEY,
first_name VARCHAR2(50),
last_name VARCHAR2(50),
email VARCHAR2(100),
department VARCHAR2(50),
salary NUMBER
);

Step 2: Insert Sample Data

Insert sample data into the emp table, which contains some duplicate records for demonstration purposes.

INSERT INTO emp (emp_id, first_name, last_name, email, department, salary) VALUES (1, 'John', 'Doe', 'john.doe@example.com', 'IT', 60000);
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary) VALUES (2, 'Jane', 'Smith', 'jane.smith@example.com', 'HR', 65000);
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary) VALUES (3, 'John', 'Doe', 'john.doe@example.com', 'IT', 60000); -- Duplicate
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary) VALUES (4, 'Mark', 'Johnson', 'mark.j@example.com', 'IT', 70000);
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary) VALUES (5, 'Jane', 'Smith', 'jane.smith@example.com', 'HR', 65000); -- Duplicate
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary) VALUES (6, 'Emily', 'Davis', 'emily.d@example.com', 'Finance', 55000);

Current State of the Table

After inserting the records, the emp table will look like this:

Step 3: Back Up the Table

Before removing duplicates, create a backup of the emp table to avoid any data loss.

CREATE TABLE emp_backup AS SELECT * FROM emp;

Step 4: Remove Duplicate Records Using Different Methods

Method 1: Using Unique Identifier (Email and MAX Function)

Identify and select duplicates based on the email and keep the maximum emp_id.

-- Identify duplicates
SELECT email, MAX(emp_id) AS max_emp_id, COUNT()
FROM emp
GROUP BY email
HAVING COUNT(
) > 1;

-- Delete duplicates
DELETE FROM emp
WHERE emp_id NOT IN (
SELECT MAX(emp_id)
FROM emp
GROUP BY email
);

Method 2: Using Self Join

Remove duplicates using a self-join on the email column.

DELETE FROM emp e1
WHERE EXISTS (
SELECT 1
FROM emp e2
WHERE e1.email = e2.email
AND e1.emp_id < e2.emp_id
);

Method 3: Using Window Functions

Utilize window functions and an inline view to identify duplicates and delete them.

DELETE FROM emp
WHERE emp_id IN (
SELECT emp_id
FROM (
SELECT emp_id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY emp_id) AS row_num
FROM emp
)
WHERE row_num > 1
);

Method 4: Using MIN Function

Keep the minimum emp_id for each duplicate email and delete others.

DELETE FROM emp
WHERE emp_id NOT IN (
SELECT MIN(emp_id)
FROM emp
GROUP BY email
);

Step 5: Verify the Table After Deletion

After executing the delete operations, you can verify the remaining records in the emp table to ensure that duplicates have been successfully removed.

SELECT * FROM emp;

Step 6: Confirm Backup Data

You can also check the backup table to ensure that the original data is preserved.

SELECT * FROM emp_backup;

Final State of the emp Table

After removing duplicates, the emp table should look like this:

Conclusion

This comprehensive guide demonstrates various methods for removing duplicate records from the emp table, including the creation of the table, insertion of sample data, backing up the table, and utilizing different SQL techniques. Always ensure to back up your data before performing deletion operations to avoid accidental data loss.

Top comments (0)