Duplicate lines are everywhere: keyword exports from SEO tools, email lists from multiple sources, URL lists from site crawls, log files with repeated errors. Before you process any of these, you need them clean.
Here's a practical approach for each environment: command line, Python, JavaScript, SQL, and browser-based, depending on what you're working with.
## The fastest way (for non-programmers)
If you just have a list and need it deduplicated now, paste it into the Remove Duplicate Lines tool at SnappyTools. It handles case-sensitive and case-insensitive matching, sorts alphabetically if needed, trims whitespace, and removes blank lines. Everything runs in the browser — nothing is uploaded.
For programmers, read on.
## Bash / command line
**Sort and deduplicate (changes order):**

```bash
sort -u input.txt > output.txt
```

`sort -u` (unique) sorts alphabetically and removes duplicate lines in one step. Fast, simple, but it changes the original order.
**Preserve original order (awk approach):**

```bash
awk '!seen[$0]++' input.txt > output.txt
```

This is the idiomatic way to deduplicate while preserving order in bash. It uses an associative array called `seen`: `seen[$0]++` returns the count for the current line *before* incrementing it, so the first occurrence evaluates to 0, `!0` is true, and awk performs its default action of printing the line. Repeat occurrences return 1, 2, and so on, which negate to false, so they're skipped.
**Case-insensitive deduplication:**

```bash
awk '!seen[tolower($0)]++' input.txt > output.txt
```
**Windows PowerShell:**

```powershell
Get-Content input.txt | Sort-Object -Unique | Set-Content output.txt
```

Note that `Sort-Object` compares strings case-insensitively by default; add the `-CaseSensitive` switch if casing matters.
For order-preserving (and case-insensitive) deduplication in PowerShell:

```powershell
$seen = @{}
Get-Content input.txt | Where-Object {
    $lower = $_.ToLower()
    # A parenthesized assignment returns its value, so each
    # line passes the filter exactly once, on first sight
    !$seen[$lower] -and ($seen[$lower] = $true)
} | Set-Content output.txt
```
## Python
**Preserve order (most common approach):**

```python
with open('input.txt', 'r') as f:
    lines = f.read().splitlines()

seen = set()
unique = []
for line in lines:
    if line not in seen:
        seen.add(line)
        unique.append(line)

with open('output.txt', 'w') as f:
    f.write('\n'.join(unique))
```
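This reads the whole file into memory, which is fine for typical lists. For very large files, a streaming variant of the same pattern processes one line at a time; a minimal sketch (note that `seen` still grows with the number of *unique* lines):

```python
# Stream line by line instead of loading the whole file at once.
# Memory use is bounded by the number of unique lines, not total lines.
seen = set()
with open('input.txt') as src, open('output.txt', 'w') as dst:
    for line in src:
        if line not in seen:
            seen.add(line)
            dst.write(line)
```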
**One-liner using `dict.fromkeys()` (Python 3.7+, preserves order):**

```python
lines = open('input.txt').read().splitlines()
unique = list(dict.fromkeys(lines))
open('output.txt', 'w').write('\n'.join(unique))
```
This works because `dict` preserves insertion order in Python 3.7+, and `dict.fromkeys()` ignores duplicate keys.
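A quick REPL check makes the behavior concrete, and shows why a plain `set()` isn't a drop-in substitute here: Python sets don't preserve insertion order.

```python
>>> lines = ["banana", "apple", "banana", "cherry", "apple"]
>>> list(dict.fromkeys(lines))  # first occurrence wins, order kept
['banana', 'apple', 'cherry']
>>> sorted(set(lines))          # a set loses order, so sort it instead
['apple', 'banana', 'cherry']
```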
**Case-insensitive deduplication (keep original casing):**

```python
seen_lower = set()
unique = []
for line in lines:
    if line.lower() not in seen_lower:
        seen_lower.add(line.lower())
        unique.append(line)
```
**Remove blank lines too:**

```python
unique = [line for line in dict.fromkeys(lines) if line.strip()]
```
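To also treat lines that differ only in surrounding whitespace as duplicates (the Python equivalent of the JavaScript trim example further down), strip before comparing; a small sketch:

```python
# Strip each line first so "apple " and "apple" collapse into one
# entry, then drop anything left empty.
unique = [line for line in dict.fromkeys(l.strip() for l in lines) if line]
```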
**Sort output alphabetically:**

```python
unique = sorted(set(lines))                # case-sensitive
unique = sorted(set(lines), key=str.lower) # case-insensitive sort
```
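Note that the second line only *sorts* case-insensitively; the dedup itself is still case-sensitive, so both "Apple" and "apple" survive. If you want case-insensitive dedup and sort together, one sketch that keeps the first-seen casing:

```python
# Map each lowercased line to its first-seen original casing,
# then sort the survivors case-insensitively.
seen = {}
for line in lines:
    seen.setdefault(line.lower(), line)
unique = sorted(seen.values(), key=str.lower)
```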
## JavaScript / Node.js
**Browser (inline):**

```javascript
const text = document.getElementById('input').value;
const lines = text.split('\n');
const unique = [...new Set(lines)];
document.getElementById('output').value = unique.join('\n');
```

`Set` removes duplicates while preserving insertion order in JavaScript (ES6+).
**Case-insensitive, preserve original casing:**

```javascript
const seen = new Set();
const unique = lines.filter(line => {
  const lower = line.toLowerCase();
  if (seen.has(lower)) return false;
  seen.add(lower);
  return true;
});
```
**Node.js (reading from a file):**

```javascript
const fs = require('fs');
const lines = fs.readFileSync('input.txt', 'utf8').split('\n');
const unique = [...new Set(lines)].filter(Boolean); // filter(Boolean) removes empty strings
fs.writeFileSync('output.txt', unique.join('\n'));
```
**Remove blank lines and trim whitespace:**

```javascript
const unique = [...new Set(lines.map(l => l.trim()))].filter(Boolean);
```
## SQL
If your duplicates are in a database table:
```sql
-- View unique values
SELECT DISTINCT column_name FROM table_name;

-- Delete duplicates, keeping the row with the lowest id
DELETE FROM table_name
WHERE id NOT IN (
    SELECT MIN(id)
    FROM table_name
    GROUP BY column_name
);
```
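If you'd like to sanity-check the DELETE pattern before pointing it at real data, here's a throwaway demo using Python's built-in sqlite3 module (the `emails` table and `address` column are made up for illustration):

```python
import sqlite3

# In-memory database with a deliberately duplicated column.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE emails (id INTEGER PRIMARY KEY, address TEXT)')
conn.executemany('INSERT INTO emails (address) VALUES (?)',
                 [('a@example.com',), ('b@example.com',), ('a@example.com',)])

# Same pattern as above: keep the lowest id per address.
conn.execute('''
    DELETE FROM emails
    WHERE id NOT IN (SELECT MIN(id) FROM emails GROUP BY address)
''')
print(conn.execute('SELECT id, address FROM emails ORDER BY id').fetchall())
# [(1, 'a@example.com'), (2, 'b@example.com')]
```

One caveat: MySQL rejects a subquery that reads from the table being deleted (error 1093); the common workaround is to wrap the subquery in one more SELECT.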
## Python Pandas (for CSV files)
If you need to deduplicate rows in a CSV:
```python
import pandas as pd

df = pd.read_csv('input.csv')

# Remove rows where all columns are duplicated
df_unique = df.drop_duplicates()

# Remove rows where a specific column is duplicated
df_unique = df.drop_duplicates(subset='email')

# Keep the last occurrence instead of the first
df_unique = df.drop_duplicates(subset='email', keep='last')

df_unique.to_csv('output.csv', index=False)
```
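`drop_duplicates()` compares values exactly, so `Alice@example.com` and `alice@example.com` count as distinct. One way to dedupe a column case-insensitively while keeping each row's original casing (again assuming an `email` column):

```python
# Keep the first row per lowercased email; later case-variants drop out.
df_unique = df[~df['email'].str.lower().duplicated()]
```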
## Which approach to use
| Situation | Best tool |
|---|---|
| Quick paste, any format | Browser tool |
| Shell script / automation | `awk '!seen[$0]++'` |
| Need sorting | `sort -u` |
| Python script | `dict.fromkeys()` or a `set` |
| JavaScript | `new Set(lines)` |
| CSV with column-specific dedup | Pandas `drop_duplicates(subset=...)` |
| Database table | SQL `DELETE ... WHERE id NOT IN (SELECT MIN(id) ...)` |
For one-off tasks and most keyword/email/URL list cleaning, the browser tool is the fastest path. For anything in a script or pipeline, `awk '!seen[$0]++'` (bash) or `list(dict.fromkeys(lines))` (Python) are the most readable and portable options.