
Snappy Tools

Posted on • Originally published at snappytools.app

How to Remove Duplicate Lines from Text: Python, JavaScript, Bash, and Online

Duplicate lines are everywhere: keyword exports from SEO tools, email lists from multiple sources, URL lists from site crawls, log files with repeated errors. Before you process any of these, you need them clean.

Here's every practical approach — command line, Python, JavaScript, and browser-based — depending on what you're working with.

The fastest way (for non-programmers)

If you just have a list and need it deduplicated now, paste it into the Remove Duplicate Lines tool at SnappyTools. It handles case-sensitive and case-insensitive matching, sorts alphabetically if needed, trims whitespace, and removes blank lines. Everything runs in the browser — nothing is uploaded.

For programmers, read on.


Bash / command line

Sort and deduplicate (changes order):

```bash
sort -u input.txt > output.txt
```

sort -u (unique) sorts alphabetically and removes duplicate lines in one step. Fast, simple, but it changes the original order.

Preserve original order (awk approach):

```bash
awk '!seen[$0]++' input.txt > output.txt
```

This is the idiomatic way to deduplicate while preserving order from the shell. It uses an awk associative array called seen: the pattern !seen[$0]++ is true only while a line's counter is still zero (its first occurrence), so that line is printed; the post-increment then bumps the counter, and later occurrences evaluate to false and are skipped.
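To see the counter at work, print it instead of using it as a filter (a quick sketch with inline input):

```bash
# Print each line's seen-count before the line itself:
# the filter !seen[$0]++ keeps only lines whose count is still 0
printf 'a\nb\na\na\n' | awk '{ print seen[$0]++, $0 }'
# 0 a
# 0 b
# 1 a
# 2 a
```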

Case-insensitive deduplication:

```bash
awk '!seen[tolower($0)]++' input.txt > output.txt
```

Windows PowerShell:

```powershell
Get-Content input.txt | Sort-Object -Unique | Set-Content output.txt
```

For order-preserving deduplication in PowerShell:

```powershell
$seen = @{}
Get-Content input.txt | Where-Object {
    $lower = $_.ToLower()
    # The parenthesized assignment returns $true for first occurrences
    !$seen[$lower] -and ($seen[$lower] = $true)
} | Set-Content output.txt
```

Python

Preserve order (most common approach):

```python
with open('input.txt', 'r') as f:
    lines = f.read().splitlines()

seen = set()
unique = []
for line in lines:
    if line not in seen:
        seen.add(line)
        unique.append(line)

with open('output.txt', 'w') as f:
    f.write('\n'.join(unique))
```

One-liner using dict.fromkeys() (Python 3.7+, preserves order):

```python
lines = open('input.txt').read().splitlines()
unique = list(dict.fromkeys(lines))
open('output.txt', 'w').write('\n'.join(unique))
```

This works because dict preserves insertion order in Python 3.7+, and dict.fromkeys() ignores duplicate keys.
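A quick illustration of both properties:

```python
# Duplicate keys are dropped; the first occurrence's position wins
lines = ["b", "a", "b", "c", "a"]
print(list(dict.fromkeys(lines)))  # → ['b', 'a', 'c']
```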

Case-insensitive deduplication (keep original casing):

```python
seen_lower = set()
unique = []
for line in lines:
    if line.lower() not in seen_lower:
        seen_lower.add(line.lower())
        unique.append(line)
```

Remove blank lines too:

```python
unique = [line for line in dict.fromkeys(lines) if line.strip()]
```

Sort output alphabetically:

```python
unique = sorted(set(lines))  # case-sensitive
unique = sorted(set(lines), key=str.lower)  # case-insensitive sort
```

JavaScript / Node.js

Browser (inline):

```javascript
const text = document.getElementById('input').value;
const lines = text.split('\n');
const unique = [...new Set(lines)];
document.getElementById('output').value = unique.join('\n');
```

Set removes duplicates while preserving insertion order in JavaScript (ES6+).
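For instance:

```javascript
// Set keeps the first occurrence of each value, in insertion order
const lines = ["b", "a", "b", "c", "a"];
console.log([...new Set(lines)]); // → [ 'b', 'a', 'c' ]
```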

Case-insensitive, preserve original casing:

```javascript
const seen = new Set();
const unique = lines.filter(line => {
  const lower = line.toLowerCase();
  if (seen.has(lower)) return false;
  seen.add(lower);
  return true;
});
```

Node.js (reading from a file):

```javascript
const fs = require('fs');
const lines = fs.readFileSync('input.txt', 'utf8').split('\n');
const unique = [...new Set(lines)].filter(Boolean); // filter(Boolean) removes empty strings
fs.writeFileSync('output.txt', unique.join('\n'));
```

Remove blank lines and trim whitespace:

```javascript
const unique = [...new Set(lines.map(l => l.trim()))].filter(Boolean);
```

SQL

If your duplicates are in a database table:

```sql
-- View unique values
SELECT DISTINCT column_name FROM table_name;

-- Delete duplicates, keeping the row with the lowest id
DELETE FROM table_name
WHERE id NOT IN (
  SELECT MIN(id)
  FROM table_name
  GROUP BY column_name
);
```
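To sanity-check the DELETE pattern without touching a real database, here is a sketch using Python's built-in sqlite3 module (the emails table and its values are made up for the example):

```python
import sqlite3

# In-memory database with a duplicate email
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO emails (email) VALUES (?)",
                 [("a@x.com",), ("b@x.com",), ("a@x.com",)])

# Delete duplicates, keeping the row with the lowest id
conn.execute("""
    DELETE FROM emails
    WHERE id NOT IN (SELECT MIN(id) FROM emails GROUP BY email)
""")
print(conn.execute("SELECT id, email FROM emails ORDER BY id").fetchall())
# → [(1, 'a@x.com'), (2, 'b@x.com')]
```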

Python Pandas (for CSV files)

If you need to deduplicate rows in a CSV:

```python
import pandas as pd

df = pd.read_csv('input.csv')

# Remove rows where all columns are duplicated
df_unique = df.drop_duplicates()

# Remove rows where a specific column is duplicated
df_unique = df.drop_duplicates(subset='email')

# Keep the last occurrence instead of the first
df_unique = df.drop_duplicates(subset='email', keep='last')

df_unique.to_csv('output.csv', index=False)
```

Which approach to use

| Situation | Best tool |
| --- | --- |
| Quick paste, any format | Browser tool |
| Shell script / automation | `awk '!seen[$0]++'` |
| Need sorting | `sort -u` |
| Python script | `dict.fromkeys()` or `set` |
| JavaScript | `new Set(lines)` |
| CSV with column-specific dedup | Pandas `drop_duplicates(subset=...)` |
| Database table | SQL `DELETE ... WHERE id NOT IN (SELECT MIN(id) ...)` |

For one-off tasks and most keyword/email/URL list cleaning, the browser tool is the fastest path. For anything in a script or pipeline, awk '!seen[$0]++' (bash) or list(dict.fromkeys(lines)) (Python) are the most readable and portable options.
