Talemul Islam

Posted on Oct 1

Fuzzy String Matching in PostgreSQL with pg_trgm (Trigram Search Tutorial)

#postgres #trigram #database #sql

🔍 Fuzzy String Matching in PostgreSQL with `pg_trgm`

When working with user data — names, addresses, search queries — exact string matching isn’t always enough. People make typos, use alternate spellings, or abbreviations.

PostgreSQL provides a powerful extension called pg_trgm (trigram search) that allows you to perform fuzzy text matching and find “close enough” results.

🧩 What is a Trigram?

A trigram is simply a group of three consecutive characters from a string.

For example, the word:

Talemul

Would be split into trigrams (with start/end markers):

"  T", "Tal", "ale", "lem", "emu", "mul", "ul "

👉 Why the spaces?

PostgreSQL pads the string at the beginning and end with spaces to catch edge cases.

Now, when comparing two strings (Talemul vs Talimul), PostgreSQL compares how many trigrams they share:

Talemul: [" T","Tal","ale","lem","emu","mul","ul "]
Talimul: [" T","Tal","ali","lim","imu","mul","ul "]

They share trigrams like [" T","Tal","mul","ul "].

More shared trigrams = higher similarity score.

This makes trigram search excellent for catching typos or small spelling differences.

✅ Setup

Enable the extension:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

Check installed extensions:

\dx

📊 Example: Finding Similar Names

Let’s say we have a table of names:

CREATE TABLE people (
    id SERIAL PRIMARY KEY,
    name TEXT
);

INSERT INTO people (name) VALUES
('Talemul'),
('Talimul'),
('Talimul Islam'),
('Tamim'),
('Talim');

Now, search for names similar to "Talemul":

SELECT name, similarity(name, 'Talemul') AS score
FROM people
WHERE name % 'Talemul'   -- % means "is similar to"
ORDER BY score DESC;

🖥 Actual Output:

     name       |    score
----------------+-------------
 Talemul        | 1.00000000
 Talimul        | 0.45454547
 Talimul Islam  | 0.29411766
 Talim          | 0.27272728

👉 Notice how "Talimul Islam" and "Talim" get lower scores, and may even be excluded depending on the similarity threshold.

⚡ Boost Performance with Indexes

For large datasets, use a GIN index to speed up similarity searches:

CREATE INDEX idx_people_name_trgm ON people USING gin (name gin_trgm_ops);

Now your searches with % and similarity() will scale much better.

🎯 Real-World Use Cases

User search: Find users by name despite typos (Jon vs John).
Autocomplete: Suggest terms even when slightly misspelled.
Deduplication: Detect near-duplicate entries.
Search engines: Power “Did you mean?” features.

📝 Wrap Up

Trigrams are a simple but powerful idea: break words into three-character chunks, compare overlaps, and compute similarity.

PostgreSQL’s pg_trgm extension brings this technique directly into SQL, letting you build typo-tolerant search features without needing an external search engine.

🔗 References:

PostgreSQL Docs – pg_trgm

DEV Community

Fuzzy String Matching in PostgreSQL with pg_trgm (Trigram Search Tutorial)

🔍 Fuzzy String Matching in PostgreSQL with `pg_trgm`

🧩 What is a Trigram?

✅ Setup

📊 Example: Finding Similar Names

🖥 Actual Output:

⚡ Boost Performance with Indexes

🎯 Real-World Use Cases

📝 Wrap Up

Top comments (0)

🔍 Fuzzy String Matching in PostgreSQL with pg_trgm

🧩 What is a Trigram?

✅ Setup

📊 Example: Finding Similar Names

🖥 Actual Output:

⚡ Boost Performance with Indexes

🎯 Real-World Use Cases

📝 Wrap Up

🔍 Fuzzy String Matching in PostgreSQL with `pg_trgm`