๐ Fuzzy String Matching in PostgreSQL with pg_trgm
When working with user data โ names, addresses, search queries โ exact string matching isnโt always enough. People make typos, use alternate spellings, or abbreviations.
PostgreSQL provides a powerful extension called pg_trgm (trigram search) that allows you to perform fuzzy text matching and find โclose enoughโ results.
๐งฉ What is a Trigram?
A trigram is simply a group of three consecutive characters from a string.
For example, the word:
Talemul
Would be split into trigrams (with start/end markers):
" T", "Tal", "ale", "lem", "emu", "mul", "ul "
๐ Why the spaces?
- PostgreSQL pads the string at the beginning and end with spaces to catch edge cases.
Now, when comparing two strings (Talemul vs Talimul), PostgreSQL compares how many trigrams they share:
-
Talemul:[" T","Tal","ale","lem","emu","mul","ul "] -
Talimul:[" T","Tal","ali","lim","imu","mul","ul "]
They share trigrams like [" T","Tal","mul","ul "].
More shared trigrams = higher similarity score.
This makes trigram search excellent for catching typos or small spelling differences.
โ Setup
Enable the extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Check installed extensions:
\dx
๐ Example: Finding Similar Names
Letโs say we have a table of names:
CREATE TABLE people (
id SERIAL PRIMARY KEY,
name TEXT
);
INSERT INTO people (name) VALUES
('Talemul'),
('Talimul'),
('Talimul Islam'),
('Tamim'),
('Talim');
Now, search for names similar to "Talemul":
SELECT name, similarity(name, 'Talemul') AS score
FROM people
WHERE name % 'Talemul' -- % means "is similar to"
ORDER BY score DESC;
๐ฅ Actual Output:
name | score
----------------+-------------
Talemul | 1.00000000
Talimul | 0.45454547
Talimul Islam | 0.29411766
Talim | 0.27272728
๐ Notice how "Talimul Islam" and "Talim" get lower scores, and may even be excluded depending on the similarity threshold.
โก Boost Performance with Indexes
For large datasets, use a GIN index to speed up similarity searches:
CREATE INDEX idx_people_name_trgm ON people USING gin (name gin_trgm_ops);
Now your searches with % and similarity() will scale much better.
๐ฏ Real-World Use Cases
-
User search: Find users by name despite typos (
JonvsJohn). - Autocomplete: Suggest terms even when slightly misspelled.
- Deduplication: Detect near-duplicate entries.
- Search engines: Power โDid you mean?โ features.
๐ Wrap Up
Trigrams are a simple but powerful idea: break words into three-character chunks, compare overlaps, and compute similarity.
PostgreSQLโs pg_trgm extension brings this technique directly into SQL, letting you build typo-tolerant search features without needing an external search engine.
๐ References:
Top comments (0)