๐ Fuzzy String Matching in PostgreSQL with pg_trgm
When working with user data โ names, addresses, search queries โ exact string matching isnโt always enough. People make typos, use alternate spellings, or abbreviations.
PostgreSQL provides a powerful extension called pg_trgm
(trigram search) that allows you to perform fuzzy text matching and find โclose enoughโ results.
๐งฉ What is a Trigram?
A trigram is simply a group of three consecutive characters from a string.
For example, the word:
Talemul
Would be split into trigrams (with start/end markers):
" T", "Tal", "ale", "lem", "emu", "mul", "ul "
๐ Why the spaces?
- PostgreSQL pads the string at the beginning and end with spaces to catch edge cases.
Now, when comparing two strings (Talemul
vs Talimul
), PostgreSQL compares how many trigrams they share:
-
Talemul
:[" T","Tal","ale","lem","emu","mul","ul "]
-
Talimul
:[" T","Tal","ali","lim","imu","mul","ul "]
They share trigrams like [" T","Tal","mul","ul "]
.
More shared trigrams = higher similarity score.
This makes trigram search excellent for catching typos or small spelling differences.
โ Setup
Enable the extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Check installed extensions:
\dx
๐ Example: Finding Similar Names
Letโs say we have a table of names:
CREATE TABLE people (
id SERIAL PRIMARY KEY,
name TEXT
);
INSERT INTO people (name) VALUES
('Talemul'),
('Talimul'),
('Talimul Islam'),
('Tamim'),
('Talim');
Now, search for names similar to "Talemul"
:
SELECT name, similarity(name, 'Talemul') AS score
FROM people
WHERE name % 'Talemul' -- % means "is similar to"
ORDER BY score DESC;
๐ฅ Actual Output:
name | score
----------------+-------------
Talemul | 1.00000000
Talimul | 0.45454547
Talimul Islam | 0.29411766
Talim | 0.27272728
๐ Notice how "Talimul Islam"
and "Talim"
get lower scores, and may even be excluded depending on the similarity threshold.
โก Boost Performance with Indexes
For large datasets, use a GIN index to speed up similarity searches:
CREATE INDEX idx_people_name_trgm ON people USING gin (name gin_trgm_ops);
Now your searches with %
and similarity()
will scale much better.
๐ฏ Real-World Use Cases
-
User search: Find users by name despite typos (
Jon
vsJohn
). - Autocomplete: Suggest terms even when slightly misspelled.
- Deduplication: Detect near-duplicate entries.
- Search engines: Power โDid you mean?โ features.
๐ Wrap Up
Trigrams are a simple but powerful idea: break words into three-character chunks, compare overlaps, and compute similarity.
PostgreSQLโs pg_trgm
extension brings this technique directly into SQL, letting you build typo-tolerant search features without needing an external search engine.
๐ References:
Top comments (0)