DEV Community

Talemul Islam
Talemul Islam

Posted on

Fuzzy String Matching in PostgreSQL with pg_trgm (Trigram Search Tutorial)

๐Ÿ” Fuzzy String Matching in PostgreSQL with pg_trgm

When working with user data โ€” names, addresses, search queries โ€” exact string matching isnโ€™t always enough. People make typos, use alternate spellings, or abbreviations.

PostgreSQL provides a powerful extension called pg_trgm (trigram search) that allows you to perform fuzzy text matching and find โ€œclose enoughโ€ results.


๐Ÿงฉ What is a Trigram?

A trigram is simply a group of three consecutive characters from a string.

For example, the word:

Talemul
Enter fullscreen mode Exit fullscreen mode

Would be split into trigrams (with start/end markers):

"  T", "Tal", "ale", "lem", "emu", "mul", "ul "
Enter fullscreen mode Exit fullscreen mode

๐Ÿ‘‰ Why the spaces?

  • PostgreSQL pads the string at the beginning and end with spaces to catch edge cases.

Now, when comparing two strings (Talemul vs Talimul), PostgreSQL compares how many trigrams they share:

  • Talemul: [" T","Tal","ale","lem","emu","mul","ul "]
  • Talimul: [" T","Tal","ali","lim","imu","mul","ul "]

They share trigrams like [" T","Tal","mul","ul "].

More shared trigrams = higher similarity score.

This makes trigram search excellent for catching typos or small spelling differences.


โœ… Setup

Enable the extension:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
Enter fullscreen mode Exit fullscreen mode

Check installed extensions:

\dx
Enter fullscreen mode Exit fullscreen mode

๐Ÿ“Š Example: Finding Similar Names

Letโ€™s say we have a table of names:

CREATE TABLE people (
    id SERIAL PRIMARY KEY,
    name TEXT
);

INSERT INTO people (name) VALUES
('Talemul'),
('Talimul'),
('Talimul Islam'),
('Tamim'),
('Talim');
Enter fullscreen mode Exit fullscreen mode

Now, search for names similar to "Talemul":

SELECT name, similarity(name, 'Talemul') AS score
FROM people
WHERE name % 'Talemul'   -- % means "is similar to"
ORDER BY score DESC;
Enter fullscreen mode Exit fullscreen mode

๐Ÿ–ฅ Actual Output:

     name       |    score
----------------+-------------
 Talemul        | 1.00000000
 Talimul        | 0.45454547
 Talimul Islam  | 0.29411766
 Talim          | 0.27272728
Enter fullscreen mode Exit fullscreen mode

๐Ÿ‘‰ Notice how "Talimul Islam" and "Talim" get lower scores, and may even be excluded depending on the similarity threshold.


โšก Boost Performance with Indexes

For large datasets, use a GIN index to speed up similarity searches:

CREATE INDEX idx_people_name_trgm ON people USING gin (name gin_trgm_ops);
Enter fullscreen mode Exit fullscreen mode

Now your searches with % and similarity() will scale much better.


๐ŸŽฏ Real-World Use Cases

  • User search: Find users by name despite typos (Jon vs John).
  • Autocomplete: Suggest terms even when slightly misspelled.
  • Deduplication: Detect near-duplicate entries.
  • Search engines: Power โ€œDid you mean?โ€ features.

๐Ÿ“ Wrap Up

Trigrams are a simple but powerful idea: break words into three-character chunks, compare overlaps, and compute similarity.

PostgreSQLโ€™s pg_trgm extension brings this technique directly into SQL, letting you build typo-tolerant search features without needing an external search engine.

๐Ÿ”— References:

Top comments (0)