DEV Community 👩‍💻👨‍💻

Cover image for PostgreSQL Full Text Search
t-suarez
t-suarez

Posted on

PostgreSQL Full Text Search

How does the PostgreSQL Full Text Search function work?

In our recent projects, my team and I have been using PostgreSQL, a feature-rich database management system. One of the features we've utilized is a Full Text Search, which can be implemented natively using PostgreSQL's built-in text search.

This Full Text Search is a bit different from basic text matching. Using PostgreSQL's native feature, we can achieve a more loose text matching than we could if we were to use something like JavaScript's native Array.filter() method. this works because the text that will be searchable is actually processed in such a way that makes it capable of loosely matching roots of words with different conjugations of the same word.

In this way described above, PostgreSQL tokenizes the searchable terms into "lexemes". According to Wikipedia:

"[In computer science, a] lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token."

PostgreSQL terms this process of breaking down words into token lexemes as "vectorizing" the word. PostgreSQL's native feature for doing this task is called TSVector (TS stands for "text search").

We can create and add a TSVector from any table with English words (non-English words would need to have their language specified later).

ALTER TABLE "Items" ADD COLUMN "Items_search" TSVECTOR
Enter fullscreen mode Exit fullscreen mode

Above, we are adding a new column to table "Items" which is a column of TSVector type, now we have only created the column and must populate it with the tokenized search terms. We will do that in the below command:

UPDATE "Items" SET "Items_search" = to_tsvector('english', ' /* words || joined || like || this */ ');
Enter fullscreen mode Exit fullscreen mode

In the above command, we've used the built-in to_tsvector() function to pass in our language and also the words within our "Items" table which must be searchable. In this example the words are spelled out explicitly, separated with logical ORs ( '||' ), but in a real life situation this would be done programmatically. Next we will do further processing and tokenizing using PostgreSQL GIN indexer:

CREATE INDEX "Items_search" ON "Items" USING gin('Items_search');
Enter fullscreen mode Exit fullscreen mode

The GIN indexer only stores unweighted lexemes. PostgreSQL has other indexer methods, like GiST, that perform better than GIN indexing for given situations. From the PostgreSQL docs:

"As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update. Specifically, GiST indexes are very good for dynamic data and fast if the number of unique words (lexemes) is under 100,000, while GIN indexes will handle 100,000+ lexemes better but are slower to update."

We now have a searchable column on our "Items" table called "Items_search". Now we can use another PostgreSQL method called TSQuery to do a weighted, full text search. This is done using the PostgreSQL basic text matching operator: @@.

SELECT * FROM "Items"
WHERE "Items_search" @@ plainto_tsquery('english', '/* text to search */');
Enter fullscreen mode Exit fullscreen mode

Once again, this case has a commented-out sample argument. In a real life scenario this search would be implemented programmatically.

And with that we have a functioning search capability in our PostgreSQL database. This is extremely useful and not too difficult to implement and comes with less overhead than something like Elasticsearch would require. Reflecting on our team's decision to utilize PostgreSQL on our most recent project, I've come to appreciate that PostgreSQL has a lot more to offer "out of the box" than other DMSs. I hope you enjoyed learning about this PostgreSQL feature, in my next blog I will talk about another native PSQL feature: implementing Listener/Notify event handlers! See you then.

https://www.postgresql.org/docs/9.1/textsearch-indexes.html

https://en.wikipedia.org/wiki/Lexical_analysis

https://www.postgresql.org/docs/9.5/textsearch-intro.html#TEXTSEARCH-DOCUMENT

Top comments (0)

Timeless DEV post...

Git Concepts I Wish I Knew Years Ago

The most used technology by developers is not Javascript.

It's not Python or HTML.

It hardly even gets mentioned in interviews or listed as a pre-requisite for jobs.

I'm talking about Git and version control of course.

One does not simply learn git