Renan Maringolo

Posted on Dec 17

Construindo Busca Completa no PostgresSQL / Building Complete Search in PostgresSQL

#postgres #fulltex #similarity #sql

PT-BR:

A busca de texto completo (FTS) no PostgreSQL é poderosa, rápida e fácil de configurar, eliminando a necessidade de ferramentas externas como Elasticsearch em muitos cenários. Inclusive, nunca havia feito essa funcionalidade e inicialmente pensei em usar ES e até questionei colegas a respeito. Só que o problema do ES é que pode ser "too much" e caro ao que você precisa!

Passo 1: Criando a tabela e populando com dados

Primeiro, crio uma tabela simples e adiciono dados para realizar nossas buscas: Artigos com colunas id, titulo e descrição!

CREATE TABLE artigos (
  id SERIAL PRIMARY KEY,
  titulo VARCHAR(80) NOT NULL,
  descricao VARCHAR(200) NOT NULL
);

INSERT INTO artigos (titulo, descricao) VALUES
  ('Aprenda PostgreSQL', 'Um guia completo sobre PostgreSQL'),
  ('Introdução ao SQL', 'Entenda os fundamentos de SQL'),
  ('Busca de texto no Postgres', 'Como implementar busca de texto no PostgreSQL'),
  ('Programação em Java', 'Um curso básico de Java'),
  ('Desenvolvimento web com Rails', 'Aprenda a criar aplicações com Rails'),
  ('Curso avançado de SQL', 'Domine SQL com exemplos práticos'),
  ('Aprendizado de Máquina', 'Introdução ao Machine Learning');

Executando a query para verificar:

SELECT * FROM artigos;

Passo 2: Implementando a busca simples com LIKE

Uma busca simples em SQL usa LIKE ou ILIKE (case-insensitive):

SELECT id, titulo, descricao
FROM artigos
WHERE titulo ILIKE '%PostgreSQL%' OR descricao ILIKE '%PostgreSQL%';

Problemas com o LIKE:

É case-sensitive (exceto ILIKE).
Não oferece relevância ou ranqueamento.
Torna-se lento com grandes volumes de dados, pois não há índice.

Passo 3: Configurando a busca de texto completo (FTS)

Como falei no início em relação ao ES, o PostgreSQL possui recursos nativos para busca de texto completo. Aqui estão os dois principais:

to_tsvector: Converte um texto em um vetor de busca (documento pesquisável).
to_tsquery: Converte um texto em uma query para busca.

Exemplo básico:

SELECT * 
FROM artigos
WHERE to_tsvector('portuguese', titulo || ' ' || descricao) 
      @@ to_tsquery('portuguese', 'Rails');

Aqui, buscamos pela palavra Rails nos títulos e descrições.

Passo 4: Adicionando relevância com ts_rank

O ts_rank calcula a relevância da busca, permitindo ranquear os resultados.

SELECT id, titulo, descricao,
       ts_rank(to_tsvector('portuguese', titulo), to_tsquery('portuguese', 'PostgreSQL')) AS rank_titulo,
       ts_rank(to_tsvector('portuguese', descricao), to_tsquery('portuguese', 'PostgreSQL')) AS rank_descricao
FROM artigos
WHERE to_tsvector('portuguese', titulo || ' ' || descricao) 
      @@ to_tsquery('portuguese', 'PostgreSQL')
ORDER BY rank_titulo DESC, rank_descricao DESC;

Passo 5: Usando GIN e sendo feliz com sua performance!

Adicionando um índice GIN acelera a busca significativamente:

CREATE INDEX idx_artigos_fts 
ON artigos USING GIN (to_tsvector('portuguese', titulo || ' ' || descricao));

Agora a busca utiliza o índice GIN para maior performance!

Passo 6: Adicionando busca aproximada com pg_trgm

Para permitir buscas com erros de digitação (reann), usamos a extensão pg_trgm e a função SIMILARITY:

Habilite a extensão:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

Combine busca FTS com busca aproximada:

SELECT id, titulo, descricao,
       ts_rank(to_tsvector('portuguese', titulo), to_tsquery('portuguese', 'PostgreSQL')) AS rank_titulo,
       similarity('PostgreSQL', titulo || ' ' || descricao) AS similaridade
FROM artigos
WHERE to_tsvector('portuguese', titulo || ' ' || descricao) 
      @@ to_tsquery('portuguese', 'PostgreSQL')
   OR similarity('PostgreSQL', titulo || ' ' || descricao) > 0.2
ORDER BY rank_titulo DESC, similaridade DESC;

O resultado final: Busca de texto completo com ranking e similaridade

SELECT id, titulo, descricao,
       ts_rank(to_tsvector('portuguese', titulo), to_tsquery('portuguese', 'SQL')) AS rank_titulo,
       ts_rank(to_tsvector('portuguese', descricao), to_tsquery('portuguese', 'SQL')) AS rank_descricao,
       similarity('SQL', titulo || ' ' || descricao) AS similaridade
FROM artigos
WHERE to_tsvector('portuguese', titulo || ' ' || descricao) 
      @@ to_tsquery('portuguese', 'SQL')
   OR similarity('SQL', titulo || ' ' || descricao) > 0.2
ORDER BY rank_titulo DESC, rank_descricao DESC, similaridade DESC;

Conclusão

Antes de colocar inúmeras ferramentas para resolver problema de Busca Completa, tente utilizar isso na sua aplicação, pois na maioria esmagadora das vezes resolverá os problemas. Com o uso de índices GIN e a extensão pg_trgm, conseguimos um sistema de busca rápido e eficiente. 🚀

fontes:

EN:

Full-text search (FTS) in PostgreSQL is powerful, fast, and easy to configure, eliminating the need for external tools like Elasticsearch in many scenarios. In fact, I had never implemented this functionality before and initially thought about using ES and even asked colleagues about it. But the problem with ES is that it can be "too much" and expensive for what you need!

Step 1: Creating the table and populating with data

First, I create a simple table and add data to perform our searches: Articles with columns id, title, and description!

CREATE TABLE articles (
  id SERIAL PRIMARY KEY,
  title VARCHAR(80) NOT NULL,
  description VARCHAR(200) NOT NULL
);

INSERT INTO articles (title, description) VALUES
  ('Learn PostgreSQL', 'A complete guide on PostgreSQL'),
  ('Introduction to SQL', 'Understand the fundamentals of SQL'),
  ('Text search in Postgres', 'How to implement text search in PostgreSQL'),
  ('Programming in Java', 'A basic Java course'),
  ('Web development with Rails', 'Learn to create applications with Rails'),
  ('Advanced SQL Course', 'Master SQL with practical examples'),
  ('Machine Learning', 'Introduction to Machine Learning');

Running the query to verify:

SELECT * FROM articles;

Step 2: Implementing a simple search with LIKE

A simple search in SQL uses LIKE or ILIKE (case-insensitive):

SELECT id, title, description
FROM articles
WHERE title ILIKE '%PostgreSQL%' OR description ILIKE '%PostgreSQL%';

Problems with LIKE:

It is case-sensitive (except ILIKE).
It does not offer relevance or ranking.
It becomes slow with large volumes of data since there is no index.

Step 3: Configuring full-text search (FTS)

As I mentioned earlier about ES, PostgreSQL has native features for full-text search. Here are the two main ones:

to_tsvector: Converts a text into a search vector (searchable document).
to_tsquery: Converts a text into a search query.

Example:

SELECT * 
FROM articles
WHERE to_tsvector('english', title || ' ' || description) 
      @@ to_tsquery('english', 'Rails');

Here, we search for the word "Rails" in titles and descriptions.

Step 4: Adding relevance with ts_rank

ts_rank calculates the search relevance, allowing us to rank the results.

SELECT id, title, description,
       ts_rank(to_tsvector('english', title), to_tsquery('english', 'PostgreSQL')) AS rank_title,
       ts_rank(to_tsvector('english', description), to_tsquery('english', 'PostgreSQL')) AS rank_description
FROM articles
WHERE to_tsvector('english', title || ' ' || description) 
      @@ to_tsquery('english', 'PostgreSQL')
ORDER BY rank_title DESC, rank_description DESC;

Step 5: Using GIN and being happy with its performance!

Adding a GIN index significantly speeds up the search:

CREATE INDEX idx_articles_fts 
ON articles USING GIN (to_tsvector('english', title || ' ' || description));

Now the search uses the GIN index for better performance!

Step 6: Adding approximate search with pg_trgm

To allow searches with typos (reann), we use the pg_trgm extension and the SIMILARITY function:

Enable the extension:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

Combine FTS with approximate search:

SELECT id, title, description,
       ts_rank(to_tsvector('english', title), to_tsquery('english', 'PostgreSQL')) AS rank_title,
       similarity('PostgreSQL', title || ' ' || description) AS similarity
FROM articles
WHERE to_tsvector('english', title || ' ' || description) 
      @@ to_tsquery('english', 'PostgreSQL')
   OR similarity('PostgreSQL', title || ' ' || description) > 0.2
ORDER BY rank_title DESC, similarity DESC;

Result: Full-text search with ranking and similarity

SELECT id, title, description,
       ts_rank(to_tsvector('english', title), to_tsquery('english', 'SQL')) AS rank_title,
       ts_rank(to_tsvector('english', description), to_tsquery('english', 'SQL')) AS rank_description,
       similarity('SQL', title || ' ' || description) AS similarity
FROM articles
WHERE to_tsvector('english', title || ' ' || description) 
      @@ to_tsquery('english', 'SQL')
   OR similarity('SQL', title || ' ' || description) > 0.2
ORDER BY rank_title DESC, rank_description DESC, similarity DESC;

Conclusion

Before introducing countless tools to solve the full-text search problem, try using this in your application, as in the overwhelming majority of cases, it will solve the issues. With the use of GIN indexes and the pg_trgm extension, we achieve a fast and efficient search system. 🚀

Sources:

DEV Community