<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gabriel Brocco de Oliveira</title>
    <description>The latest articles on DEV Community by Gabriel Brocco de Oliveira (@gaabrielbrocco).</description>
    <link>https://dev.to/gaabrielbrocco</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2093232%2F088f38a8-7140-4e85-ac41-2759e9f349a8.JPG</url>
      <title>DEV Community: Gabriel Brocco de Oliveira</title>
      <link>https://dev.to/gaabrielbrocco</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gaabrielbrocco"/>
    <language>en</language>
    <item>
      <title>Como treinei uma IA de suporte com histórico real de atendimento: da conversa bruta ao RAG em produção</title>
      <dc:creator>Gabriel Brocco de Oliveira</dc:creator>
      <pubDate>Thu, 21 May 2026 22:49:36 +0000</pubDate>
      <link>https://dev.to/gaabrielbrocco/como-treinei-uma-ia-de-suporte-com-historico-real-de-atendimento-da-conversa-bruta-ao-rag-em-2i1l</link>
      <guid>https://dev.to/gaabrielbrocco/como-treinei-uma-ia-de-suporte-com-historico-real-de-atendimento-da-conversa-bruta-ao-rag-em-2i1l</guid>
      <description>&lt;p&gt;Esse artigo é a documentação completa do pipeline que construí para extrair conhecimento do histórico real de atendimento de um cliente e transformá-lo em base vetorial para uma IA de suporte em produção.&lt;/p&gt;

&lt;p&gt;A linha do tempo: 8.400 conversas brutas viraram 2.200 pares de conhecimento na base final. Sem anotação manual.&lt;/p&gt;




&lt;h2&gt;
  
  
  Antes de começar: os conceitos
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LLM (Large Language Model).&lt;/strong&gt; O modelo de linguagem em si, como Claude, GPT ou Gemini. Ele é poderoso, mas tem dois problemas: não sabe nada sobre o seu negócio específico, e alucina quando não sabe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG (Retrieval-Augmented Generation).&lt;/strong&gt; A solução para isso. Antes do LLM responder, você &lt;em&gt;busca&lt;/em&gt; o contexto relevante numa base própria e &lt;em&gt;injeta&lt;/em&gt; esse contexto no prompt. O modelo deixa de adivinhar e passa a responder a partir de informação real e verificável.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embedding.&lt;/strong&gt; A peça que faz a busca funcionar. É um vetor, ou seja, uma lista de números (no nosso caso, 1.536 deles) que representa o "significado" de um texto num espaço matemático. Frases parecidas geram vetores próximos. "Como gerar relatório?" e "Onde vejo o resumo financeiro?" ficam vizinhas no espaço vetorial mesmo sem dividir palavras.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vetor store.&lt;/strong&gt; O lugar onde esses vetores ficam armazenados de forma que você consegue perguntar "me dá os 10 mais parecidos com esse aqui" em milissegundos. No nosso caso, Postgres com a extensão &lt;code&gt;pgvector&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Busca semântica vs full-text.&lt;/strong&gt; Semântica é via embedding: captura sentido. Full-text é match de palavras: captura termos exatos. Cada uma falha em coisas diferentes. A combinação é o que chamam de &lt;strong&gt;busca híbrida&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reranker.&lt;/strong&gt; Um modelo menor que, depois da busca, reordena os resultados pela relevância real para a query. A busca é boa em achar candidatos. O reranker é bom em decidir a ordem final.&lt;/p&gt;




&lt;h2&gt;
  
  
  O problema concreto
&lt;/h2&gt;

&lt;p&gt;Plataforma de atendimento ao cliente customizada (Postgres por baixo), meses de histórico de conversas. Duas fontes de conhecimento precisavam alimentar o agente de IA:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;*&lt;em&gt;Documentação técnica *&lt;/em&gt; — já processada num pipeline de RAG separado, com chunking hierárquico por seção&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Histórico de conversas&lt;/strong&gt; — o desafio desse artigo&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;O objetivo nunca foi substituir atendentes humanos. Era criar uma primeira camada de resolução automática para as dúvidas recorrentes, que representam cerca de &lt;strong&gt;70% do volume&lt;/strong&gt; da operação.&lt;/p&gt;




&lt;h2&gt;
  
  
  Visão geral: o pipeline em 5 estágios
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] Coleta + filtro de qualidade
        ↓
[2] Classificação estruturada com LLM
        ↓
[3] Geração de embeddings
        ↓
[4] Deduplicação vetorial (greedy clustering)
        ↓
[5] Busca híbrida (full-text search com ts_rank + semântica) + reranker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cada estágio resolve um problema específico que o anterior não resolve. Vou destrinchar um por um.&lt;/p&gt;




&lt;h2&gt;
  
  
  Estágio 1 — Coleta e filtro de qualidade
&lt;/h2&gt;

&lt;p&gt;O maior obstáculo de aprender com histórico real é o &lt;strong&gt;ruído&lt;/strong&gt;. Nem toda conversa terminou em resolução. Muitas foram escaladas. Algumas o cliente ficou insatisfeito e nem disse nada. Se você joga tudo na base, ensina o agente a errar do mesmo jeito que o pior atendente do time.&lt;/p&gt;

&lt;p&gt;A pergunta é: como filtrar qualidade sem precisar anotar manualmente milhares de conversas?&lt;/p&gt;

&lt;p&gt;A resposta pragmática: &lt;strong&gt;CSAT (Pontuação de Satisfação do Cliente) como proxy de qualidade&lt;/strong&gt;. Não é perfeito, já que nem todo bom atendimento recebe avaliação, mas é o sinal mais confiável que já está no banco, de graça.&lt;/p&gt;

&lt;p&gt;Além do CSAT, dois filtros adicionais:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Número mínimo de mensagens:&lt;/strong&gt; conversas muito curtas raramente contêm resolução real. Mínimo de 4 mensagens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Status &lt;code&gt;resolved&lt;/code&gt;:&lt;/strong&gt; só conversas que a plataforma marcou como resolvidas.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Resultado: de ~8.400 conversas brutas, sobraram ~2.100 qualificadas.&lt;/p&gt;




&lt;h2&gt;
  
  
  Estágio 2 — Classificação estruturada com LLM
&lt;/h2&gt;

&lt;p&gt;Essa é a parte central do pipeline. Cada conversa qualificada é enviada para um LLM com um prompt de extração estruturada. O modelo recebe o histórico completo de mensagens e devolve JSON com um ou mais &lt;code&gt;qa_pairs&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Por que LLM e não regras?&lt;/strong&gt; Porque um único ticket pode conter múltiplos problemas diferentes. O cliente começa perguntando sobre relatório, no meio reclama de uma cobrança, no fim pergunta sobre integração. Três tópicos, três pares de conhecimento independentes. Regras de regex não resolvem isso de forma confiável. O LLM identifica os tópicos, separa em pares e classifica cada um individualmente.&lt;/p&gt;

&lt;p&gt;Modelo escolhido: &lt;strong&gt;Claude Haiku&lt;/strong&gt;. Rápido e barato o suficiente para processar 2.100 conversas em batch sem virar projeto orçamentário.&lt;/p&gt;

&lt;h3&gt;
  
  
  A anatomia de um qa_pair
&lt;/h3&gt;

&lt;p&gt;Cada par tem campos com função muito específica:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Campo&lt;/th&gt;
&lt;th&gt;Função&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;question&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pergunta essencial reescrita, sem nome de cliente e sem dados do caso específico&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;answer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Solução em prosa, anonimizada. Não é transcrição, é síntese do que resolveu&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;resolution_steps&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Array de passos ordenados. O agente usa pra guiar o cliente step-by-step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;domain&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Classificação de alto nível &lt;strong&gt;fechada&lt;/strong&gt;, que define qual setor resolveria&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;module&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Área do produto em snake_case &lt;strong&gt;livre&lt;/strong&gt; (ex: &lt;code&gt;financeiro_app&lt;/code&gt;, &lt;code&gt;agenda&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;intent_freetext&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Intent em snake_case (ex: &lt;code&gt;gerar_relatorio_mensal&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;resolution_actor&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Quem resolve: &lt;code&gt;client_self_service&lt;/code&gt; / &lt;code&gt;agent_in_app&lt;/code&gt; / &lt;code&gt;agent_backend&lt;/code&gt; / &lt;code&gt;external_team&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;user_confirmed_resolution&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;O cliente confirmou que foi resolvido? Boolean&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;confirmation_type&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;explicit&lt;/code&gt; / &lt;code&gt;implicit_no_return&lt;/code&gt; / &lt;code&gt;informational_only&lt;/code&gt; / &lt;code&gt;none&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;confidence&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.0 a 1.0. Certeza do LLM na extração. Governa o que entra na base&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pii_found&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Boolean. Havia dados pessoais identificáveis na conversa?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pii_audit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Detalhamento do PII encontrado e como foi anonimizado&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Um exemplo real (anonimizado) do output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"qa_pairs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Como gerar relatório de entradas separado por centro de custo?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Acesse Financeiro &amp;gt; Demonstrativo Financeiro. Selecione o período e o centro de custo desejado. O sistema permite filtros por forma de entrada e separação entre recebimentos e pagamentos. Exportação disponível em PDF."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"resolution_steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"Acessar menu Financeiro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"Selecionar Demonstrativo Financeiro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"Configurar filtro de período e centro de custo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"Aplicar filtros adicionais se necessário"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"Exportar em PDF"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"domain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tecnico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"module"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"financeiro_app"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"intent_freetext"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gerar_relatorio_por_centro_de_custo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"topic_tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"relatorio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"financeiro"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"centro_de_custo"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"resolution_actor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"client_self_service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"user_confirmed_resolution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confirmation_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"explicit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pii_found"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pii_audit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.94&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"should_extract"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"quality_concern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  O sistema de domínio em dois níveis
&lt;/h3&gt;

&lt;p&gt;Essa escolha de arquitetura é deliberada e resolve um problema real.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;domain&lt;/code&gt; é fechado.&lt;/strong&gt; Só pode assumir valores pré-definidos: &lt;code&gt;tecnico&lt;/code&gt;, &lt;code&gt;comercial&lt;/code&gt;, &lt;code&gt;assinatura&lt;/code&gt;, &lt;code&gt;nfse&lt;/code&gt;. Serve pra roteamento e filtro grosso na busca.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;module&lt;/code&gt; é livre em snake_case.&lt;/strong&gt; Captura granularidade do produto. O agente de IA usa para filtrar retrieval por área funcional sem depender de uma taxonomia rígida que precisa ser mantida.&lt;/p&gt;

&lt;p&gt;O maior erro inicial foi misturar "dúvida sobre o módulo financeiro do app" com "assinatura do serviço". O módulo financeiro &lt;strong&gt;do produto&lt;/strong&gt; é técnico, porque o cliente aprendeu a usar uma feature. Assinatura é sobre o que &lt;strong&gt;ele paga pra empresa&lt;/strong&gt;. Domínios separados, lógicas completamente diferentes, atendentes diferentes.&lt;/p&gt;

&lt;h3&gt;
  
  
  O papel do confidence na qualidade da base
&lt;/h3&gt;

&lt;p&gt;O &lt;code&gt;confidence&lt;/code&gt; não é decoração. Ele governa o que entra na base final.&lt;/p&gt;

&lt;p&gt;O LLM atribui confidence baixo quando: a conversa não tinha resolução clara, o cliente saiu sem confirmar, a dúvida era ambígua entre dois domínios, ou havia informação contraditória no histórico.&lt;/p&gt;

&lt;p&gt;Threshold estabelecido: &lt;strong&gt;0.75&lt;/strong&gt;. Pares abaixo disso vão pra fila de revisão manual. Não são descartados automaticamente, mas também não entram direto na base.&lt;/p&gt;

&lt;p&gt;Distribuição real após classificação:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;confidence ≥ 0.90&lt;/code&gt; → &lt;strong&gt;~43%&lt;/strong&gt; dos pares&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;0.75 ≤ confidence &amp;lt; 0.90&lt;/code&gt; → &lt;strong&gt;~38%&lt;/strong&gt; dos pares&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;confidence &amp;lt; 0.75&lt;/code&gt; → &lt;strong&gt;~19%&lt;/strong&gt; (fila de revisão)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;~80% do conhecimento aproveitado com qualidade alta, sem anotação humana.&lt;/p&gt;

&lt;h3&gt;
  
  
  O schema que sustenta o pipeline
&lt;/h3&gt;

&lt;p&gt;Todo esse JSON do LLM precisa virar linhas numa tabela. Toda a engenharia dos próximos estágios (embeddings, deduplicação, busca híbrida) depende de como esse schema é desenhado. Aqui está a tabela de staging que sustenta o pipeline inteiro:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;staging_id&lt;/span&gt;                &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;source_conversation_id&lt;/span&gt;    &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extracted_at&lt;/span&gt;              &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;

    &lt;span class="c1"&gt;-- Conteúdo extraído do LLM&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt;                  &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt;                    &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resolution_steps&lt;/span&gt;          &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;-- Classificação&lt;/span&gt;
    &lt;span class="k"&gt;domain&lt;/span&gt;                    &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                                &lt;span class="k"&gt;domain&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tecnico'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'comercial'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'assinatura'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'nfse'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                              &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;module&lt;/span&gt;                    &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;intent_freetext&lt;/span&gt;           &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;topic_tags&lt;/span&gt;                &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="n"&gt;resolution_actor&lt;/span&gt;          &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                                &lt;span class="n"&gt;resolution_actor&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                                  &lt;span class="s1"&gt;'client_self_service'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'agent_in_app'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="s1"&gt;'agent_backend'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'external_team'&lt;/span&gt;
                                &lt;span class="p"&gt;)&lt;/span&gt;
                              &lt;span class="p"&gt;),&lt;/span&gt;

    &lt;span class="c1"&gt;-- Sinais de qualidade&lt;/span&gt;
    &lt;span class="n"&gt;user_confirmed_resolution&lt;/span&gt; &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;confirmation_type&lt;/span&gt;         &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;                &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

    &lt;span class="c1"&gt;-- Privacidade&lt;/span&gt;
    &lt;span class="n"&gt;pii_found&lt;/span&gt;                 &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pii_audit&lt;/span&gt;                 &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;-- Embeddings (preenchidos no estágio 3)&lt;/span&gt;
    &lt;span class="n"&gt;embedding_input&lt;/span&gt;           &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;                 &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;answer_embedding&lt;/span&gt;          &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

    &lt;span class="c1"&gt;-- Controle de deduplicação (preenchido no estágio 4)&lt;/span&gt;
    &lt;span class="n"&gt;dedup_status&lt;/span&gt;              &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                                &lt;span class="n"&gt;dedup_status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'canonical'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'duplicate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'unique'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                              &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;duplicate_of&lt;/span&gt;              &lt;span class="nb"&gt;BIGINT&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;staging_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;dup_similarity&lt;/span&gt;            &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

    &lt;span class="c1"&gt;-- Coluna full-text (estágio 5)&lt;/span&gt;
    &lt;span class="n"&gt;fts&lt;/span&gt;                       &lt;span class="n"&gt;tsvector&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                                &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                                  &lt;span class="s1"&gt;'portuguese'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                  &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                                &lt;span class="p"&gt;)&lt;/span&gt;
                              &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;STORED&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Índice ANN (Approximate Nearest Neighbor) para busca semântica&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_staging_embedding&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt;
    &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Índice GIN para full-text search&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_staging_fts&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;gin&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Índices de filtro&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_staging_domain&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_staging_dedup&lt;/span&gt;  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dedup_status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Três decisões:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;O &lt;code&gt;embedding&lt;/code&gt; é &lt;code&gt;vector(1536)&lt;/code&gt;.&lt;/strong&gt; Essa dimensão vem do modelo &lt;code&gt;text-embedding-3-small&lt;/code&gt; da OpenAI. Se trocar de modelo, a dimensão muda. Não dá pra "só rodar o novo modelo" sem reconstruir a coluna e rebuildar o índice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;O índice é HNSW (Hierarchical Navigable Small World).&lt;/strong&gt; É um algoritmo de busca aproximada que sacrifica precisão mínima por velocidade gigante. Com volume maior, HNSW é o que viabiliza retrieval em milissegundos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;O &lt;code&gt;fts&lt;/code&gt; é uma coluna &lt;code&gt;GENERATED ALWAYS AS&lt;/code&gt;.&lt;/strong&gt; O Postgres gera e mantém atualizado o &lt;code&gt;tsvector&lt;/code&gt; automaticamente baseado em &lt;code&gt;question + answer&lt;/code&gt;, com stemming em português. Combinado com o índice GIN, é o que faz o full-text search do estágio 5 ser instantâneo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optamos por &lt;code&gt;ts_rank&lt;/code&gt; nativo em vez de BM25 por dois motivos:&lt;/strong&gt; escala atual de ~2k pares não justifica a complexidade adicional, e o ganho marginal de recall não compensa o overhead operacional.&lt;/p&gt;

&lt;p&gt;A tabela de produção (que o agente realmente consulta) é uma view filtrada dessa staging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;dedup_status&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'canonical'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'unique'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Essa separação entre staging e produção é o que permite reprocessar o pipeline (reclassificar, recalibrar threshold, regerar embeddings) sem mexer no que o agente está consultando em tempo real.&lt;/p&gt;




&lt;h2&gt;
  
  
  Estágio 3 — Geração de embeddings
&lt;/h2&gt;

&lt;p&gt;Aqui tem um detalhe sutil que faz diferença grande.&lt;/p&gt;

&lt;p&gt;O texto que vai para o modelo de embedding &lt;strong&gt;não é só o &lt;code&gt;question&lt;/code&gt;&lt;/strong&gt;. É a concatenação de &lt;code&gt;question&lt;/code&gt; + &lt;code&gt;intent_freetext&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Por quê? Porque perguntas reais são genéricas. "Como faço isso?", "Não está funcionando", "Tá dando erro". O embedding dessas frases sozinhas é fraco, porque vetorialmente elas ficam todas amontoadas numa região do espaço.&lt;/p&gt;

&lt;p&gt;O &lt;code&gt;intent_freetext&lt;/code&gt;, que o LLM gerou em snake_case durante a classificação, ancora a semântica. "como_faço_isso" combinado com "gerar_relatorio_por_centro_de_custo" gera um vetor muito mais útil do que só "como faço isso?".&lt;/p&gt;

&lt;p&gt;O campo &lt;code&gt;answer&lt;/code&gt; recebe embedding &lt;strong&gt;separado&lt;/strong&gt;. Isso permite &lt;strong&gt;busca bidirecional&lt;/strong&gt;: pelo lado da pergunta (usuário perguntando algo) ou pelo lado da solução (agente buscando "como resolvo X").&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;embedding_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;intent_freetext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'_'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Após geração via API (batch):&lt;/span&gt;
&lt;span class="c1"&gt;-- modelo: text-embedding-3-small&lt;/span&gt;
&lt;span class="c1"&gt;-- dimensão: 1536&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Estágio 4 — Deduplicação vetorial com greedy clustering
&lt;/h2&gt;

&lt;p&gt;2.100 conversas filtradas geraram ~3.400 qa_pairs após classificação. Mas muitos eram &lt;strong&gt;semanticamente idênticos&lt;/strong&gt;: a mesma dúvida resolvida por atendentes diferentes, em palavras diferentes, em meses diferentes.&lt;/p&gt;

&lt;p&gt;Jogar tudo na base cria um problema concreto: a busca retorna 5 versões da mesma resposta ocupando o contexto do LLM. Você desperdiça tokens e dilui a qualidade da resposta final.&lt;/p&gt;

&lt;p&gt;A solução foi aplicar &lt;strong&gt;greedy clustering por distância de cosseno&lt;/strong&gt; via &lt;code&gt;pgvector&lt;/code&gt;. A lógica é simples:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ordena todos os pares por &lt;code&gt;confidence&lt;/code&gt; decrescente&lt;/li&gt;
&lt;li&gt;O par com maior confidence vira &lt;code&gt;canonical&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Todos os pares com similaridade ≥ threshold viram &lt;code&gt;duplicate&lt;/code&gt;, apontando pro canonical&lt;/li&gt;
&lt;li&gt;O canonical carrega o melhor conhecimento. Os duplicatas ficam como referência, mas não entram na busca
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;DECLARE&lt;/span&gt;
  &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- 92% de similaridade = duplicata&lt;/span&gt;
  &lt;span class="n"&gt;pair_record&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;similar_count&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="n"&gt;pair_record&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;staging_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;dedup_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;staging_id&lt;/span&gt;
  &lt;span class="n"&gt;LOOP&lt;/span&gt;
    &lt;span class="c1"&gt;-- Se já foi marcado como duplicate em iteração anterior, pula&lt;/span&gt;
    &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;dedup_status&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;staging_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pair_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;staging_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt;
      &lt;span class="k"&gt;CONTINUE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;-- Marca similares como duplicate desse canonical&lt;/span&gt;
    &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;similares&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt;
      &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;dedup_status&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'duplicate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;duplicate_of&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pair_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;staging_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="n"&gt;dup_similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pair_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;staging_id&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pair_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;staging_id&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;dedup_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pair_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;RETURNING&lt;/span&gt; &lt;span class="n"&gt;staging_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;similar_count&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;similares&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;-- Vira canonical (tem similares) ou unique (sem similares)&lt;/span&gt;
    &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;rag_sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;case_knowledge_staging&lt;/span&gt;
    &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;dedup_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;CASE&lt;/span&gt;
      &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;similar_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'canonical'&lt;/span&gt;
      &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'unique'&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;staging_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pair_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;staging_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vale entender o operador: &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; no &lt;code&gt;pgvector&lt;/code&gt; é &lt;strong&gt;distância de cosseno&lt;/strong&gt;. Retorna 0 para vetores idênticos e 2 para opostos. &lt;code&gt;(1 - distância)&lt;/code&gt; te dá a similaridade. O threshold de 0.92 foi calibrado empiricamente. Abaixo disso, pares de fato distintos começavam a ser marcados como duplicata. Ajuste conforme a diversidade do seu domínio.&lt;/p&gt;




&lt;h2&gt;
  
  
  Estágio 5 — Busca híbrida com RRF e reranker
&lt;/h2&gt;

&lt;p&gt;A base final ficou com ~2.200 pares: só os marcados como &lt;code&gt;canonical&lt;/code&gt; ou &lt;code&gt;unique&lt;/code&gt;, com confidence ≥ 0.75. Mas a base é só metade da equação. &lt;strong&gt;Como você busca&lt;/strong&gt; importa tanto quanto o que tem dentro.&lt;/p&gt;

&lt;p&gt;A arquitetura combina dois métodos complementares:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Busca semântica.&lt;/strong&gt; Boa pra capturar intenção e variação de linguagem. "Como vejo o histórico financeiro?" e "relatório de entradas" têm embeddings próximos, mesmo sem compartilhar palavras.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full-text.&lt;/strong&gt; Bom pra termos específicos do produto, como nomes de módulos, botões e telas. "Demonstrativo Financeiro" retorna muito melhor no full-text search com ts_rank do que na busca semântica.&lt;/p&gt;

&lt;p&gt;Cada uma falha em coisas diferentes. Sozinhas, deixam buracos. Juntas, se cobrem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fundindo os rankings com RRF
&lt;/h3&gt;

&lt;p&gt;O problema de combinar duas buscas é que os scores estão em escalas incomparáveis. Score de ts_rank é uma coisa, score de cosseno é outra. Somar não faz sentido.&lt;/p&gt;

&lt;p&gt;A solução elegante é o &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt;. Em vez de combinar os scores, você combina as &lt;strong&gt;posições&lt;/strong&gt; no ranking. A fórmula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;RRF(d) = Σ  1 / (k + rank&lt;span class="p"&gt;_&lt;/span&gt;i(d))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Onde &lt;code&gt;k&lt;/code&gt; é uma constante (geralmente 60) e &lt;code&gt;rank_i(d)&lt;/code&gt; é a posição do documento &lt;code&gt;d&lt;/code&gt; no ranking do sistema &lt;code&gt;i&lt;/code&gt;. Documento que aparece no top de ambos os rankings ganha pontuação alta. Documento que só aparece num ranking ganha pontuação média. Documento que não aparece em nenhum, zero.&lt;/p&gt;

&lt;p&gt;Adaptável a diferentes escalas. Resolve o problema sem hiperparâmetro arbitrário.&lt;/p&gt;

&lt;h3&gt;
  
  
  O reranker como camada final
&lt;/h3&gt;

&lt;p&gt;Depois do RRF, ainda tem uma jogada. Uma segunda passagem com um &lt;strong&gt;modelo reranker&lt;/strong&gt; sobre os top-20 do RRF. Reordena pela relevância contextual real considerando a query completa.&lt;/p&gt;

&lt;p&gt;Por que vale o custo extra? Porque o reranker vê pergunta e candidato juntos, diferente do embedding, que codifica cada um isoladamente. A diferença entre top-5 do RRF e top-5 do reranker era perceptível nas respostas do agente em produção. Vale.&lt;/p&gt;




&lt;h2&gt;
  
  
  Os números finais
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;8.400  conversas brutas no banco
2.100  qualificadas (CSAT + status + tamanho mínimo)
3.400  qa_pairs extraídos pelo LLM
2.200  pares na base final (após confidence + dedup)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A redução de 35% via clustering &lt;strong&gt;não perdeu cobertura&lt;/strong&gt;. Perdeu redundância. Os 1.200 pares eliminados eram semanticamente cobertos pelos que ficaram.&lt;/p&gt;




&lt;h2&gt;
  
  
  Aprendizados
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Confidence threshold em 0.75 foi o ponto de equilíbrio.&lt;/strong&gt; Testamos 0.80 e eliminava conhecimento útil. Testamos 0.70 e ruído real entrava. Esse hiperparâmetro vale o tempo de calibrar, não chute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Busca híbrida + RRF superou semântica pura&lt;/strong&gt;, especialmente em termos técnicos específicos do produto. Se a sua base tem jargão, nomes próprios ou rótulos de UI, full-text não é opcional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;resolution_actor&lt;/code&gt; virou peça de roteamento, não só metadado.&lt;/strong&gt; O agente não tenta resolver o que precisa de ação de backend, encaminha direto. Estruturar o conhecimento com esse campo desde a classificação evitou implementar uma camada de orquestração depois.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embedding de &lt;code&gt;question&lt;/code&gt; + &lt;code&gt;intent&lt;/code&gt; é melhor que &lt;code&gt;question&lt;/code&gt; sozinho.&lt;/strong&gt; Sutil, mas é a diferença entre busca que funciona e busca que parece funcionar nos testes e falha em produção.&lt;/p&gt;




&lt;p&gt;Se você está construindo algo parecido e tem perguntas ou sugestões sobre alguma decisão específica, me chama e vamos conversar.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Como configurar deploy automático da sua API (GO) + Postgres com Docker via GitHub Actions e SSH</title>
      <dc:creator>Gabriel Brocco de Oliveira</dc:creator>
      <pubDate>Fri, 30 May 2025 13:24:28 +0000</pubDate>
      <link>https://dev.to/gaabrielbrocco/como-configurar-deploy-automatico-da-sua-api-go-postgres-com-docker-via-github-actions-e-ssh-5a61</link>
      <guid>https://dev.to/gaabrielbrocco/como-configurar-deploy-automatico-da-sua-api-go-postgres-com-docker-via-github-actions-e-ssh-5a61</guid>
      <description>&lt;p&gt;Nesta publicação, vou te mostrar passo a passo como estruturar seu projeto em Go, configurar o deploy automático com GitHub Actions e subir sua API — utilizando Docker. Ao final, você terá tudo rodando em um servidor remoto via SSH e realizando o deploy de forma automatizada.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Arquitetura do projeto
&lt;/h2&gt;

&lt;p&gt;O projeto utilizado como exemplo é um CRUD simples de usuários, bancos e contas bancárias. Porém estruturado utilizando o padrão de arquitetura Ports and Adapters, também conhecido como Arquitetura Hexagonal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.github/
├── workflows/
│   └── deploy.yml
cmd/
├── config/
│   └── database/
│       └── migrations/
└── main.go
internal/
├── core/
│   ├── domain/
│   │   ├── banco.go
│   │   ├── conta.go
│   │   └── usuario.go
│   ├── dto/
│   │   ├── banco.go
│   │   ├── conta.go
│   │   └── usuario.go
│   └── usecase/
│       ├── banco.go
│       ├── conta.go
│       └── usuario.go
├── infra/
│   ├── controller/
│   │   ├── banco.go
│   │   ├── conta.go
│   │   └── usuario.go
│   ├── repository/
│   │   ├── banco.go
│   │   ├── conta.go
│   │   └── usuario.go
│   └── server/
│       └── server.go
pkg/
└── di/
├── banco.go
├── conta.go
└── usuario.go
.env
.env-example
.gitignore
docker-compose.yml
Dockerfile
go.mod
go.sum
README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dockerfile:&lt;/strong&gt; este arquivo cria a imagem Docker da nossa API escrita em Go. Ele utiliza uma imagem base (Golang + Alpine), instala as dependências do projeto, compila o código-fonte em um executável e define os comandos para iniciar a aplicação automaticamente quando o contêiner for iniciado.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;./Dockerfile&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; golang:1.24.2-alpine&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; go.mod go.sum ./&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;go mod download

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app/cmd&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;go build &lt;span class="nt"&gt;-o&lt;/span&gt; main .

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["./main"]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;docker-compose.yml:&lt;/strong&gt; Define dois serviços principais que rodam em seus próprios contêineres Docker: api (nossa aplicação Go) e db (o banco de dados PostgreSQL). Ele configura a comunicação entre eles, garante que o banco de dados esteja totalmente operacional antes de iniciar a API, expõe as portas necessárias e gerencia as credenciais e dados de forma eficiente.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;./docker-compose.yml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.8'&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_HOST=db&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_PORT=5432&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_USER=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_PASSWORD=${DB_PASSWORD}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_NAME=banco&lt;/span&gt;
    &lt;span class="na"&gt;env_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.env&lt;/span&gt; 

  &lt;span class="na"&gt;db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:16-alpine&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5432:5432"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_USER=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_PASSWORD=${DB_PASSWORD}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_DB=banco&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;postgres_data:/var/lib/postgresql/data&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD-SHELL"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pg_isready&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-U&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;postgres"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;env_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.env&lt;/span&gt; 

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Configurando o GitHub Secrets
&lt;/h2&gt;

&lt;p&gt;No repositório do GitHub, clique em &lt;strong&gt;Settings&lt;/strong&gt; &amp;gt; &lt;strong&gt;Secrets and variables&lt;/strong&gt; &amp;gt; &lt;strong&gt;Actions&lt;/strong&gt; &amp;gt; &lt;strong&gt;New repository secret&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adicione uma SECRET para cada variável abaixo:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ENV&lt;/code&gt; → Variáveis do seu .env que não serão expostas&lt;br&gt;
&lt;code&gt;SSH_HOST&lt;/code&gt; → IP ou domínio do seu servidor&lt;br&gt;
&lt;code&gt;SSH_USER&lt;/code&gt; → Usuário SSH no servidor (ex: ubuntu)&lt;br&gt;
&lt;code&gt;SSH_PRIVATE_KEY&lt;/code&gt; → Conteúdo da chave privada SSH (sem senha)&lt;/p&gt;
&lt;h2&gt;
  
  
  3. Workflow GitHub Actions para deploy automático
&lt;/h2&gt;

&lt;p&gt;O workflow a seguir realiza os seguintes passos:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clona o repositório&lt;/strong&gt; com o código mais recente da branch &lt;code&gt;master&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cria o arquivo &lt;code&gt;.env&lt;/code&gt;&lt;/strong&gt; dinamicamente, utilizando as variáveis sensíveis armazenadas nos &lt;em&gt;Secrets&lt;/em&gt; do GitHub (como senhas e configurações de banco).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constrói a imagem Docker&lt;/strong&gt; da aplicação, passando as variáveis do &lt;code&gt;.env&lt;/code&gt; como argumentos de build (&lt;code&gt;--build-arg&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Salva a imagem como um arquivo &lt;code&gt;.tar&lt;/code&gt;&lt;/strong&gt;, o que permite transportá-la para o servidor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Envia a imagem Docker para o servidor remoto&lt;/strong&gt; via SCP.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acessa o servidor via SSH&lt;/strong&gt;, carrega a imagem Docker, remove o container antigo (caso exista), e &lt;strong&gt;sobe um novo container com a versão atualizada da API&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;deploy.yml:&lt;/strong&gt; Automatiza a entrega contínua da nossa API Go. Ele serve para garantir que cada nova versão da aplicação seja automaticamente compilada, empacotada em um contêiner Docker e implantada em nosso servidor, minimizando erros manuais e agilizando o ciclo de desenvolvimento.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;.github/workflows/deploy.yml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and Deploy Go API via SSH&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;master&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build-and-deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout do repositório&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Criar o arquivo .env com os valores do secret&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;echo "${{ secrets.ENV }}" &amp;gt; .env&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Construir a imagem Docker&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;export $(grep -v '^#' .env | xargs) &lt;/span&gt;

          &lt;span class="s"&gt;docker build \&lt;/span&gt;
            &lt;span class="s"&gt;--build-arg DB_PASSWORD=$DB_PASSWORD \&lt;/span&gt;
            &lt;span class="s"&gt;--build-arg DB_PORT=$DB_PORT \&lt;/span&gt;
            &lt;span class="s"&gt;--build-arg DB_SSL_MODE=$DB_SSL_MODE \&lt;/span&gt;
            &lt;span class="s"&gt;-t go-api-banco:latest .&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Salvar imagem como .tar&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker save -o go-api-banco.tar go-api-banco:latest&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Enviar imagem para o servidor via SCP&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;appleboy/scp-action@master&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SSH_HOST }}&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SSH_USER }}&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SSH_PRIVATE_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;go-api-banco.tar"&lt;/span&gt;
          &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;~/"&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Fazer deploy da imagem via SSH&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;appleboy/ssh-action@master&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SSH_HOST }}&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SSH_USER }}&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.SSH_PRIVATE_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;docker load -i ~/go-api-banco.tar&lt;/span&gt;
            &lt;span class="s"&gt;docker stop go-api-banco || true&lt;/span&gt;
            &lt;span class="s"&gt;docker rm go-api-banco || true&lt;/span&gt;
            &lt;span class="s"&gt;docker run -d --name go-api-banco -p 8080:8080 go-api-banco:latest&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Configurando o servidor remoto
&lt;/h2&gt;

&lt;p&gt;No seu servidor (exemplo Ubuntu):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instale o Docker&lt;/li&gt;
&lt;li&gt;Garanta que seu usuário SSH tenha permissão para rodar Docker (geralmente adicionado ao grupo docker)&lt;/li&gt;
&lt;li&gt;Configure sua chave pública SSH no ~/.ssh/authorized_keys do usuário&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5. Testando o Deploy
&lt;/h2&gt;

&lt;p&gt;Após fazer o push para o branch &lt;code&gt;master&lt;/code&gt;, o GitHub Actions irá iniciar o processo de deploy.&lt;/p&gt;

&lt;p&gt;Para verificar se tudo ocorreu como esperado:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Acesse o GitHub → seu repositório → &lt;strong&gt;Actions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Veja se o workflow &lt;strong&gt;“Build and Deploy Go API via SSH”&lt;/strong&gt; foi executado com sucesso&lt;/li&gt;
&lt;li&gt;Acesse seu servidor no navegador: &lt;code&gt;http://&amp;lt;IP_DO_SERVIDOR&amp;gt;:8080&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Sua API Go estará rodando 🎉&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  6. O que você ganha com essa automação?
&lt;/h2&gt;

&lt;p&gt;Ao automatizar o processo de deploy, você transforma algo que poderia levar vários minutos (ou até horas) em um fluxo confiável de poucos cliques ou um simples git push.&lt;/p&gt;

&lt;p&gt;Essa integração entre GitHub Actions, Docker e seu servidor remoto &lt;strong&gt;garante&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Segurança:&lt;/strong&gt; nenhuma senha ou dado sensível é exposto — tudo é gerenciado via GitHub Secrets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistência:&lt;/strong&gt; o mesmo ambiente é reproduzido em cada deploy, reduzindo erros por diferenças de ambiente.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Velocidade:&lt;/strong&gt; o deploy leva apenas alguns minutos e requer zero intervenção manual.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rastreabilidade:&lt;/strong&gt; cada execução fica registrada nos Actions do GitHub, permitindo rastrear mudanças e reverter se necessário.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusão
&lt;/h2&gt;

&lt;p&gt;Com isso, você tem uma pipeline de deploy automatizada, segura e baseada em tecnologias sólidas como Go, Docker, PostgreSQL e GitHub Actions.&lt;/p&gt;

&lt;p&gt;Esse processo elimina deploys manuais, acelera o desenvolvimento e garante que sua API esteja sempre atualizada no servidor.&lt;/p&gt;

</description>
      <category>go</category>
      <category>docker</category>
      <category>githubactions</category>
      <category>ssh</category>
    </item>
  </channel>
</rss>
