<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Cleber Lucas</title>
    <description>The latest articles on DEV Community by Cleber Lucas (@obelucca__).</description>
    <link>https://dev.to/obelucca__</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3394098%2F80b0c71f-8c00-476f-a823-7da5129adefe.jpg</url>
      <title>DEV Community: Cleber Lucas</title>
      <link>https://dev.to/obelucca__</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/obelucca__"/>
    <language>en</language>
    <item>
      <title>Quita: How a Personal Problem Became a Real Product — and What I Learned Along the Way</title>
      <dc:creator>Cleber Lucas</dc:creator>
      <pubDate>Mon, 22 Jun 2026 15:58:42 +0000</pubDate>
      <link>https://dev.to/obelucca__/quita-how-a-personal-problem-became-a-real-product-and-what-i-learned-along-the-way-5a3a</link>
      <guid>https://dev.to/obelucca__/quita-how-a-personal-problem-became-a-real-product-and-what-i-learned-along-the-way-5a3a</guid>
      <description>&lt;p&gt;There's a piece of advice you hear constantly in the developer world: "build projects to learn." The problem is that most portfolio projects are born without a real pain behind them. And without real pain, there's no motivation to go deep, to solve the hard problem, to push through when things break.&lt;/p&gt;

&lt;p&gt;Quita was born differently. It was born from a genuine need.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Solves Simply
&lt;/h2&gt;

&lt;p&gt;I spent a few years carrying debt accumulated during a rougher phase of life. When I finally had the financial stability to deal with it, I went looking for answers — what exactly was outstanding, with whom, and what the law guaranteed me as a consumer.&lt;/p&gt;

&lt;p&gt;What I found was a maze.&lt;/p&gt;

&lt;p&gt;The information existed — in the Central Bank of Brazil, on the Consumidor.gov.br platform, in consumer protection legislation. But the path to accessing and using it was confusing enough to make anyone give up. Almost everything I found through searches pointed to law firms or paid consultancies. Services that charge to do something that, in theory, any citizen can do on their own.&lt;/p&gt;

&lt;p&gt;I started thinking: if I — someone with access to information and some familiarity with technology — had difficulty navigating this, what happens to those who don't?&lt;/p&gt;

&lt;p&gt;Brazil has decades of consumer debt history. Millions of people can't renegotiate their debts, not just for lack of money but for lack of clear guidance on what to do, where to start, and what rights they have.&lt;/p&gt;

&lt;p&gt;That's when Quita started making sense.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Quita Is
&lt;/h2&gt;

&lt;p&gt;Quita is a digital assistant built for the indebted citizen.&lt;/p&gt;

&lt;p&gt;It guides users from obtaining their financial reports from the Central Bank of Brazil — the Registrato, a document that consolidates all debts registered in the financial system — all the way to generating structured complaints for Consumidor.gov.br, the platform where financial institutions are legally required to respond.&lt;/p&gt;

&lt;p&gt;The core flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user uploads their Registrato PDF&lt;/li&gt;
&lt;li&gt;The system automatically extracts the debts listed in the document&lt;/li&gt;
&lt;li&gt;Insights are generated about the debt situation — amounts, institutions, current status&lt;/li&gt;
&lt;li&gt;Based on this data, Quita uses AI to produce a well-founded regulatory complaint, ready to be submitted to the responsible institution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal isn't to solve the debt for the user. It's to give them clear information and a concrete instrument to exercise their rights on their own terms.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Tech Stack
&lt;/h2&gt;

&lt;p&gt;The project was built with a modern, deliberately lean stack focused on productivity and reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Java 21&lt;/li&gt;
&lt;li&gt;Spring Boot 4&lt;/li&gt;
&lt;li&gt;Spring Security with JWT authentication&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;Flyway (database migrations)&lt;/li&gt;
&lt;li&gt;Maven&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next.js&lt;/li&gt;
&lt;li&gt;React&lt;/li&gt;
&lt;li&gt;TypeScript&lt;/li&gt;
&lt;li&gt;Tailwind CSS&lt;/li&gt;
&lt;li&gt;Framer Motion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Artificial Intelligence&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google Gemini (regulatory complaint generation)&lt;/li&gt;
&lt;li&gt;OpenAI (fallback layer and experimentation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Railway (backend and database hosting)&lt;/li&gt;
&lt;li&gt;Vercel (frontend hosting, in deployment)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Payments&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mercado Pago&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Other&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF extraction and processing&lt;/li&gt;
&lt;li&gt;LGPD compliance: the PDF is processed in memory and deleted after extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Approach That Changed Everything: Specification-Driven Development
&lt;/h2&gt;

&lt;p&gt;If there's one technical lesson I want to highlight from this journey, it's not about any particular technology.&lt;/p&gt;

&lt;p&gt;It's about process.&lt;/p&gt;

&lt;p&gt;Most of Quita was built through SDDs — Software Design Documents. Before writing any line of code, I wrote the intent. I defined what the system should do, why, what the constraints were, the flows, the expected behaviors at the edges.&lt;/p&gt;

&lt;p&gt;This habit transformed the quality of what I built.&lt;/p&gt;

&lt;p&gt;When you specify before you implement, the questions that surface are different. You start asking about the user, about risks, about what happens when something goes wrong. You find ambiguities before they become bugs.&lt;/p&gt;

&lt;p&gt;AI entered this process not as a code generator, but as an analysis partner. I would present a specification and question it together: is this clear? Is there a case I haven't covered? Does this decision make sense given this context?&lt;/p&gt;

&lt;p&gt;In many moments, the work was more about thinking than programming.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Working in Production Today
&lt;/h2&gt;

&lt;p&gt;The backend is live on Railway. All core features have been validated end-to-end:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User registration and JWT authentication&lt;/li&gt;
&lt;li&gt;Registrato PDF upload and processing&lt;/li&gt;
&lt;li&gt;Automatic debt extraction&lt;/li&gt;
&lt;li&gt;Insight generation&lt;/li&gt;
&lt;li&gt;AI-powered complaint generation via Gemini&lt;/li&gt;
&lt;li&gt;Complaint export as PDF&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The frontend is in its final integration phase, with deployment planned on Vercel.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Coming Next
&lt;/h2&gt;

&lt;p&gt;The next delivery is the first functional public version — with a complete web interface, full integration with the production backend, and the complete flow accessible to real users.&lt;/p&gt;

&lt;p&gt;After that, the roadmap includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guided onboarding for new users&lt;/li&gt;
&lt;li&gt;Expanded support for additional document types&lt;/li&gt;
&lt;li&gt;Refinement of the complaint generation model&lt;/li&gt;
&lt;li&gt;Exploring monetization viability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What This Project Taught Me
&lt;/h2&gt;

&lt;p&gt;Fictional projects teach syntax. Real projects teach you to think like an engineer.&lt;/p&gt;

&lt;p&gt;The difference is in the pressure a real problem creates. When you know there's someone on the other side who can be helped, every decision carries weight. You don't skip steps. You don't accept a solution that only works on the happy path.&lt;/p&gt;

&lt;p&gt;And perhaps the most honest lesson from this journey is that AI doesn't replace software engineering. It amplifies it. When you know what you want to build, when you have clarity about the problem, AI accelerates. When you don't, it just generates confusion faster.&lt;/p&gt;

&lt;p&gt;Quita isn't finished yet. But it's closer than ever.&lt;/p&gt;

&lt;p&gt;And it was built to solve a real problem, with real technology, for real people.&lt;/p&gt;

</description>
      <category>buildinpublic</category>
      <category>saas</category>
      <category>java</category>
      <category>ai</category>
    </item>
    <item>
      <title>Quita: How a Personal Problem Became a Real Product, and What I Learned Along the Way</title>
      <dc:creator>Cleber Lucas</dc:creator>
      <pubDate>Mon, 22 Jun 2026 15:49:55 +0000</pubDate>
      <link>https://dev.to/obelucca__/quita-como-um-problema-pessoal-virou-um-produto-real-e-o-que-aprendi-no-caminho-2lm6</link>
      <guid>https://dev.to/obelucca__/quita-como-um-problema-pessoal-virou-um-produto-real-e-o-que-aprendi-no-caminho-2lm6</guid>
      <description>&lt;p&gt;There's a phrase I hear a lot in the development world: "build projects to learn." The problem is that most portfolio projects are born without a real pain behind them. And without real pain, there's no motivation to go deep, to solve the hard problem, to keep going when you get stuck.&lt;/p&gt;

&lt;p&gt;Quita was born differently. It was born from a genuine need.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem nobody solves simply
&lt;/h2&gt;

&lt;p&gt;I spent a few years with debt accumulated during a more turbulent phase of life. When I finally had the financial means to resolve it, I set out to understand what was outstanding, with whom, and what the law guaranteed me as a consumer.&lt;/p&gt;

&lt;p&gt;What I found was a maze.&lt;/p&gt;

&lt;p&gt;The information existed: at the Central Bank, on Consumidor.gov.br, in the legislation. But the path to accessing and using it was confusing enough to make anyone give up. Almost everything I found in searches led to law firms or paid consultancies, services that charge to do something that, in theory, citizens can do on their own.&lt;/p&gt;

&lt;p&gt;I started thinking: if I, with access to information and some familiarity with technology, had trouble with this, what happens to those who don't?&lt;/p&gt;

&lt;p&gt;Brazil has a decades-long history of over-indebtedness. Millions of people can't renegotiate their debts, not only for lack of money but for lack of clear guidance on what to do, where to start, and what rights they have.&lt;/p&gt;

&lt;p&gt;That's when Quita started making sense.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Quita is
&lt;/h2&gt;

&lt;p&gt;Quita is a digital assistant for the indebted citizen.&lt;/p&gt;

&lt;p&gt;It guides the user from obtaining financial reports from the Central Bank (the Registrato, a document that consolidates all debts registered in the financial system) to generating structured complaints for Consumidor.gov.br, the platform where financial institutions are legally required to respond.&lt;/p&gt;

&lt;p&gt;The main flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user uploads the Registrato PDF&lt;/li&gt;
&lt;li&gt;The system automatically extracts the debts listed in the document&lt;/li&gt;
&lt;li&gt;Insights are generated about the debt situation: amounts, institutions, status&lt;/li&gt;
&lt;li&gt;Based on this data, Quita uses AI to produce a well-founded regulatory complaint, ready to be sent to the responsible institution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal isn't to solve the debt for the user. It's to give them clear information and a concrete instrument to exercise their rights on their own.&lt;/p&gt;




&lt;h2&gt;
  
  
  The tech stack
&lt;/h2&gt;

&lt;p&gt;The project was built with a modern, deliberately lean stack, focused on productivity and reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Java 21&lt;/li&gt;
&lt;li&gt;Spring Boot 4&lt;/li&gt;
&lt;li&gt;Spring Security with JWT authentication&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;Flyway (migrations)&lt;/li&gt;
&lt;li&gt;Maven&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next.js&lt;/li&gt;
&lt;li&gt;TypeScript&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Artificial Intelligence&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google Gemini (regulatory complaint generation)&lt;/li&gt;
&lt;li&gt;OpenAI (fallback layer and experimentation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Railway (backend and database hosting)&lt;/li&gt;
&lt;li&gt;Vercel (frontend hosting, in deployment)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Other&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF extraction and processing&lt;/li&gt;
&lt;li&gt;LGPD compliance: the PDF is processed in memory and removed after extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The approach that changed everything: specification-driven development
&lt;/h2&gt;

&lt;p&gt;If there's one technical lesson I want to highlight from this journey, it's not about any particular technology.&lt;/p&gt;

&lt;p&gt;It's about process.&lt;/p&gt;

&lt;p&gt;A large part of Quita was built through SDDs (Software Design Documents). Before writing any line of code, I wrote down the intent. I defined what the system should do, why, what the constraints were, the flows, and the expected behavior at the edges.&lt;/p&gt;

&lt;p&gt;This habit transformed the quality of what I built.&lt;/p&gt;

&lt;p&gt;When you specify before you implement, the questions that come up are different. You ask about the user, about the risks, about what happens when something goes wrong. You discover ambiguities before they turn into bugs.&lt;/p&gt;

&lt;p&gt;AI entered this process not as a code generator, but as an analysis partner. I would present a specification and question it together with the model: is this clear? Is there a case I haven't covered? Does this decision make sense given this context?&lt;/p&gt;

&lt;p&gt;In many moments, the work was more thinking than programming.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's working today
&lt;/h2&gt;

&lt;p&gt;The backend is in production on Railway. All core features have been validated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Registration and JWT authentication&lt;/li&gt;
&lt;li&gt;Registrato upload and processing&lt;/li&gt;
&lt;li&gt;Automatic debt extraction&lt;/li&gt;
&lt;li&gt;Insight generation&lt;/li&gt;
&lt;li&gt;Complaint generation via Gemini&lt;/li&gt;
&lt;li&gt;Complaint export as PDF&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The frontend is in its final integration phase, with deployment planned on Vercel.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's coming next
&lt;/h2&gt;

&lt;p&gt;The next delivery is the first functional public version, with a complete web interface, integration with the production backend, and the complete flow accessible to real users.&lt;/p&gt;

&lt;p&gt;After that, the plan includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guided onboarding for new users&lt;/li&gt;
&lt;li&gt;Expanding the supported document types&lt;/li&gt;
&lt;li&gt;Refining the complaint generation model&lt;/li&gt;
&lt;li&gt;Studying the viability of monetization&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What this project taught me
&lt;/h2&gt;

&lt;p&gt;Fictional projects teach syntax. Real projects teach you to think like an engineer.&lt;/p&gt;

&lt;p&gt;The difference is in the pressure a real problem creates. When you know there's someone on the other side who can be helped, decisions carry weight. You don't skip steps. You don't accept a solution that only works on the happy path.&lt;/p&gt;

&lt;p&gt;And perhaps the most honest lesson from this journey is that AI doesn't replace software engineering. It amplifies it. When you know what you want to build, when you have clarity about the problem, AI accelerates. When you don't, it just generates confusion faster.&lt;/p&gt;

&lt;p&gt;Quita isn't finished yet. But it's closer than ever.&lt;/p&gt;

&lt;p&gt;And it was built to solve a real problem, with real technology, for real people.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>braziliandevs</category>
      <category>java</category>
      <category>specdriven</category>
    </item>
    <item>
      <title>En:Building a RAG Agent for SOPs</title>
      <dc:creator>Cleber Lucas</dc:creator>
      <pubDate>Tue, 09 Jun 2026 16:53:55 +0000</pubDate>
      <link>https://dev.to/obelucca__/enbuilding-a-rag-agent-for-sops-5hj1</link>
      <guid>https://dev.to/obelucca__/enbuilding-a-rag-agent-for-sops-5hj1</guid>
      <description>&lt;h1&gt;
  
  
  How I built a RAG agent to eliminate operational interruptions at work
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Open source project using Python, LangChain, ChromaDB, FastAPI and Discord — from a real problem to production deployment.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Every company has a silent cycle that drains time without anyone noticing.&lt;/p&gt;

&lt;p&gt;An employee has a question about a procedure. They can't find the answer in the documentation. They interrupt a more experienced colleague. That person stops what they're doing, answers, and goes back to work — focus already broken. Multiply that by 10, 20, 50 times a week.&lt;/p&gt;

&lt;p&gt;Watching that pattern is what led me to build &lt;strong&gt;POPS AI&lt;/strong&gt;: a RAG &lt;em&gt;(Retrieval-Augmented Generation)&lt;/em&gt; agent capable of answering questions about a company's Standard Operating Procedures, directly through Discord or via a REST API.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem that motivated the project
&lt;/h2&gt;

&lt;p&gt;The company had dozens of SOPs documented in PDF format. The problem wasn't a lack of documentation — it was the friction in accessing it. Nobody opens a network folder, hunts for the right file, and reads 15 pages just to answer a quick question.&lt;/p&gt;

&lt;p&gt;The question I asked myself was simple: &lt;strong&gt;what if the documentation could answer questions on its own?&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The architecture in three stages
&lt;/h2&gt;

&lt;p&gt;The system works in three distinct phases, each with a clear responsibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Extraction
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;extrair_texto.py&lt;/code&gt; script reads PDFs from the &lt;code&gt;pops_originais/&lt;/code&gt; folder, extracts the full text using PyMuPDF, and saves it as &lt;code&gt;.txt&lt;/code&gt;. Page images are also extracted for future use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fitz&lt;/span&gt;  &lt;span class="c1"&gt;# PyMuPDF
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fitz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;full_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;full_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;full_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple, but important: extraction quality determines response quality. Scanned PDFs without OCR are enemy number one here.&lt;/p&gt;
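
&lt;p&gt;A minimal sketch of how the batch step around that function might look, assuming the extractor is passed in as a callable. The &lt;code&gt;extract_folder&lt;/code&gt; name, the output folder, and the rule of skipping empty results are illustrative assumptions, not the actual contents of &lt;code&gt;extrair_texto.py&lt;/code&gt;.&lt;/p&gt;

```python
from pathlib import Path


def extract_folder(source_dir, output_dir, extract_text):
    """Write one .txt file per PDF found in source_dir.

    extract_text is any callable that takes a PDF path and returns a string,
    for example the extract_text_from_pdf function shown above.
    """
    source = Path(source_dir)
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)

    written = []
    for pdf_path in sorted(source.glob("*.pdf")):
        text = extract_text(pdf_path)
        if not text.strip():
            # Probably a scanned PDF with no text layer (assumption):
            # skip it rather than index an empty file.
            continue
        txt_path = target / (pdf_path.stem + ".txt")
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    return written
```

&lt;p&gt;Calling &lt;code&gt;extract_folder("pops_originais", "textos", extract_text_from_pdf)&lt;/code&gt; would wire it to the PyMuPDF function above; &lt;code&gt;textos&lt;/code&gt; is a hypothetical output folder name. Skipping empty results is one simple way to surface scanned PDFs that need OCR before they reach the index.&lt;/p&gt;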

&lt;h3&gt;
  
  
  2. Embedding generation
&lt;/h3&gt;

&lt;p&gt;With the extracted texts, &lt;code&gt;gerar_embeddings.py&lt;/code&gt; splits the content into chunks using LangChain's &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;, generates the vectors, and persists them in ChromaDB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;

&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;chunk_overlap=200&lt;/code&gt; was a deliberate choice: it ensures context isn't cut off abruptly between chunks, which visibly improved response coherence.&lt;/p&gt;
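
&lt;p&gt;To make the effect of the overlap concrete, here is a simplified stand-in written in plain Python. It is not how &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; works internally (that class also tries to break on separators such as paragraphs and line breaks); the &lt;code&gt;chunk_with_overlap&lt;/code&gt; function is a hypothetical helper that only shows the sliding-window idea.&lt;/p&gt;

```python
def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap not in range(chunk_size):
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if not text[start + chunk_size:]:
            # Nothing left after this window, so stop instead of
            # emitting a trailing chunk made only of overlap.
            break
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(2000))
pieces = chunk_with_overlap(sample)
# The last 200 characters of one chunk reappear at the start of the next.
print(pieces[0][-200:] == pieces[1][:200])
```

&lt;p&gt;With these defaults each window advances 800 characters, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.&lt;/p&gt;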

&lt;p&gt;The project supports two embedding models via &lt;code&gt;config.py&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini &lt;code&gt;models/embedding-001&lt;/code&gt;&lt;/strong&gt; — high quality, requires API key, cost scales with volume&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local SBERT (&lt;code&gt;paraphrase-multilingual-mpnet-base-v2&lt;/code&gt;)&lt;/strong&gt; — runs offline, great for avoiding costs or rate limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This flexibility was one of the design decisions that added the most value, especially for anyone who wants to experiment with the project at zero cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Query (RAG)
&lt;/h3&gt;

&lt;p&gt;When a user asks a question, the system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Converts the question into a vector using the same embedding model&lt;/li&gt;
&lt;li&gt;Searches for the most semantically similar chunks in ChromaDB&lt;/li&gt;
&lt;li&gt;Builds a prompt with the retrieved excerpts as context&lt;/li&gt;
&lt;li&gt;Sends it to Gemini 2.0 Flash to generate the final answer
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;question_embedding&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are an assistant specialized in the company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s SOPs.
Use only the information below to answer.

Context:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
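
&lt;p&gt;Step 2 is the part ChromaDB hides behind &lt;code&gt;collection.query&lt;/code&gt;. As a rough illustration of what "most semantically similar" means, the sketch below ranks chunks by cosine similarity with a brute-force scan. This is an assumption made for teaching purposes: ChromaDB uses its own index and a configurable distance metric, and the &lt;code&gt;top_k_chunks&lt;/code&gt; helper and the toy two-dimensional vectors are hypothetical.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question_embedding, indexed_chunks, k=5):
    """Return the k chunk texts closest to the question.

    indexed_chunks is a list of (text, embedding) pairs.
    """
    scored = [
        (cosine_similarity(question_embedding, embedding), text)
        for text, embedding in indexed_chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy data: real embeddings have hundreds of dimensions.
index = [
    ("scanner setup", [1.0, 0.0]),
    ("vacation policy", [0.0, 1.0]),
    ("printer drivers", [0.9, 0.1]),
]
print(top_k_chunks([1.0, 0.0], index, k=2))
```

&lt;p&gt;The question and the chunks must be embedded with the same model, otherwise the vectors live in different spaces and the ranking is meaningless.&lt;/p&gt;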






&lt;h2&gt;
  
  
  The interfaces: Discord and API
&lt;/h2&gt;

&lt;p&gt;The project exposes the knowledge base in two ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discord bot&lt;/strong&gt; with slash commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/pop &amp;lt;question&amp;gt;&lt;/code&gt; — queries the vector database and returns the answer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/addpop &amp;lt;file.txt&amp;gt;&lt;/code&gt; — lets admins add new SOPs in real time, without reprocessing the entire base&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;FastAPI REST API&lt;/strong&gt; with a &lt;code&gt;POST /ask&lt;/code&gt; endpoint, designed for integration with other internal systems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Request&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"How do I configure the scanner on the Samsung printer?"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Response&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"To configure the scanner, follow these steps:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;1. Turn on the printer...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;[Source: SOP-ScannerSetup.txt]"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
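
&lt;p&gt;For another internal system, calling the endpoint could look roughly like the sketch below, which uses only the Python standard library. The base URL is a placeholder and the helper names are hypothetical; the only details taken from the example above are the &lt;code&gt;POST /ask&lt;/code&gt; route and the &lt;code&gt;question&lt;/code&gt; and &lt;code&gt;answer&lt;/code&gt; fields.&lt;/p&gt;

```python
import json
import urllib.request


def build_ask_request(base_url, question):
    """Build the POST /ask request without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, question, timeout=30):
    """Send the question and return the text of the answer field."""
    request = build_ask_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["answer"]
```

&lt;p&gt;A call such as &lt;code&gt;ask("http://localhost:8000", "How do I configure the scanner on the Samsung printer?")&lt;/code&gt; assumes the API is reachable locally on port 8000; adjust the host to wherever the container runs.&lt;/p&gt;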






&lt;h2&gt;
  
  
  The challenge nobody talks about: token costs
&lt;/h2&gt;

&lt;p&gt;Building the RAG was the fun part. The real challenge came after: how do you control costs in production?&lt;/p&gt;

&lt;p&gt;A few decisions that made a real difference:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using SBERT for embeddings instead of the Gemini API&lt;/strong&gt; brings indexing cost down to zero — the model runs locally. Cost only occurs at response generation, which is where the actual value is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limiting &lt;code&gt;n_results=5&lt;/code&gt; in the vector search&lt;/strong&gt; avoids passing unnecessary context to the model. More context = more tokens = more cost, without necessarily improving the answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 2.0 Flash&lt;/strong&gt; was chosen intentionally over Pro: for objective questions about procedures, the quality difference is minimal while the cost difference is significant.&lt;/p&gt;
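
&lt;p&gt;A back-of-the-envelope sketch of why &lt;code&gt;n_results&lt;/code&gt; matters for cost. It assumes about four characters per token, which is only a rough rule of thumb and not the Gemini tokenizer, and a hypothetical 300 characters of fixed prompt text around the context. Real numbers will differ, but the linear growth is the point.&lt;/p&gt;

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb (assumption), not a real tokenizer


def estimate_prompt_tokens(n_results, chunk_size=1000, fixed_chars=300):
    """Estimate input tokens for a prompt carrying n_results chunks of context.

    fixed_chars stands for the instructions and the question (hypothetical value).
    """
    context_chars = n_results * chunk_size
    return (context_chars + fixed_chars) // CHARS_PER_TOKEN


for n in (3, 5, 10, 20):
    print(n, "chunks:", estimate_prompt_tokens(n), "estimated input tokens")
```

&lt;p&gt;Under these assumptions, doubling &lt;code&gt;n_results&lt;/code&gt; nearly doubles the input tokens of every single question, which is why the retrieval limit is worth tuning before reaching for a cheaper model.&lt;/p&gt;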




&lt;h2&gt;
  
  
  Deployment: one container, two processes
&lt;/h2&gt;

&lt;p&gt;One decision that cost me a few hours was running the Discord bot and the FastAPI server in the same Docker container. The solution was &lt;strong&gt;Supervisor&lt;/strong&gt;, which manages both processes in a lightweight, self-recovering way.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# supervisord.conf
&lt;/span&gt;&lt;span class="nn"&gt;[program:api]&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;uvicorn api_bot:app --host 0.0.0.0 --port 8000&lt;/span&gt;
&lt;span class="py"&gt;autostart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;autorestart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;

&lt;span class="nn"&gt;[program:discord]&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;python bot_discord.py&lt;/span&gt;
&lt;span class="py"&gt;autostart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;autorestart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a single, lightweight container that starts both services in parallel and automatically restarts either one if it fails. On an entry-level VPS, this matters a lot.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I learned that wasn't in the plan
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Chunking is an art.&lt;/strong&gt; Chunk size and overlap affect response quality more than the model itself. I spent more time tuning this than anything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security from day one.&lt;/strong&gt; The &lt;code&gt;.gitignore&lt;/code&gt; had to be configured before the first public commit to ensure no confidential company PDFs ended up in the repository. A mistake here is hard to undo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real problem wasn't technical.&lt;/strong&gt; The most complex part was understanding what kind of questions users would actually ask and how to structure the SOPs so the model could retrieve the right information. Garbage in, garbage out applies twice as hard in RAG.&lt;/p&gt;




&lt;h2&gt;
  
  
  The project is open source
&lt;/h2&gt;

&lt;p&gt;POPS AI is available on GitHub with a full README, &lt;code&gt;.env.example&lt;/code&gt;, configured Docker Compose, and step-by-step setup instructions for both local and container-based deployment.&lt;/p&gt;

&lt;p&gt;You can clone it, adapt it to your own knowledge base, and use it with your own documents — whether for SOPs, internal wikis, product manuals, or any PDF-based documentation.&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/obelucca/POPS_AI" rel="noopener noreferrer"&gt;github.com/obelucca/POPS_AI&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Python 3.10&lt;/code&gt; &lt;code&gt;LangChain&lt;/code&gt; &lt;code&gt;ChromaDB&lt;/code&gt; &lt;code&gt;FastAPI&lt;/code&gt; &lt;code&gt;Discord.py&lt;/code&gt; &lt;code&gt;Google Gemini 2.0 Flash&lt;/code&gt; &lt;code&gt;SBERT&lt;/code&gt; &lt;code&gt;Docker&lt;/code&gt; &lt;code&gt;Supervisor&lt;/code&gt; &lt;code&gt;PyMuPDF&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you made it this far and are curious about any architectural decision, token cost management in production, or how to adapt this to a different use case — drop a comment. Happy to discuss.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>rag</category>
      <category>langchain</category>
    </item>
    <item>
      <title>From PDF to Discord with RAG: How I Built a RAG Agent to Eliminate Operational Interruptions at the Company</title>
      <dc:creator>Cleber Lucas</dc:creator>
      <pubDate>Tue, 09 Jun 2026 13:51:49 +0000</pubDate>
      <link>https://dev.to/obelucca__/do-pdf-ao-discord-com-rag-como-construi-um-agente-rag-para-eliminar-interrupcoes-operacionais-na-48n5</link>
      <guid>https://dev.to/obelucca__/do-pdf-ao-discord-com-rag-como-construi-um-agente-rag-para-eliminar-interrupcoes-operacionais-na-48n5</guid>
      <description>&lt;h1&gt;
  
  
  How I built a RAG agent to eliminate operational interruptions at the company
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Open source project with Python, LangChain, ChromaDB, FastAPI, and Discord: from the real problem to production deployment.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Every company has that silent cycle that drains time without anyone noticing.&lt;/p&gt;

&lt;p&gt;An employee has a question about a procedure. They can't find the answer in the documents. They interrupt someone more experienced. That person stops what they were doing, answers, and goes back to work with their train of thought already broken. Multiply that by 10, 20, 50 times a week.&lt;/p&gt;

&lt;p&gt;Observing this pattern is what led me to build &lt;strong&gt;POPS AI&lt;/strong&gt;: a RAG &lt;em&gt;(Retrieval-Augmented Generation)&lt;/em&gt; agent that answers questions about a company's Standard Operating Procedures (SOPs, or POPs in Portuguese), directly in Discord or via a REST API.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem that motivated the project
&lt;/h2&gt;

&lt;p&gt;The company had dozens of SOPs documented as PDFs. The problem wasn't a lack of documentation; it was the friction of accessing it. Nobody opens a network folder, hunts for the right file, and reads 15 pages to answer a one-off question.&lt;/p&gt;

&lt;p&gt;The question I asked myself was simple: &lt;strong&gt;what if the documentation could answer for itself?&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The architecture in three stages
&lt;/h2&gt;

&lt;p&gt;The system works in three distinct phases, each with a clear responsibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Extraction
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;extrair_texto.py&lt;/code&gt; script reads the PDFs from the &lt;code&gt;pops_originais/&lt;/code&gt; folder, extracts the full text with PyMuPDF, and saves it as &lt;code&gt;.txt&lt;/code&gt;. Page images are also extracted for future use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fitz&lt;/span&gt;  &lt;span class="c1"&gt;# PyMuPDF
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extrair_texto_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caminho_pdf&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fitz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caminho_pdf&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;texto_completo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pagina&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;texto_completo&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;pagina&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;texto_completo&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple, but important: extraction quality determines answer quality. Scanned PDFs without OCR are enemy number one here.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Embedding generation
&lt;/h3&gt;

&lt;p&gt;With the text extracted, &lt;code&gt;gerar_embeddings.py&lt;/code&gt; splits the content into chunks using LangChain's &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;, generates the vectors, and persists them in ChromaDB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;

&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;chunk_overlap=200&lt;/code&gt; was a deliberate choice: consecutive chunks can share up to 200 characters, so context is not cut off abruptly between one chunk and the next, which visibly improved the coherence of the answers.&lt;/p&gt;
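
&lt;p&gt;To make the effect of the overlap concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification, not LangChain's algorithm: &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; also tries to break on separators such as paragraphs and sentences. The function name &lt;code&gt;split_with_overlap&lt;/code&gt; is hypothetical and only used for illustration.&lt;/p&gt;

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into character windows where neighbours share chunk_overlap characters.

    Hypothetical helper for illustration; not part of LangChain or POPS AI.
    """
    if chunk_overlap not in range(chunk_size):
        raise ValueError("chunk_overlap must be between 0 and chunk_size - 1")
    if not text:
        return []
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


# Tiny demo: windows of 4 characters that share 2 characters with the next one.
print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# prints ['abcd', 'cdef', 'efgh', 'ghij']
```

&lt;p&gt;Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.&lt;/p&gt;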

&lt;p&gt;The project supports two embedding models, selected via &lt;code&gt;config.py&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini &lt;code&gt;models/embedding-001&lt;/code&gt;&lt;/strong&gt;: high quality, requires an API key, and incurs cost by volume&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local SBERT (&lt;code&gt;paraphrase-multilingual-mpnet-base-v2&lt;/code&gt;)&lt;/strong&gt;: runs offline, great for avoiding costs or rate limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This flexibility was one of the design decisions that added the most value, especially for anyone who wants to try the project without spending anything.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Query (RAG)
&lt;/h3&gt;

&lt;p&gt;When a user asks a question, the system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Converts the question into a vector using the same embedding model&lt;/li&gt;
&lt;li&gt;Searches ChromaDB for the most semantically similar chunks&lt;/li&gt;
&lt;li&gt;Builds a prompt with the retrieved passages as context&lt;/li&gt;
&lt;li&gt;Sends it to Gemini 2.0 Flash to generate the final answer
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;resultados&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;embedding_pergunta&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;contexto&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resultados&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Você é um assistente especializado nos POPs da empresa.
Use apenas as informações abaixo para responder.

Contexto:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;contexto&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Pergunta: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pergunta&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
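
&lt;p&gt;For intuition about what the similarity search is doing, the sketch below ranks a few toy vectors by cosine similarity in plain Python. It is an illustration under assumptions, not how ChromaDB is implemented: the real index, and the distance metric it uses, depend on how the collection is configured. The names &lt;code&gt;cosine_similarity&lt;/code&gt; and &lt;code&gt;top_k&lt;/code&gt; and the two-dimensional vectors are hypothetical.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_chunks, k=5):
    """Return the k chunk texts whose vectors are most similar to query_vector.

    indexed_chunks is a list of (text, vector) pairs; a brute-force scan stands in
    for the vector database here.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda pair: cosine_similarity(query_vector, pair[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy index with made-up 2-dimensional "embeddings".
index = [
    ("scanner setup", [1.0, 0.0]),
    ("vacation policy", [0.0, 1.0]),
    ("printer driver", [0.9, 0.1]),
]
print(top_k([1.0, 0.0], index, k=2))
# prints ['scanner setup', 'printer driver']
```

&lt;p&gt;A vector database replaces the brute-force scan with an approximate index, but the contract is the same: a query vector goes in, the nearest chunks come out.&lt;/p&gt;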






&lt;h2&gt;
  
  
  The interfaces: Discord and API
&lt;/h2&gt;

&lt;p&gt;The project exposes the knowledge base in two ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discord bot&lt;/strong&gt; with slash commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/pop &amp;lt;pergunta&amp;gt;&lt;/code&gt;: queries the vector store and returns the answer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/addpop &amp;lt;arquivo.txt&amp;gt;&lt;/code&gt;: lets administrators add new SOPs in real time, without reprocessing the whole knowledge base&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;FastAPI API&lt;/strong&gt; with a &lt;code&gt;POST /ask&lt;/code&gt; endpoint, designed for integration with other internal systems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Request&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Como configurar o scanner da impressora Samsung?"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Response&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Para configurar o scanner, siga os passos:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;1. Ligue a impressora...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;[Fonte: POP-ConfiguraçãoScanner.txt]"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The challenge nobody mentions: token cost
&lt;/h2&gt;

&lt;p&gt;Building the RAG was the fun part. The real challenge came afterwards: how do you control cost in production?&lt;/p&gt;

&lt;p&gt;A few decisions that made a difference:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using SBERT for embeddings instead of the Gemini API&lt;/strong&gt; reduces the API cost of indexing to zero, because the model runs locally. Cost is incurred only when generating the answer, which is where the real value is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limiting the vector search to &lt;code&gt;n_results=5&lt;/code&gt;&lt;/strong&gt; avoids passing unnecessary context to the model. More context = more tokens = more cost, without necessarily improving the answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 2.0 Flash&lt;/strong&gt; was chosen intentionally over Pro: for objective questions about procedures, the quality difference is minimal and the cost difference is significant.&lt;/p&gt;
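
&lt;p&gt;A back-of-the-envelope sketch of how &lt;code&gt;n_results&lt;/code&gt; drives prompt size. It assumes every retrieved chunk is a full 1000 characters and uses a rough rule of thumb of about 4 characters per token. That ratio is an assumption, not a measured value: real counts depend on the tokenizer and the language. The helper name &lt;code&gt;estimate_context_tokens&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def estimate_context_tokens(n_results, chunk_size=1000, chars_per_token=4):
    """Rough upper bound on prompt tokens contributed by the retrieved chunks.

    chars_per_token=4 is an assumed heuristic, not a property of any specific model.
    """
    return (n_results * chunk_size) // chars_per_token


# Compare a few retrieval depths; the estimate grows linearly with n_results.
for n in (3, 5, 10):
    print(n, "chunks is roughly", estimate_context_tokens(n), "context tokens")
```

&lt;p&gt;Doubling &lt;code&gt;n_results&lt;/code&gt; roughly doubles the context sent on every question, so it is worth checking whether the extra chunks actually change the answers before raising it.&lt;/p&gt;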




&lt;h2&gt;
  
  
  Deploy: one container, two processes
&lt;/h2&gt;

&lt;p&gt;One decision that cost me a few hours was running the Discord bot and the FastAPI API in the same Docker container. The solution was &lt;strong&gt;Supervisor&lt;/strong&gt;, which manages both processes in a lightweight, self-recovering way.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# supervisord.conf
&lt;/span&gt;&lt;span class="nn"&gt;[program:api]&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;uvicorn api_bot:app --host 0.0.0.0 --port 8000&lt;/span&gt;

&lt;span class="nn"&gt;[program:discord]&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;python bot_discord.py&lt;/span&gt;

&lt;span class="py"&gt;autostart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;autorestart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a single, lightweight container that starts both services in parallel and automatically restarts either one if it fails. On an entry-level VPS, that makes all the difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I learned that wasn't in the plan
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Chunking is an art.&lt;/strong&gt; Chunk size and overlap affect response quality more than the model itself. I spent more time tuning this than anything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security from day one.&lt;/strong&gt; The &lt;code&gt;.gitignore&lt;/code&gt; had to be configured before the first public commit to ensure no PDFs containing confidential company data ended up in the repository. A mistake here is hard to undo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real problem wasn't technical.&lt;/strong&gt; The most complex part was understanding what kind of questions users would actually ask and how to structure the SOPs so the model could retrieve the right information. Garbage in, garbage out applies twice as hard in RAG.&lt;/p&gt;




&lt;h2&gt;
  
  
  The project is open source
&lt;/h2&gt;

&lt;p&gt;POPS AI is available on GitHub with a full README, &lt;code&gt;.env.example&lt;/code&gt;, configured Docker Compose, and step-by-step setup instructions for both local and container-based deployment.&lt;/p&gt;

&lt;p&gt;You can clone it, adapt it to your own knowledge base, and use it with your own documents, whether for SOPs, internal wikis, product manuals, or any PDF-based documentation.&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/obelucca/POPS_AI" rel="noopener noreferrer"&gt;github.com/obelucca/POPS_AI&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Python 3.10&lt;/code&gt; &lt;code&gt;LangChain&lt;/code&gt; &lt;code&gt;ChromaDB&lt;/code&gt; &lt;code&gt;FastAPI&lt;/code&gt; &lt;code&gt;Discord.py&lt;/code&gt; &lt;code&gt;Google Gemini 2.0 Flash&lt;/code&gt; &lt;code&gt;SBERT&lt;/code&gt; &lt;code&gt;Docker&lt;/code&gt; &lt;code&gt;Supervisor&lt;/code&gt; &lt;code&gt;PyMuPDF&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you made it this far and are curious about any architectural decision, token cost in production, or how to adapt this to a different use case, drop a comment. Happy to discuss.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>rag</category>
      <category>langchain</category>
    </item>
  </channel>
</rss>
