DEV Community: Cleber Lucas

How to Build a Collaborative Mindset with AI (and How It Helped Me Build Quita)

Cleber Lucas — Tue, 07 Jul 2026 19:29:33 +0000

While taking Google's AI Fundamentals course, I ran into a topic that sounds simple but completely changes how you work with AI day to day: collaborative mindset.

Treating AI as a development partner means handing it context, direction, and technical feedback in order to get something useful back, the same way a junior teammate would need clear guidance to deliver good work.

I understood this in practice while building Quita, my tool that helps people interpret credit reports from Brazil's Central Bank (Registrato) and automatically generate formal complaints on Consumidor.gov.br. The lesson came out of real architectural and backend decisions. I want to share what I learned, with a stronger focus on how to specify requirements better and how I used collaboration with AI as part of my own technical learning.

The problem with talking to AI the wrong way

A lot of people treat AI like an oracle: they ask little, expect a lot, and get frustrated when the answer comes back generic, incomplete, or flat-out wrong. The issue usually isn't the tool, it's how the conversation was framed.

When I started building Quita, I noticed that the vaguer my prompts were, the vaguer the answers I got back. The turning point was realizing that collaborating with AI is a two-way process: I need to bring technical clarity, and only then can the AI bring precision.

A clear picture of what the system needs to do

Before asking for a single line of code, I needed to be able to describe Quita's data flow in technical terms, not in pitch-deck terms. The system receives a payload from a Registrato query, needs to normalize that data into a consistent internal format, identify inconsistencies between values reported by different institutions, and generate a structured text that follows the schema accepted by Consumidor.gov.br.

When I describe the flow this way, with clear input, transformation, and output, the AI stops filling gaps with assumptions and starts proposing solutions within the system's real constraints. A vague specification produces generic code. A technical specification, with an explicit data flow, produces a solution aligned with the architecture that already exists.
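To make the input, transformation, and output stages concrete, here is a minimal sketch of that flow. It is not Quita's actual code (the backend is Java), and every field name (`contract_id`, `institution`, `amount`) and the rule "same contract, different amounts" are assumptions made for illustration; the real Registrato payload and the Consumidor.gov.br format are not reproduced here.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class DebtRecord:
    """Internal, normalized form of one reported debt (hypothetical fields)."""
    contract_id: str
    institution: str
    amount: Decimal


def normalize(raw_items: list[dict]) -> list[DebtRecord]:
    """Input -> internal format: trim strings, parse amounts as Decimal."""
    return [
        DebtRecord(
            contract_id=str(item["contract_id"]).strip(),
            institution=str(item["institution"]).strip().upper(),
            amount=Decimal(str(item["amount"])),
        )
        for item in raw_items
    ]


def find_inconsistencies(records: list[DebtRecord]) -> dict[str, list[DebtRecord]]:
    """Transformation: contracts for which the reported amounts differ."""
    by_contract: dict[str, list[DebtRecord]] = {}
    for record in records:
        by_contract.setdefault(record.contract_id, []).append(record)
    return {
        contract_id: group
        for contract_id, group in by_contract.items()
        if len({r.amount for r in group}) > 1
    }


def render_complaint(inconsistencies: dict[str, list[DebtRecord]]) -> str:
    """Output: one plain-text line per inconsistent contract."""
    lines = []
    for contract_id, group in sorted(inconsistencies.items()):
        details = "; ".join(f"{r.institution} reports {r.amount}" for r in group)
        lines.append(f"Contract {contract_id}: {details}")
    return "\n".join(lines)
```

Keeping the three stages as separate functions mirrors the specification: each one can be described to the AI, and reviewed, on its own.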

A defined audience is also a technical requirement

Quita is used by people with no familiarity with financial or legal jargon. That sounds like a UX detail, but in practice it becomes a technical requirement: API error messages, the text generated by AI inside the complaints, and even the structure of the JSON responses the frontend consumes all need to carry that consideration.

When I state this explicitly in prompts (for example, when asking for the structure of an endpoint that returns a credit analysis result), the suggestions for field naming, validation messages, and response format shift accordingly. A well-defined audience acts as a filter that narrows the AI's interpretation space at every layer of the system, from the database up to the interface.
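As a rough illustration of how an audience requirement reaches the response format, the sketch below maps internal error codes to jargon-free messages. The codes, the wording, and the `ApiError` shape are all hypothetical; they are not taken from Quita's API.

```python
from dataclasses import dataclass, asdict

# Hypothetical internal error codes mapped to jargon-free messages.
PLAIN_MESSAGES = {
    "PDF_UNREADABLE": "We could not read this file. Please upload the PDF exactly as you downloaded it.",
    "NO_DEBTS_FOUND": "We did not find any debts in this report.",
}
FALLBACK_MESSAGE = "Something went wrong on our side. Please try again in a few minutes."


@dataclass(frozen=True)
class ApiError:
    code: str     # stable identifier for the frontend and the logs
    message: str  # text that can be shown to the user as-is


def to_api_error(code: str) -> dict:
    """Build the JSON-ready error body, never leaking a raw technical message."""
    message = PLAIN_MESSAGES.get(code, FALLBACK_MESSAGE)
    return asdict(ApiError(code=code, message=message))
```

The design choice is that the frontend never has to translate jargon: the code stays machine-readable, and the message is already written for the person reading it.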

A well-scoped problem is what cuts hallucination the most

This is the core point I already suspected before the course confirmed it: the more scoped the problem, the less room the AI has to invent a solution.

Asking "help me improve my credit system" is too vague. The AI will fill the gaps with assumptions, and assumptions produce hallucination. Asking "given the JSON returned by the Registrato query, with the fields already defined in my DTO, the system needs to identify inconsistencies between values reported by different institutions and generate a complaint text following the structure accepted by Consumidor.gov.br" is a problem with clear edges: known input, explicit comparison rule, defined output format. The AI has no reason to invent anything, because the response space is already shaped by the specification.
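One way to enforce that discipline is to refuse to build a prompt until input, rule, and output format are all stated. The sketch below is only an illustration of that idea; `TaskSpec` and `build_prompt` are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    input_description: str  # what the model receives
    rule: str               # the explicit comparison or decision rule
    output_format: str      # the shape the answer must take
    audience: str = ""      # optional: who will read the result


def build_prompt(spec: TaskSpec) -> str:
    """Assemble a scoped prompt, refusing specifications with missing edges."""
    required = ("input_description", "rule", "output_format")
    missing = [name for name in required if not getattr(spec, name).strip()]
    if missing:
        raise ValueError(f"Underspecified task, missing: {', '.join(missing)}")
    sections = [
        ("Input", spec.input_description),
        ("Rule", spec.rule),
        ("Output format", spec.output_format),
    ]
    if spec.audience.strip():
        sections.append(("Audience", spec.audience))
    return "\n".join(f"{title}: {text.strip()}" for title, text in sections)
```

A request like "help me improve my credit system" cannot even be expressed in this structure, which is the point.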

Knowing the basics of your stack isn't optional

This is something I only learned through practice: collaborating well with AI requires that I understand, at least at a basic level, every piece of what I'm building. Without that, there's no way to judge whether the answer I got is correct, or to ask for something more specific in the next round.

Quita's architecture today looks like this: a Java 21 backend with Spring Boot and Spring Security for JWT-based authentication, PostgreSQL with Flyway managing migrations, all running in production on Railway. The frontend is Next.js with TypeScript and Tailwind, deployed on Vercel, consuming the API through authenticated calls. The AI layer (Gemini and OpenAI) sits as an internal backend service, receiving already-normalized credit report data and returning the structured complaint text.
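The AI layer with two providers can be pictured as an ordered list of generators tried in turn. This is a sketch under assumptions: the provider functions are plain callables standing in for real SDK calls, which are deliberately not shown, and the ordering simply reflects the idea of a primary provider with a fallback.

```python
from typing import Callable, Sequence

# A generator takes normalized report data and returns complaint text.
Generator = Callable[[dict], str]


def generate_with_fallback(
    report: dict, providers: Sequence[tuple[str, Generator]]
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, text) from the first that works."""
    errors = []
    for name, generate in providers:
        try:
            text = generate(report)
        except Exception as exc:  # any provider failure moves on to the next one
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty response")
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the text makes it possible to log which model produced each complaint, which helps when reviewing output later.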

Understanding this architecture, even at the level of "what talks to what," is what gave me the autonomy to catch the AI's mistakes instead of accepting them. A few concrete examples:

- When configuring CORS between the frontend on Vercel and the backend on Railway, I got a suggestion that allowed origin * alongside credentials: true, a combination the browser itself rejects. I only caught it because I already knew how CORS handles that combination.
- In a suggested Flyway migration, the AI proposed altering a column without accounting for data already present in the production table, which would have broken the deploy. Understanding how Flyway versions and applies migrations is what made me review it before running it.
- In a Gemini API integration, I got a code example using a parameter from an outdated SDK version. Without knowing the basics of how that call should be structured, I would have copied and pasted it without noticing it wouldn't even compile.
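The CORS case is easy to turn into an automated check. The sketch below is framework-agnostic and is not Spring's configuration API; `check_cors_config` and the example origin are hypothetical. It encodes the one rule from the first example: a wildcard origin cannot be combined with credentials.

```python
def check_cors_config(allowed_origins: list[str], allow_credentials: bool) -> list[str]:
    """Return a list of problems found in a CORS configuration (empty if none)."""
    problems = []
    if not allowed_origins:
        problems.append("no allowed origins configured")
    if allow_credentials and "*" in allowed_origins:
        # Browsers refuse credentialed responses whose allowed origin is the wildcard.
        problems.append(
            "wildcard origin '*' cannot be combined with credentials; "
            "list explicit origins instead"
        )
    return problems


# Example with a placeholder frontend origin (not Quita's real URL).
suggested_by_ai = check_cors_config(["*"], allow_credentials=True)
reviewed = check_cors_config(["https://frontend.example"], allow_credentials=True)
```

Here `suggested_by_ai` contains one problem and `reviewed` is empty, which is the same judgment I had to make by hand.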

None of these cases required deep expertise. They required just enough to be suspicious, test, and ask again. That "just enough" is exactly what a collaborative mindset assumes: you don't need to know everything, but you need to know enough to not accept an answer just because it sounds coherent.

This also changes the quality of the question I ask. Understanding the basics of authentication leads me to ask "how do I validate this JWT in the Next.js middleware without breaking server-side rendering," instead of simply "how do I build login." A more specific question produces a more specific answer, and a more specific answer leaves less room for hallucination.

Business context and external rules also guard against hallucination

It's not just code. Quita operates under rules that come from the outside: the Bacen data format, the structural requirements of Consumidor.gov.br. When I bring those rules explicitly into the conversation with the AI, not just the technical requirement, the chance of getting a solution that's technically correct but useless in practice drops significantly.

Using AI collaboration to learn while building

This was the most valuable takeaway from the whole process: a collaborative mindset can be used deliberately to learn, not just to ship faster.

There's a real technical difference between asking "implement JWT token validation" and asking "explain how JWT validation works in this middleware, then implement it." The first approach hands me a ready-to-paste solution. The second hands me the reasoning behind it, which lets me maintain, debug, and extend that code on my own afterward, without going back to the AI for every adjustment.
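For reference, this is roughly what the "explain first" answer boils down to for an HS256-signed token: split the three parts, recompute the signature, compare it in constant time, then check expiry. The sketch uses only the Python standard library so the steps stay visible; it is a teaching aid under those assumptions, not Quita's middleware, and production code should rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # JWT strips base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Create a token: base64url(header).base64url(payload).base64url(signature)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def validate_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature matches and the token has not expired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" not in claims or current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Once the steps are spelled out like this, a question such as "where does expiry get checked?" has an obvious place to look.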

That's how I picked up things while building Quita that I ended up applying directly to the architecture:

- Why Flyway versions migrations sequentially and treats them as immutable, and why altering a migration already applied in production is considered bad practice, which changed how I plan schema changes.
- Why a short-lived JWT access token is paired with a longer-lived refresh token, and how that balances security with user experience without forcing constant logins.
- Why browsers enforce the same-origin policy and exactly when CORS kicks in, which finally resolved the recurring errors in the Vercel-to-Railway integration.
- How to structure prompts for the Gemini API to get consistent JSON output, which required understanding a bit about how the model handles structured formatting versus free text.
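The Flyway point is easier to remember with a small model of it. Flyway records a checksum for each applied migration and complains during validation if the file later changes. The sketch below imitates that idea with a plain CRC32 over the file text; it is a simplification for illustration and not Flyway's actual algorithm or code, and the version names are hypothetical.

```python
import zlib


def checksum(sql: str) -> int:
    """Simplified stand-in for a migration checksum."""
    return zlib.crc32(sql.encode("utf-8"))


def changed_after_apply(applied: dict[str, int], files: dict[str, str]) -> list[str]:
    """Versions whose file content no longer matches the checksum recorded at apply time."""
    return sorted(
        version
        for version, recorded in applied.items()
        if version in files and checksum(files[version]) != recorded
    )
```

Editing an applied migration shows up as a mismatch, while adding a new version does not. That is why a schema change belongs in a new migration file rather than in an edit to an old one.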

Each of these started as a practical Quita problem and turned into knowledge I now carry into the next technical decision, even outside the project. It creates a loop: the more I understand, the better I frame the next question; the better the question, the more precise the answer; and the more precise the answer, the more I learn from it.

The risk in this process is learning shallowly, accepting code without understanding why it works. The way I guard against that is simple: before accepting any solution beyond the trivial, I ask the AI to explain the "why" behind the chosen approach, not just the "how" to implement it. If the explanation doesn't make sense to me, that's a sign I need to understand the concept better before putting that code into production.

Iteration is part of the process, not a sign of failure

Quita's design system, the "Da névoa à rota" (From the fog to the route) concept, built around glassmorphism and forest green as the primary color, didn't come out finished on the first try. It went through several rounds of adjustment with AI, each one refining the visual identity a little further. Understanding that the first answer is just a draft, not the final delivery, took the pressure off "getting it right on the first try" and actually improved the end result.

You're still the one in control

Collaborating doesn't mean delegating blindly. Every code suggestion, every generated text, every design decision the AI proposed for Quita went through my own review before reaching production. AI speeds up the process, but the responsibility for the final result still belongs to the developer.

Collaborative mindset as a working method

In the end, what I learned building Quita is that collaborating with AI looks a lot like guiding a talented junior teammate who doesn't know your context: you need to bring a clear specification, a defined audience, a scoped problem, real architecture, and at least a baseline of technical knowledge to recognize when something's off. In return, the payoff isn't just delivery speed, it's technical learning accumulated with every round of conversation.

Quita is in production today because each of these pieces worked together: clear technical specification, defined audience, scoped problem, explicit architecture, human review, and continuous learning. It is not in production because I asked the AI to "build a credit analysis system" and got a finished solution back.

How to Have a Collaborative Mindset with AI (and How It Helped Me Build Quita)

Cleber Lucas — Tue, 07 Jul 2026 19:28:13 +0000

During Google's AI Fundamentals course, I reached a topic that seems simple but completely changes the way you work with artificial intelligence day to day: the collaborative mindset.

Treating AI as a development partner means giving it context, direction, and technical feedback to get something useful back, the way a junior teammate would need clear guidance to deliver good work.

It was while building Quita, my tool that helps people interpret credit reports from Brazil's Central Bank (Registrato) and generate complaints automatically on Consumidor.gov.br, that I understood this in practice, through real technical decisions about architecture, backend, and AI integration. I want to share what I learned, with more focus on how to write better specifications and on how I used collaboration with AI as part of my own technical learning.

The problem with talking to AI badly

Many people treat AI like an oracle: they ask little, expect a lot, and get frustrated when the answer comes back generic, incomplete, or simply wrong. The mistake usually lies not in the tool but in how the conversation was conducted.

When I started developing Quita, I noticed that the more loose questions I threw at the AI, the more loose answers I got back. The turning point was understanding that collaborating with AI is a two-way process: I need to bring technical clarity, and only then can the AI bring precision.

A clear view of what the system needs to do

Before asking for any line of code, I needed to be able to describe Quita's data flow in technical terms, not in pitch terms. The system receives a payload from a Registrato query, normalizes that data into a consistent internal format, identifies inconsistencies between the values reported by different institutions, and generates a structured text that follows the schema accepted by Consumidor.gov.br.

When I describe the flow like this, with well-defined input, transformation, and output, the AI stops filling gaps with assumptions and starts proposing solutions within the real limits of the system. A vague specification yields generic code. A technical specification, with an explicit data flow, yields a solution aligned with the architecture that already exists.

A defined audience is also a technical requirement

Quita is used by people who are not familiar with financial or legal jargon. That looks like a UX detail, but in practice it becomes a technical requirement: the API's error messages, the texts the AI generates in the complaints, and even the structure of the JSON responses the frontend consumes need to carry that care.

When I tell the AI this explicitly in prompts (for example, when asking for the structure of an endpoint that returns the credit analysis result), the suggestions for field names, validation messages, and response format change. A well-defined audience works as a filter that reduces the AI's room for interpretation at every layer of the system, from the database to the interface.

A well-delimited problem is what reduces hallucination the most

This is the central point, one I already sensed even before the course confirmed it: the more delimited the problem, the less room the AI has to invent a solution.

Asking it to "help improve my credit system" is too vague. The AI will fill the gaps with assumptions, and assumptions generate hallucination. Asking "starting from the JSON returned by the Registrato query, with the fields I have already defined in my DTO, the system needs to identify inconsistencies between values reported by different institutions and generate a complaint text following the structure accepted by Consumidor.gov.br" is a problem with clear edges: known input, explicit comparison rule, output in a defined format. The AI has no reason to invent anything, because the response space is already drawn by the specification.

Knowing the basics of the tech stack isn't optional

This is a lesson that only came with practice: collaborating well with AI requires that I also understand, even if only at a basic level, every piece of what I'm building. Without that, I can't evaluate whether the answer I received is right, or ask for something more specific in the next round.

Quita's architecture today is this: a backend in Java 21 with Spring Boot and Spring Security for JWT authentication, a PostgreSQL database with Flyway controlling the migrations, all running in production on Railway. The frontend is Next.js with TypeScript and Tailwind, published on Vercel, consuming the API through authenticated calls. The AI layer (Gemini and OpenAI) comes in as an internal backend service, receiving the already-normalized credit report data and returning the structured complaint text.

Understanding this architecture, even at the level of "what talks to what," was what gave me the autonomy to identify the AI's mistakes instead of accepting them. Some concrete examples:

- When configuring CORS between the frontend on Vercel and the backend on Railway, I received configuration suggestions that allowed origin * together with credentials: true, a combination the browser itself rejects. I only noticed the problem because I already knew how CORS treats that combination.
- In a suggested Flyway migration, the AI proposed altering a column without considering data already in the production table, which would have broken the deploy. Understanding how Flyway versions and applies migrations was what made me review it before running it.
- In an integration with the Gemini API, I received a code example using a parameter from an outdated SDK. Without knowing the basics of how the call should be structured, I would have copied and pasted it without noticing that it would not compile.

None of these cases required deep expertise. They required enough to be suspicious, test, and ask again. That "enough" is what the collaborative mindset presupposes: you don't need to know everything, but you need to know enough not to accept any answer just because it seems coherent.

This also changes the quality of the question I ask. Understanding the basics of authentication leads me to ask "how do I validate this JWT token in the Next.js middleware without breaking server-side rendering," instead of simply "how do I do login." A more specific question yields a more specific answer, and a more specific answer has less room for hallucination.

Business context and external rules also shield against hallucination

It's not just code. Quita operates on rules that come from outside: the Bacen data format and the structural requirements of Consumidor.gov.br. When I bring those rules explicitly into the conversation with the AI, and not just the technical requirement, the chance of receiving a solution that is technically correct but useless in practice drops considerably.

Collaborating with AI as a way to learn while developing

This was the most valuable lesson of the whole process: the collaborative mindset can be used deliberately to learn, not just to produce faster.

There is a relevant technical difference between asking "implement JWT token validation" and asking "explain how JWT validation works in this middleware, and only then implement it." The first form delivers a solution ready to paste. The second delivers the reasoning behind it, which lets me maintain, debug, and extend that code on my own afterward, without going back to the AI for every adjustment.

That is how, while building Quita, I learned things that I applied directly to the architecture:

- Why Flyway versions migrations sequentially and immutably, and why altering a migration already applied in production is considered bad practice, which changed how I plan schema changes.
- Why a short-lived JWT access token is combined with a longer-lived refresh token, and how that balances security and user experience without requiring constant logins.
- Why the browser applies the same-origin policy and exactly when CORS comes into play, which settled for good the recurring errors in the integration between Vercel and Railway.
- How to structure prompts for the Gemini API to obtain consistent JSON output, which required understanding a little of how the model handles structured formatting versus free text.

Each of these points started as a practical Quita problem and became knowledge I carry into the next technical decision, even outside the project. This creates a cycle: the more I understand, the better I formulate the next question; the better the question, the more precise the answer; and the more precise the answer, the more I learn from it.

The risk in this process is learning shallowly, accepting the code without understanding why it works. The way I protect myself is simple: before accepting any more complex solution, I ask the AI to explain the "why" of the chosen approach, not just the "how" of implementing it. If the explanation doesn't make sense to me, it's a sign that I need to understand the concept better before putting that code into production.

Iteration is part of the process, not a sign of error

Quita's design system, the "Da névoa à rota" (From the fog to the route) concept, with glassmorphism and forest green as the primary color, wasn't born finished. It went through several rounds of adjustment with the AI, each one refining the visual identity a little more. Understanding that the first answer is just a draft, not the final delivery, took away the pressure of "getting it right the first time" and brought more quality to the result.

You remain in control

Collaborating is not delegating blindly. Every code suggestion, every generated text, every design decision the AI proposed for Quita went through my review before going to production. AI speeds up the process, but responsibility for the final result remains with the developer.

Collaborative mindset as a working method

In the end, what I learned building Quita is that collaborating with AI resembles guiding a talented junior colleague who doesn't know your context: you need to bring a clear specification, an audience, a delimited problem, real architecture, and a minimum of technical knowledge to recognize when something goes off track. In return, the gain is not only delivery speed but technical learning accumulated with every round of conversation.

Quita is in production today because each of these pieces worked together: clear technical specification, defined audience, delimited problem, explicit architecture, human review, and continuous learning. It is not in production because I asked the AI to "create a credit analysis system" and received a finished solution back.

Quita: How a Personal Problem Became a Real Product — and What I Learned Along the Way

Cleber Lucas — Mon, 22 Jun 2026 15:58:42 +0000

There's a piece of advice you hear constantly in the developer world: "build projects to learn." The problem is that most portfolio projects are born without a real pain behind them. And without real pain, there's no motivation to go deep, to solve the hard problem, to push through when things break.

Quita was born differently. It was born from a genuine need.

The Problem Nobody Solves Simply

I spent a few years carrying debt accumulated during a rougher phase of life. When I finally had the financial stability to deal with it, I went looking for answers — what exactly was outstanding, with whom, and what the law guaranteed me as a consumer.

What I found was a maze.

The information existed — in the Central Bank of Brazil, on the Consumidor.gov.br platform, in consumer protection legislation. But the path to accessing and using it was confusing enough to make anyone give up. Almost everything I found through searches pointed to law firms or paid consultancies. Services that charge to do something that, in theory, any citizen can do on their own.

I started thinking: if I — someone with access to information and some familiarity with technology — had difficulty navigating this, what happens to those who don't?

Brazil has decades of consumer debt history. Millions of people who can't renegotiate their debts not just because of lack of money, but because of lack of clear guidance on what to do, where to start, and what rights they have.

That's when Quita started making sense.

What Quita Is

Quita is a digital assistant built for the indebted citizen.

It guides users from obtaining their financial reports from the Central Bank of Brazil — the Registrato, a document that consolidates all debts registered in the financial system — all the way to generating structured complaints for Consumidor.gov.br, the platform where financial institutions are legally required to respond.

The core flow is:

- The user uploads their Registrato PDF
- The system automatically extracts the debts listed in the document
- Insights are generated about the debt situation: amounts, institutions, current status
- Based on this data, Quita uses AI to produce a well-founded regulatory complaint, ready to be submitted to the responsible institution

The goal isn't to solve the debt for the user. It's to give them clear information and a concrete instrument to exercise their rights on their own terms.

The Tech Stack

The project was built with a modern, deliberately lean stack focused on productivity and reliability.

Backend

- Java 21
- Spring Boot 4
- Spring Security with JWT authentication
- PostgreSQL
- Flyway (database migrations)
- Maven

Frontend

- Next.js
- React
- TypeScript
- Tailwind CSS
- Framer Motion

Artificial Intelligence

- Google Gemini (regulatory complaint generation)
- OpenAI (fallback layer and experimentation)

Infrastructure

- Railway (backend and database hosting)
- Vercel (frontend hosting, in deployment)

Payments

- Mercado Pago

Other

- PDF extraction and processing
- LGPD compliance: the PDF is processed in memory and deleted after extraction
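The in-memory handling mentioned for LGPD compliance can be sketched as follows. This is an assumption-laden illustration, not Quita's Java implementation: `extract_in_memory` is a hypothetical helper, and the actual PDF parser is passed in as a callable because no specific parsing library is implied here.

```python
import io
from typing import Callable


def extract_in_memory(
    pdf_bytes: bytes, extract: Callable[[io.BytesIO], list[dict]]
) -> list[dict]:
    """Run an extractor over an uploaded PDF without ever writing it to disk."""
    buffer = io.BytesIO(pdf_bytes)
    try:
        return extract(buffer)
    finally:
        # Release the in-memory copy as soon as extraction ends, even on failure.
        buffer.close()
```

Only the extracted records leave the function. The caller is still responsible for dropping its own reference to the uploaded bytes once the call returns.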

The Approach That Changed Everything: Specification-Driven Development

If there's one technical lesson I want to highlight from this journey, it's not about any particular technology.

It's about process.

Most of Quita was built through SDDs — Software Design Documents. Before writing any line of code, I wrote the intent. I defined what the system should do, why, what the constraints were, the flows, the expected behaviors at the edges.

This habit transformed the quality of what I built.

When you specify before you implement, the questions that surface are different. You start asking about the user, about risks, about what happens when something goes wrong. You find ambiguities before they become bugs.

AI entered this process not as a code generator, but as an analysis partner. I would present a specification and question it together: is this clear? Is there a case I haven't covered? Does this decision make sense given this context?

In many moments, the work was more about thinking than programming.

What's Working in Production Today

The backend is live on Railway. All core features have been validated end-to-end:

- User registration and JWT authentication
- Registrato PDF upload and processing
- Automatic debt extraction
- Insight generation
- AI-powered complaint generation via Gemini
- Complaint export as PDF

The frontend is in its final integration phase, with deployment planned on Vercel.

What's Coming Next

The next delivery is the first functional public version — with a complete web interface, full integration with the production backend, and the complete flow accessible to real users.

After that, the roadmap includes:

- Guided onboarding for new users
- Expanded support for additional document types
- Refinement of the complaint generation model
- Exploring monetization viability

What This Project Taught Me

Fictional projects teach syntax. Real projects teach you to think like an engineer.

The difference is in the pressure a real problem creates. When you know there's someone on the other side who can be helped, every decision carries weight. You don't skip steps. You don't accept a solution that only works on the happy path.

And perhaps the most honest lesson from this journey is that AI doesn't replace software engineering. It amplifies it. When you know what you want to build, when you have clarity about the problem, AI accelerates. When you don't, it just generates confusion faster.

Quita isn't finished yet. But it's closer than ever.

And it was built to solve a real problem, with real technology, for real people.

Quita: How a Personal Problem Became a Real Product, and What I Learned Along the Way

Cleber Lucas — Mon, 22 Jun 2026 15:49:55 +0000

There's a phrase I hear a lot in the development world: "build projects to learn." The problem is that most portfolio projects are born without a real pain behind them. And without real pain, the motivation is missing to go deep, to solve the hard problem, to not give up when you get stuck.

Quita was born differently. It was born from a true need.

The problem nobody solves simply

I spent a few years with debts accumulated during a more turbulent phase of life. When I finally had the financial means to resolve them, I set out to understand what was outstanding, with whom, and what the law guaranteed me as a consumer.

What I found was a maze.

The information existed: at the Central Bank, on Consumidor.gov.br, in the legislation. But the path to accessing and using it was confusing enough to make anyone give up. Almost everything I found in searches led to law firms or paid consultancies, services that charge to do something that, in theory, citizens can do on their own.

I started thinking: if I, with access to information and some familiarity with technology, had difficulty with this, what happens to those who don't?

Brazil has decades of history with over-indebtedness. Millions of people can't renegotiate their debts not only for lack of money, but for lack of clear guidance on what to do, where to start, and what rights they have.

That's when Quita started to make sense.

What Quita is

Quita is a digital assistant aimed at the indebted citizen.

It guides the user from obtaining their financial reports from the Central Bank (the Registrato, a document that consolidates all debts registered in the financial system) through to generating structured complaints for Consumidor.gov.br, the platform where financial institutions are legally required to respond.

The main flow is:

- The user uploads the Registrato PDF
- The system automatically extracts the debts present in the document
- Insights are generated about the indebtedness: amounts, institutions, status
- Based on this data, Quita produces, via AI, a well-founded regulatory complaint, ready to be sent to the responsible institution

The goal is not to resolve the debt for the user. It is to give them clear information and a concrete instrument to exercise their rights on their own.

The tech stack

The project was built with a modern and deliberately lean stack, focused on productivity and reliability.

Backend

- Java 21
- Spring Boot 4
- Spring Security with JWT authentication
- PostgreSQL
- Flyway (migrations)
- Maven

Frontend

- Next.js
- TypeScript

Artificial Intelligence

- Google Gemini (regulatory complaint generation)
- OpenAI (fallback layer and experimentation)

Infrastructure

- Railway (backend and database hosting)
- Vercel (frontend hosting, in deployment)

Other

- PDF extraction and processing
- LGPD compliance: the PDF is processed in memory and removed after extraction

The approach that changed everything: specification-driven development

If there is one technical lesson I want to highlight from this journey, it is not about any particular technology.

It is about process.

A large part of Quita was built through SDDs, Software Design Documents. Before writing any line of code, I wrote the intent. I defined what the system should do, why, what the constraints were, the flows, and the expected behaviors at the edges.

This habit transformed the quality of what I built.

When you specify before implementing, the questions that come up are different. You ask yourself about the user, about the risks, about what happens when something goes wrong. You discover ambiguities before they become bugs.

AI entered this process not as a code generator but as an analysis partner. I would present a specification and question it together with the AI: is this clear? Is there a case I haven't covered? Does this decision make sense given this context?

At many moments, the work was more thinking than programming.

What is working today

The backend is in production on Railway. All core features have been validated:

- Registration and JWT authentication
- Registrato upload and processing
- Automatic debt extraction
- Insight generation
- Complaint generation via Gemini
- Complaint export as PDF

The frontend is in its final integration phase, with deployment planned on Vercel.

What lies ahead

The next delivery is the first functional public version, with a complete web interface, integration with the production backend, and the complete flow accessible to real users.

After that, the plan includes:

- Guided onboarding for new users
- Expansion of the supported document types
- Refinement of the complaint generation model
- Studying the viability of monetization

What this project taught me

Toy projects teach syntax. Real projects teach you to think like an engineer.

The difference lies in the pressure a real problem creates. When you know there is someone on the other side who can be helped, decisions carry weight. You don't skip steps. You don't accept a solution that only works on the happy path.

And perhaps the most honest lesson of this journey is that AI does not replace software engineering. It amplifies it. When you know what you want to build and are clear about the problem, AI speeds you up. When you don't, it just produces confusion faster.

Quita is not finished yet. But it is closer than ever.

And it was built to solve a real problem, with real technology, for real people.

En:Building a RAG Agent for SOPs

Cleber Lucas — Tue, 09 Jun 2026 16:53:55 +0000

How I built a RAG agent to eliminate operational interruptions at work

Open source project using Python, LangChain, ChromaDB, FastAPI and Discord — from a real problem to production deployment.

Every company has a silent cycle that drains time without anyone noticing.

An employee has a question about a procedure. They can't find the answer in the documentation. They interrupt a more experienced colleague. That person stops what they're doing, answers, and goes back to work — focus already broken. Multiply that by 10, 20, 50 times a week.

Watching that pattern is what led me to build POPS AI: a RAG (Retrieval-Augmented Generation) agent capable of answering questions about a company's Standard Operating Procedures, directly through Discord or via a REST API.

The problem that motivated the project

The company had dozens of SOPs documented in PDF format. The problem wasn't a lack of documentation — it was the friction in accessing it. Nobody opens a network folder, hunts for the right file, and reads 15 pages just to answer a quick question.

The question I asked myself was simple: what if the documentation could answer questions on its own?

The architecture in three stages

The system works in three distinct phases, each with a clear responsibility.

1. Extraction

The extrair_texto.py script reads PDFs from the pops_originais/ folder, extracts the full text using PyMuPDF, and saves it as .txt. Page images are also extracted for future use.

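import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF as a single string."""
    # The context manager closes the document even if extraction fails.
    with fitz.open(pdf_path) as doc:
        return "".join(page.get_text() for page in doc)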

Simple, but important: extraction quality determines response quality. Scanned PDFs without OCR are enemy number one here.
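
Since a scanned PDF without a text layer yields little or no text, one way to catch it early is to check how much text each page produced. This is a minimal sketch, assuming the page texts come from page.get_text() as in the function above; the names find_empty_pages and needs_ocr and the thresholds are hypothetical, not part of the project.

```python
def find_empty_pages(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is suspiciously short.

    page_texts: one string per page, e.g. [page.get_text() for page in doc].
    min_chars: hypothetical threshold; tune it for your documents.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


def needs_ocr(page_texts, min_chars=20, max_empty_ratio=0.5):
    """Flag a document as likely scanned when most pages have no usable text."""
    if not page_texts:
        return True
    empty = find_empty_pages(page_texts, min_chars)
    return len(empty) / len(page_texts) > max_empty_ratio
```

Documents flagged this way could be routed to an OCR step or reported to whoever maintains the SOPs, instead of being indexed as empty text.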

2. Embedding generation

With the extracted texts, gerar_embeddings.py splits the content into chunks using LangChain's RecursiveCharacterTextSplitter, generates the vectors, and persists them in ChromaDB.

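from langchain.text_splitter import RecursiveCharacterTextSplitter

# Sizes are measured in characters (the splitter's default length function is len).
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum characters per chunk
    chunk_overlap=200   # characters shared between consecutive chunks
)
chunks = splitter.split_text(text)  # text: the string extracted from one SOP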

The chunk_overlap=200 was a deliberate choice: it ensures context isn't cut off abruptly between chunks, which visibly improved response coherence.
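
To make the effect of the overlap concrete, here is a simplified sliding-window splitter in plain Python. It is only an illustration of what chunk_size and chunk_overlap mean, not LangChain's algorithm: RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences. The function name split_with_overlap is hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: consecutive windows start
    chunk_size - chunk_overlap characters apart.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# A 2500-character sample produces three windows.
sample = "".join(str(i % 10) for i in range(2500))
sample_chunks = split_with_overlap(sample)
```

With these settings, the last 200 characters of each chunk are repeated at the start of the next one, so a step that straddles a boundary is less likely to lose its surrounding context.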

The project supports two embedding models via config.py:

Gemini models/embedding-001 — high quality, requires API key, cost scales with volume
Local SBERT (paraphrase-multilingual-mpnet-base-v2) — runs offline, great for avoiding costs or rate limits

This flexibility was one of the design decisions that added the most value, especially for anyone who wants to experiment with the project at zero cost.
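
A configuration switch like this could be sketched as a small registry of embedding functions. Everything below is an assumption about structure, not the project's actual config.py: the names EMBEDDING_BACKENDS and get_embedder are hypothetical, the libraries are imported lazily so only the selected backend needs to be installed, and the Gemini call assumes the google-generativeai package with an API key already configured.

```python
def embed_with_sbert(texts):
    """Embed texts locally with sentence-transformers (no API cost).

    A real implementation would load the model once and reuse it.
    """
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")
    return model.encode(texts).tolist()


def embed_with_gemini(texts):
    """Embed texts through the Gemini API (assumes the API key is configured)."""
    import google.generativeai as genai  # imported lazily

    return [
        genai.embed_content(model="models/embedding-001", content=text)["embedding"]
        for text in texts
    ]


# Hypothetical registry; the project's config.py may be organized differently.
EMBEDDING_BACKENDS = {"sbert": embed_with_sbert, "gemini": embed_with_gemini}


def get_embedder(backend_name):
    """Return the embedding function selected by a config value."""
    try:
        return EMBEDDING_BACKENDS[backend_name]
    except KeyError:
        raise ValueError(f"Unknown embedding backend: {backend_name!r}") from None
```

Whichever backend is chosen, the same one has to be used for indexing and for queries, because vectors produced by different models are not comparable.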

3. Query (RAG)

When a user asks a question, the system:

Converts the question into a vector using the same embedding model
Searches for the most semantically similar chunks in ChromaDB
Builds a prompt with the retrieved excerpts as context
Sends it to Gemini 2.0 Flash to generate the final answer

results = collection.query(
    query_embeddings=[question_embedding],
    n_results=5
)

context = "\n\n".join(results['documents'][0])

prompt = f"""You are an assistant specialized in the company's SOPs.
Use only the information below to answer.

Context:
{context}

Question: {question}
"""

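The prompt assembly above can be extended so each excerpt carries its source, which is what allows an answer to cite the originating file. This is a minimal sketch, assuming the result has the shape returned by ChromaDB's collection.query and that each chunk was indexed with a "source" metadata key; that key, the build_prompt name, and the extra "say you don't know" instruction are assumptions, not the project's code.

```python
def build_prompt(results, question):
    """Build a grounded prompt from a ChromaDB-style query result.

    results: dict with "documents" and "metadatas", each a list holding one
    list per query (the shape returned by collection.query).
    The "source" metadata key is an assumption about how chunks were indexed.
    """
    documents = results["documents"][0]
    metadatas = results.get("metadatas") or [[{}] * len(documents)]
    excerpts = []
    for text, metadata in zip(documents, metadatas[0]):
        source = (metadata or {}).get("source", "unknown")
        excerpts.append(f"[Source: {source}]\n{text}")
    context = "\n\n".join(excerpts)
    return (
        "You are an assistant specialized in the company's SOPs.\n"
        "Use only the information below to answer. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )
```

Labeling each excerpt keeps the model's citations tied to text it was actually given, rather than to file names it might guess.
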
The interfaces: Discord and API

The project exposes the knowledge base in two ways.

Discord bot with slash commands:

/pop <question> — queries the vector database and returns the answer
/addpop <file.txt> — lets admins add new SOPs in real time, without reprocessing the entire base
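
One way to add a single SOP without reprocessing the base is to upsert only that file's chunks under IDs derived from the file name. This is a minimal sketch, assuming a ChromaDB collection and an embed function like the one used at indexing time; add_sop, the ID scheme, and the "source" metadata key are hypothetical and may differ from what the bot actually does.

```python
def add_sop(collection, source_name, chunks, embed):
    """Index one new SOP without touching the documents already stored.

    collection: a ChromaDB collection (anything exposing upsert works).
    embed: function mapping a list of texts to a list of vectors.
    Returns the IDs that were written.
    """
    if not chunks:
        return []
    # Deterministic IDs: re-adding the same file overwrites its chunks.
    ids = [f"{source_name}:{i}" for i in range(len(chunks))]
    collection.upsert(
        ids=ids,
        documents=chunks,
        embeddings=embed(chunks),
        metadatas=[{"source": source_name} for _ in chunks],
    )
    return ids
```

Because the IDs are deterministic, uploading the same file twice does not duplicate its chunks. If a new version of a file has fewer chunks than the old one, the leftover chunks would still need to be deleted separately.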

FastAPI REST API with a POST /ask endpoint, designed for integration with other internal systems:

// Request
{ "question": "How do I configure the scanner on the Samsung printer?" }

// Response
{
  "answer": "To configure the scanner, follow these steps:\n1. Turn on the printer...\n[Source: SOP-ScannerSetup.txt]"
}
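
From another internal system, calling this endpoint could look like the sketch below, which uses only the standard library. The URL is an assumption (a local host, with port 8000 taken from the uvicorn command in the deployment section), and the helper names build_ask_request and ask are hypothetical.

```python
import json
import urllib.request

# Hypothetical address: the host is assumed, the port matches the uvicorn command.
API_URL = "http://localhost:8000/ask"


def build_ask_request(question, url=API_URL):
    """Build the POST /ask request with a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, url=API_URL, timeout=30):
    """Send the question and return the "answer" field of the response."""
    request = build_ask_request(question, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["answer"]
```

Keeping the request construction separate from the network call makes the payload easy to test without a running server.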

The challenge nobody talks about: token costs

Building the RAG was the fun part. The real challenge came after: how do you control costs in production?

A few decisions that made a real difference:

Using SBERT for embeddings instead of the Gemini API brings the API cost of indexing down to zero, because the model runs locally. Cost is incurred only at response generation, which is where the actual value is.

Limiting the vector search to n_results=5 avoids passing unnecessary context to the model. More context means more tokens and more cost, without necessarily improving the answer.

Gemini 2.0 Flash was chosen intentionally over Pro: for objective questions about procedures, the quality difference is minimal while the cost difference is significant.
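
The relationship between n_results, chunk size, and cost can be put into rough numbers. This is a back-of-the-envelope sketch, not a measurement: the 4 characters per token figure is a common rule of thumb rather than a Gemini tokenizer value, the function names are hypothetical, and the price is a parameter because it has to come from the provider's current pricing.

```python
def estimate_context_tokens(n_results, chunk_size, chars_per_token=4):
    """Rough upper bound on context tokens sent per question.

    chars_per_token=4 is a rule of thumb; real counts depend on the
    model's tokenizer and on the language of the text.
    """
    return (n_results * chunk_size) // chars_per_token


def estimate_monthly_input_cost(questions_per_month, tokens_per_question,
                                price_per_million_tokens):
    """Input-side cost only; output tokens are billed separately."""
    total_tokens = questions_per_month * tokens_per_question
    return total_tokens * price_per_million_tokens / 1_000_000


# 5 chunks of up to 1000 characters is roughly 1250 context tokens per question.
tokens_at_5 = estimate_context_tokens(n_results=5, chunk_size=1000)
# Doubling n_results doubles the context sent on every question.
tokens_at_10 = estimate_context_tokens(n_results=10, chunk_size=1000)
```

Under these assumptions, raising n_results from 5 to 10 doubles the context tokens on every single question, which is why the retrieval limit is one of the first settings worth tuning.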

Deployment: one container, two processes

One decision that cost me a few hours was running the Discord bot and the FastAPI server in the same Docker container. The solution was Supervisor, which manages both processes in a lightweight, self-recovering way.

# supervisord.conf
[program:api]
command=uvicorn api_bot:app --host 0.0.0.0 --port 8000
autostart=true
autorestart=true

[program:discord]
command=python bot_discord.py
autostart=true
autorestart=true

The result is a single, lightweight container that starts both services in parallel and automatically restarts either one if it fails. On an entry-level VPS, this matters a lot.

What I learned that wasn't in the plan

Chunking is an art. In this project, chunk size and overlap affected response quality more than the choice of model did. I spent more time tuning them than anything else.

Security from day one. The .gitignore had to be configured before the first public commit to ensure no confidential company PDFs ended up in the repository. A mistake here is hard to undo.
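
A .gitignore only helps if it is right, so a small check over the tracked files can serve as a second line of defense. This is a minimal sketch under assumptions: the patterns are hypothetical guesses based on the folder and file names mentioned in this post, and find_blocked_files is not part of the project.

```python
from fnmatch import fnmatch

# Hypothetical patterns; adjust them to match the repository layout.
BLOCKED_PATTERNS = ["*.pdf", "pops_originais/*", ".env"]


def find_blocked_files(tracked_paths, patterns=BLOCKED_PATTERNS):
    """Return tracked paths that match a pattern that should never be committed."""
    return [
        path
        for path in tracked_paths
        if any(fnmatch(path, pattern) for pattern in patterns)
    ]
```

The list of tracked paths could come from the output of git ls-files, with the check run before the first public push or in CI so that a non-empty result fails the build.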

The real problem wasn't technical. The most complex part was understanding what kind of questions users would actually ask and how to structure the SOPs so the model could retrieve the right information. Garbage in, garbage out applies twice as hard in RAG.

The project is open source

POPS AI is available on GitHub with a full README, .env.example, configured Docker Compose, and step-by-step setup instructions for both local and container-based deployment.

You can clone it, adapt it to your own knowledge base, and use it with your own documents — whether for SOPs, internal wikis, product manuals, or any PDF-based documentation.

🔗 github.com/obelucca/POPS_AI

Stack

Python 3.10, LangChain, ChromaDB, FastAPI, Discord.py, Google Gemini 2.0 Flash, SBERT, Docker, Supervisor, PyMuPDF

If you made it this far and are curious about any architectural decision, token cost management in production, or how to adapt this to a different use case — drop a comment. Happy to discuss.

From PDF to Discord with RAG: How I built a RAG agent to eliminate operational interruptions at work

Cleber Lucas — Tue, 09 Jun 2026 13:51:49 +0000

How I built a RAG agent to eliminate operational interruptions at work

An open source project with Python, LangChain, ChromaDB, FastAPI, and Discord, from a real problem to production deployment.

Every company has that silent cycle that drains time without anyone noticing.

An employee has a question about a procedure. They can't find the answer in the documents. They interrupt someone more experienced. That person stops what they were doing, answers, and goes back to work with their train of thought already broken. Multiply that by 10, 20, 50 times a week.

Watching that pattern is what made me decide to build POPS AI: a RAG (Retrieval-Augmented Generation) agent that answers questions about a company's Standard Operating Procedures (SOPs, or POPs in Portuguese), directly through Discord or via a REST API.

The problem that motivated the project

The company had dozens of SOPs documented as PDFs. The problem was not a lack of documentation but the friction of getting to it. Nobody opens a network folder, looks for the right file, and reads 15 pages to answer a one-off question.

The question I asked myself was simple: what if the documentation could answer on its own?

The architecture in three stages

The system works in three distinct phases, each with a clear responsibility.

1. Extraction

The extrair_texto.py script reads the PDFs from the pops_originais/ folder, extracts the full text with PyMuPDF, and saves it as .txt. Page images are also extracted for future use.

import fitz  # PyMuPDF

def extrair_texto_pdf(caminho_pdf):
    """Return the text of every page in the PDF as a single string."""
    # The context manager closes the document even if extraction fails.
    with fitz.open(caminho_pdf) as doc:
        return "".join(pagina.get_text() for pagina in doc)

Simple, but important: the quality of the extraction determines the quality of the answers. Scanned PDFs without OCR are enemy number one here.

2. Embedding generation

With the texts extracted, gerar_embeddings.py splits the content into chunks using LangChain's RecursiveCharacterTextSplitter, generates the vectors, and persists them in ChromaDB.

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = splitter.split_text(texto)

The chunk_overlap=200 was a deliberate decision: it keeps context from being cut off abruptly between one chunk and the next, which visibly improved the coherence of the answers.

The project supports two embedding models via config.py:

Gemini models/embedding-001: high quality, requires an API key, and incurs cost that scales with volume
Local SBERT (paraphrase-multilingual-mpnet-base-v2): runs offline, great for avoiding costs or rate limits

This flexibility was one of the design decisions that added the most value, especially for anyone who wants to try the project without spending anything.

3. Query (RAG)

When the user asks a question, the system:

Converts the question into a vector using the same embedding model
Searches ChromaDB for the most semantically similar chunks
Builds a prompt with the retrieved excerpts as context
Sends it to Gemini 2.0 Flash to generate the final answer

resultados = collection.query(
    query_embeddings=[embedding_pergunta],
    n_results=5
)

contexto = "\n\n".join(resultados['documents'][0])

prompt = f"""Você é um assistente especializado nos POPs da empresa.
Use apenas as informações abaixo para responder.

Contexto:
{contexto}

Pergunta: {pergunta}
"""

The interfaces: Discord and API

The project exposes the knowledge base in two ways.

Discord bot with slash commands:

/pop <question>: queries the vector database and returns the answer
/addpop <file.txt>: lets administrators add new SOPs in real time, without reprocessing the entire base

FastAPI API with a POST /ask endpoint, designed for integration with other internal systems:

// Request
{ "question": "Como configurar o scanner da impressora Samsung?" }

// Response
{
  "answer": "Para configurar o scanner, siga os passos:\n1. Ligue a impressora...\n[Fonte: POP-ConfiguraçãoScanner.txt]"
}

The challenge nobody mentions: token cost

Building the RAG was the fun part. The real challenge came afterward: how do you control cost in production?

A few decisions that made a difference:

Using SBERT for embeddings instead of the Gemini API brings the API cost of indexing down to zero, because the model runs locally. Cost is incurred only at response generation, which is where the real value is.

Limiting the vector search to n_results=5 avoids passing unnecessary context to the model. More context means more tokens and more cost, without necessarily improving the answer.

Gemini 2.0 Flash was chosen deliberately over Pro: for objective questions about procedures, the quality difference is minimal and the cost difference is significant.

Deployment: one container, two processes

One decision that cost me a few hours was running the Discord bot and the FastAPI API in the same Docker container. The solution was Supervisor, which manages both processes in a lightweight, self-recovering way.

# supervisord.conf
[program:api]
command=uvicorn api_bot:app --host 0.0.0.0 --port 8000
autostart=true
autorestart=true

[program:discord]
command=python bot_discord.py
autostart=true
autorestart=true

The result is a single, lightweight container that starts both services in parallel and automatically restarts whichever one fails. On an entry-level VPS, that makes all the difference.

What I learned that wasn't in the plan

Chunking is an art. Chunk size and overlap affect the quality of the answers more than the model itself. I spent more time tuning this than anything else.

Security from the start. The .gitignore had to be configured before the first public commit to make sure no PDF with confidential company data ended up in the repository. A mistake here is hard to undo.

The real problem wasn't technical. The most complex part was understanding what kind of questions users would ask and how to structure the SOPs so the model could retrieve the right information. Garbage in, garbage out counts double in RAG.

The project is open source

POPS AI is available on GitHub with a complete README, .env.example, a configured Docker Compose file, and step-by-step installation instructions for both local and container setups.

You can clone it, adapt it to your own knowledge base, and use it with your own documents, whether SOPs, internal wikis, product manuals, or any PDF documentation.

🔗 github.com/obelucca/POPS_AI

Stack used

Python 3.10, LangChain, ChromaDB, FastAPI, Discord.py, Google Gemini 2.0 Flash, SBERT, Docker, Supervisor, PyMuPDF

If you made it this far and are curious about an architectural decision, token cost in production, or how to adapt this to a different use case, leave a comment. Let's talk.