Quentin Merle

Posted on Jun 8 • Edited on Jun 9

Building a Local-First Autonomous Agent from Scratch (LangGraph & Ollama)

#discuss #llm #ai #python

Everyone told me AI was going to write my code for me. So I asked an AI to help me code an AI Agent. One month later, between intense coding phases and deep reflection, I had my answer — and it wasn't the one I expected.

This project was born out of a deep need: self-education. I wanted to understand how it actually works behind the scenes. So this isn't the story of how I automated my job with a script. It's the story of what happens when you decide to lift the hood on the AI hype, reject the "vibe coding" approach, and try to build a robust local AI agent from scratch.

What you're about to read is a raw and honest retrospective of a month of asymmetrical pair-programming with an AI to build another AI.

What Exactly Are We Talking About? (The Project)

To set the context, Vibrisse Agent isn't just a simple chat or another API wrapper in a terminal. It's an autonomous agent (Python / LangGraph) designed with a "local-first" hybrid architecture: it runs primarily on your machine (via Ollama or vLLM — side note: for Mac users, oMLX is fire! 🔥), but can dynamically delegate certain tasks to the Cloud (Groq, OpenRouter) depending on complexity.
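To make the "local-first, delegate when needed" idea concrete, here is a minimal routing sketch. It is not Vibrisse's actual code: the `Task` class, the `estimate_complexity` heuristic, and the threshold are all assumptions chosen for illustration, and the article does not describe the real routing rules.

```python
from dataclasses import dataclass

# Hypothetical backend labels, used only for this illustration.
LOCAL = "local"   # e.g. a model served by Ollama or vLLM
CLOUD = "cloud"   # e.g. a hosted model behind Groq or OpenRouter


@dataclass
class Task:
    prompt: str
    needs_tools: bool = False


def estimate_complexity(task: Task) -> int:
    """Crude, illustrative score: long prompts and tool use count as harder."""
    score = len(task.prompt.split()) // 50
    if task.needs_tools:
        score += 2
    return score


def choose_backend(task: Task, threshold: int = 3, cloud_enabled: bool = True) -> str:
    """Stay local by default; delegate only when the score reaches the threshold."""
    if cloud_enabled and estimate_complexity(task) >= threshold:
        return CLOUD
    return LOCAL
```

The important design choice is the default: the function returns the local backend unless delegation is both allowed and justified, so turning the Cloud off never breaks the agent.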

The specifications were ambitious:

MCP (Model Context Protocol) integration to connect it to real tools from the open-source ecosystem — GitHub to navigate repositories and PRs, SQLite to query local databases, Context7 to access up-to-date documentation, and Fetch to interact with the web.
Multimodal vision (with Gemma 4) to analyze the UI live.
An onboarding Wizard coupled with a dynamic prompting system.

And above all, Ghost Mode: the ability to drive the agent in the background directly from source code comments (// @vibrisse: refactor this loop), so you never have to switch windows again.
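As a rough illustration of the comment-driven idea, the sketch below scans a source file for `@vibrisse:` directives. The regular expression, the `Directive` type, and the `find_directives` helper are assumptions made for this example; the article does not show how Vibrisse actually detects these comments.

```python
import re
from typing import NamedTuple

# Matches "// @vibrisse: ..." or "# @vibrisse: ..." anywhere on a line.
# The accepted comment prefixes are an assumption for this sketch.
DIRECTIVE = re.compile(r"(?://|#)\s*@vibrisse:\s*(?P<instruction>.+?)\s*$")


class Directive(NamedTuple):
    line_number: int
    instruction: str


def find_directives(source: str) -> list[Directive]:
    """Return every directive found in the source, with its 1-based line number."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = DIRECTIVE.search(line)
        if match:
            found.append(Directive(number, match.group("instruction")))
    return found


if __name__ == "__main__":
    sample = "for (let i = 0; i < n; i++) {\n  // @vibrisse: refactor this loop\n}\n"
    for directive in find_directives(sample):
        print(directive.line_number, directive.instruction)
```

A background watcher could run something like this whenever a file is saved and hand each instruction, along with the surrounding code, to the agent.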

It's precisely this level of requirement — wanting to build a real "product" and not just a demo — that shattered my initial assumptions.

The Myth of "Vibe Coding"

There's this persistent idea right now that all it takes is prompting to get a complex application. This is what we call "vibe coding." You write a prompt, the AI spits out code, you click "run", and boom — you have a SaaS.

The truth? That's totally true for a simple CRUD application. But as soon as you start building a system that requires strict context management, deterministic tool execution, and state persistence... the vibe dies very quickly.

The main problem I faced was context management (the famous "Lost in the Middle" effect). It's very easy to get carried away and fire off every question that pops into your head. It's natural and exhilarating, but it creates a huge amount of "noise" in the conversation. Without guardrails, you end up with massive context loss: the model forgets what was decided two hours earlier, the session drifts, and the code breaks.

The solution wasn't a magical new model; it was a huge amount of discipline and pure software engineering: strict session files (ROADMAP.md), constant notes, and explicit architectural tracking.

Why Build Rather Than Use?

You might be wondering: Cursor, Copilot, and now Claude Code exist. Why reinvent the wheel?

The honest answer: to stop being blind to the underlying mechanics. The real benefit of building it yourself is that when something breaks (and it breaks often), you know exactly why and how to fix it.

There is one strict condition: you have to understand every line of generated code, its patterns, and its logic. Without this perspective to challenge the AI's proposals, you quickly fall into what I call "hell loops": the AI goes in circles trying to fix its own context errors, and the human eventually stops understanding what's going on.

The admission no one makes:
Without AI, this project wouldn't exist in this form. I had neither the time nor the deep foundations in Python to go this fast. Collaborating with an AI (Gemini, in my case) allowed me to focus entirely on vision and architecture rather than the technical friction of learning a new language from scratch.

But here's the trap: an LLM is excellent at writing isolated functions, but it's catastrophic at designing and maintaining a global architecture. Without my 15 years of web development experience, the project would have ended up as a 3000-line spaghetti main.py file, completely unmaintainable.

Between assisted development phases, I had to impose drastic cleanup and refactoring phases (separation of concerns, SOLID principles) to keep the project state of the art and readable for a human. I often had to get my hands dirty to rewrite what the AI had hastily "patched".

Knowing when to challenge an answer, when to sense that a direction is fundamentally wrong, and when to reject a solution that "works" but will break in three days — that doesn't come from a prompt. That comes from experience.

Today, a vast majority of developers use or plan to use AI tools (around 76% according to Stack Overflow's 2024 Developer Survey). Yet there are two lies still circulating:

"AI does everything, you don't need to know anything."
"Real developers don't need AI."

The reality is that experience made the collaboration productive, and the collaboration made the experience applicable to a new domain. It's not magic, it's smart engineering.

Asymmetrical Pair-Programming: What They Don't Tell You

When you pair-program with an AI, the dynamic is profoundly asymmetrical.

The AI brings brute force: it can read files instantly, generate boilerplate in seconds, and dig through documentation without ever getting tired.
You, the developer, bring the architectural veto and the business vision.

One essential thing to understand: Cloud AI is accommodating by nature. It's often overly enthusiastic about whatever you propose. Sometimes, when I was heading straight for a technical wall, I had to step out of my pure developer role and talk the problem through with it. I had to give it a strict role ("You are a seasoned AI Engineer...") and challenge its approach. And suddenly, an "It's not possible" turned into a concrete and relevant analysis of alternatives.

The discipline I had to learn: establish "thinking out loud" sessions. Before each step, ask the AI to summarize what was done, what we're going to do, and why. Discuss the impacts. Step back from pure code to stay focused on the vision and feed the AI with my thoughts.

The "Human-in-the-Loop" and Interactive Artifacts

One of the biggest revelations was realizing that an autonomous agent shouldn't do everything alone. For complex tasks (like rebuilding an architecture), I had to design an "Architect" mode.

Instead of spitting out 500 lines of code at once, the agent generates a detailed plan wrapped in an "Artifact". The interface intercepts it, pauses execution, and shows me a clean interactive render with approval buttons.

That's where the magic happens: before the agent uses its tools to modify my files, I can review its plan. This veto right integrated into the core of the system changes everything: you move from a "black box" AI that unpredictably breaks your project, to a real colleague submitting their drafts.
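The approval gate described above can be sketched in a few lines of plain Python. This is an illustration only: `PlanArtifact`, `Architect`, and the `ask_human` callback are hypothetical names, and the sketch deliberately avoids LangGraph's own human-in-the-loop mechanisms, so it should not be read as how Vibrisse implements Architect mode.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PlanArtifact:
    """A plan the agent proposes before touching any file."""
    title: str
    steps: list[str]
    approved: bool = False


@dataclass
class Architect:
    apply_step: Callable[[str], None]          # the tool call that modifies files
    log: list[str] = field(default_factory=list)

    def run(self, plan: PlanArtifact, ask_human: Callable[[PlanArtifact], bool]) -> bool:
        # Execution pauses here until the human answers.
        plan.approved = ask_human(plan)
        if not plan.approved:
            self.log.append(f"rejected: {plan.title}")
            return False
        # Tools run only after explicit approval.
        for step in plan.steps:
            self.apply_step(step)
            self.log.append(f"applied: {step}")
        return True
```

In a real interface, `ask_human` would render the plan and wait for a click on an approval button; in a test, it can simply be `lambda plan: True`.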

The Double Learning Curve (The Part No One Anticipates)

The most unexpected insight from this journey is that learning to build AI teaches you how to use AI.

During this month of development, two parallel learning curves unfolded simultaneously.

On the engineering side, you learn that the model needs:

Fresh and precise context (not too much, not just anything).
Explicit constraints so it doesn't drift.
Regular summaries to avoid "forgetting" decisions made 2 hours prior.
A clear vision of what will be built to ensure clean modularity.
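The first three needs in this list can be combined into one small context builder: a running summary and the explicit constraints always go in first, then only the most recent messages that fit a budget. This is a sketch under assumptions: `build_context`, the `SUMMARY:` and `RULE:` prefixes, and the character-based budget are invented for illustration (a real agent would more likely count tokens).

```python
def build_context(
    summary: str,
    constraints: list[str],
    history: list[str],
    max_chars: int = 2000,
) -> str:
    """Summary and rules come first; recent messages fill whatever budget is left."""
    header = "\n".join(["SUMMARY: " + summary] + ["RULE: " + c for c in constraints])
    remaining = max_chars - len(header)

    kept = []
    # Walk backwards so the freshest messages are kept first.
    for message in reversed(history):
        cost = len(message) + 1  # +1 for the joining newline
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost

    # Restore chronological order before joining.
    return "\n".join([header] + list(reversed(kept)))
```

Because the summary is never dropped, a decision made two hours ago survives even when the raw messages that led to it have been trimmed away.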

On the user side, you end up applying the exact same discipline to yourself:

Summarize the session before resuming it.
Challenge every answer instead of trusting blindly.
Know how to spot when the session is drifting, when the answers become hallucinated or outdated, and when it's time to start fresh.

"By building an agent that must never lose the thread, I finally understood why I myself lost the thread when using an AI."

Of course, great resources exist to train yourself, but the instinct when facing a derailing session is only truly acquired by building.

Models are Lazy by Design

We need to clearly separate the "Architect AI" (Gemini, which I coded with) from the "Worker AI" (the local Gemma e4b / 26b model that I integrated into Vibrisse).

While the Architect AI is brilliant at generating code, the local Worker AI is lazy by design. Without constraints, an LLM takes the path of least resistance. It doesn't look for the best solution; it looks for an acceptable solution.

The concrete discovery: if you leave a 7B model without strict guardrails, it will eventually write // ... rest of the code here at 3 AM. But beware, this is also true for Cloud models, especially when the context window gets saturated. Combined with their accommodating nature, this laziness means the AI can quickly move forward without you until you lose the thread.
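One cheap guardrail against this kind of laziness is to scan generated code for placeholder comments before accepting it. The sketch below is an assumption-laden illustration, not the parsing approach used in Vibrisse: the two patterns and the helper names `find_placeholders` and `is_complete` are invented here and will not catch every variant.

```python
import re

# A comment line that starts with an ellipsis, e.g. "// ..." or "# …".
ELLIPSIS_COMMENT = re.compile(r"^\s*(?://|#)\s*(?:\.\.\.|…)")
# A comment line that talks about leftover code, e.g. "# remaining code unchanged".
REST_OF_CODE = re.compile(
    r"^\s*(?://|#).*\b(?:rest of|remaining|unchanged)\b.*\bcode\b",
    re.IGNORECASE,
)


def find_placeholders(generated: str) -> list[str]:
    """Return the stripped comment lines that look like skipped code."""
    hits = []
    for line in generated.splitlines():
        if ELLIPSIS_COMMENT.search(line) or REST_OF_CODE.search(line):
            hits.append(line.strip())
    return hits


def is_complete(generated: str) -> bool:
    """True when no placeholder comment was detected."""
    return not find_placeholders(generated)
```

When `is_complete` returns False, the caller can reject the output and re-prompt with an explicit instruction to write the full code, instead of silently saving a truncated file.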

The answer to this laziness is ultra-structured prompts. Experience remains irreplaceable — not to do the work instead of the AI, but to know exactly when the AI is failing.

(In the next article, 5b, I'll explain exactly how we solved this problem with robust 3-layer parsing. Stick around.)

The Critical Importance of UX/UI

Another crucial lesson: UX and UI are not optional when creating an agent, especially locally where responses can be less "instantaneous" than on the Cloud.

You have to give the user as much feedback as possible. Every action must have a visible reaction; otherwise, the user assumes the agent has crashed. Creating a feeling of fluidity, caring for reading comfort, handling errors elegantly... Building a good interface (like the interactive Thought Graph I implemented in Vibrisse) is a way to compensate for the mechanical limits of AI through user experience.

But it's also about rethinking the interaction: the ultimate goal of an agent isn't to be another chatbot next to your IDE. The goal is for it to become invisible, integrated into your workflow (what I call "Ghost Mode").

The State of the Profession: Neither Dead Nor Unchanged

Are developers going to disappear? No. But the profession is mutating.

We are moving out of the euphoria phase to enter the maturity phase. AI produces more code, which leads to more complex systems, which in turn creates a massive need for architect developers. It's the Jevons Paradox applied to code: the more efficient we make code production, the more the demand for complex systems explodes.

The new developer profile isn't the one who types the fastest. It's the one who knows how to orchestrate, challenge, and validate.

Conclusion: AI as a Tool, Not Magic

Let's answer the ambient noise honestly. To those who claim: "I coded my SaaS in 2 days, devs are dead":

"Maybe. But you haven't pressed the button that breaks everything yet."

Generating a CRUD with an AI is fast. Building a production system that manages context reliably, that doesn't hallucinate on critical data, and that holds up when the model's behavior changes — that's another story. There are so many things to think about that only experience brings: security, error handling, performance optimization, machine resource management (RAM/VRAM)...

I'm not saying AI isn't useful for non-tech profiles. On the contrary, it's fantastic for prototyping an idea. But for production, you need solid knowledge.

For senior profiles: it's an incredible leverage tool.
For junior profiles: whatever you do, don't stop learning how to code. AI has to be piloted; it's not magic.

Paradoxically, this field experience gave me more respect for the teams building models like Gemini, Claude, and GPT. Because I saw, at my tiny scale with 32 GB of RAM, what it takes to make an LLM somewhat reliable. The gap between a local personal project and a consumer system that serves millions without failing is titanic.

This experience forged a new technical conviction that I apply today: Small Models, Great Tools.

In the next article (3b), we'll open the hood and look at the architecture (LangGraph, Parsing, MCP) that makes this phrase real.

Your turn:

Vibrisse Agent is public on GitHub
This project isn't "finished". It's a milestone in a living experiment that will continue to evolve. Test it, break it, improve it with me.
What broke first in your AI-assisted stack — and did an AI help you fix it, or did you have to do it yourself? Let me know in the comments.

Proudly developed in Beauce, Québec 🇨🇦. Interested in local AI sovereignty? Let's connect via Vibrisse Studio!
