Have you ever asked ChatGPT to write a report, read the output, and thought, "Eh, that's kind of generic"?
We've all been there. The standard "one-shot" approach to AI (prompt in, answer out) mimics a human blurting out the first thing that comes to mind. But real intelligence isn't just about generation; it's about refinement.
Today, I'm going to show you how to build a Self-Recursive Agent using Node.js, local LLMs (Ollama), and Langfuse. This agent doesn't just write; it critiques its own work, fixes mistakes, and improves recursively until it meets a high quality standard.
Why Recursion > Fine-Tuning
If you want better model outputs, the industry standard advice is often "Fine-tune a model on your data!"
Fine-tuning is powerful, but it's also:
- Expensive: Requires GPUs and compute.
- Rigid: The model only learns what you show it.
- Hard to Debug: If it fails, you don't know why.
Self-Recursion is the alternative. Instead of training a smarter model, you build a smarter workflow. You create a loop where "dumber" models can outperform "smarter" ones simply by having the chance to correct their own mistakes.
The Architecture: A Digital Newsroom
We simulate a team of experts working together:
- The Planner (Llama 3.2): breaks the user's request into a JSON list of tasks.
- The Executor (Mistral): writes the content.
- The Critic (Phi-4): reads the draft and critiques it against a strict rubric.
- The Judge (GPT-OSS): assigns a numerical score (0-1).
If the score is below 0.9, the Critic's feedback is passed back to the Executor, and the loop runs again.
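Here's a rough sketch of how those four roles could map onto local models via the official ollama npm package. The model tags and system prompts below are my own assumptions, not the repo's exact setup; swap in whatever you've pulled locally:
import ollama from "ollama";

// Thin helper: one chat call against a local Ollama model
async function ask(model, system, user) {
  const res = await ollama.chat({
    model,
    messages: [
      { role: "system", content: system },
      { role: "user", content: user }
    ]
  });
  return res.message.content;
}

// Each "expert" is just a different model + system prompt
// (model tags are assumptions — use whatever you have pulled with `ollama pull`)
const planner  = (request) => ask("llama3.2", "Break the request into a JSON array of tasks.", request);
const executor = (tasks)   => ask("mistral", "Write the report covering these tasks.", tasks);
const critic   = (draft)   => ask("phi4", "Critique this draft against a strict rubric.", draft);
const judge    = (draft)   => ask("gpt-oss", "Score this draft from 0 to 1 and explain why.", draft);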
The Code: The "Thinking" Loop
The magic happens inside a simple while loop in Node.js. Here is the simplified logic from our index.js:
const MAX_RETRIES = 3;   // safety valve so the loop always terminates
let currentReport = "";  // latest draft from the Executor
let feedback = "";       // latest critique from the Critic
let score = 0.0;
let attempt = 0;

// Keep refining until we hit 90% quality (or run out of retries)
while (attempt < MAX_RETRIES && score < 0.9) {
  attempt++;

  // 1. Executor writes/rewrites the draft
  currentReport = await executorModel.generate({
    tasks: plan,
    previousDraft: currentReport,
    criticFeedback: feedback // <--- The secret sauce
  });

  // 2. Critic reviews the work
  feedback = await criticModel.chat({
    prompt: `Analyze this draft for logic gaps: ${currentReport}`
  });

  // 3. Judge evaluates the draft against the critique
  const judgment = await judgeModel.evaluate(currentReport, feedback);
  score = judgment.score;

  // Send the score to Langfuse for tracking
  await langfuse.score({
    name: "quality-gate",
    value: score,
    comment: judgment.reasoning
  });
}
Notice how we feed criticFeedback back into the Executor? That is the recursion. The model effectively "learns" from the critique in real-time, within the context window.
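In practice, that recursion is just prompt construction: the previous draft and the critique get stitched into the next Executor prompt. A minimal sketch (the buildExecutorPrompt helper is hypothetical, not lifted from the repo):
// Hypothetical helper: folds the last draft and the Critic's feedback
// into the next Executor prompt so the model can correct itself
function buildExecutorPrompt(plan, previousDraft, criticFeedback) {
  let prompt = `Write a report covering these tasks:\n${JSON.stringify(plan, null, 2)}`;
  if (previousDraft) {
    prompt += `\n\nYour previous draft:\n${previousDraft}`;
    prompt += `\n\nA reviewer left this feedback. Address every point:\n${criticFeedback}`;
  }
  return prompt;
}
On the first pass previousDraft is empty, so the Executor just writes; on every later pass it rewrites with the critique in view.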
Observability: Seeing the Brain Work
When you have loops inside loops, console.log doesn't cut it. You need to see the trace of execution.
This is where Langfuse comes in. It's an open-source LLM engineering platform that lets us visualize exactly what our agent is doing.
In our code, we wrap the entire process in a trace:
import { Langfuse } from "langfuse";
const langfuse = new Langfuse(); // reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY from the environment
const trace = langfuse.trace({ name: "Recursive-Research-Agent" });
// ... later, inside the while loop ...
const loopSpan = trace.span({ name: `Refinement_Loop_V${attempt}` });
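You can also log each model call as a generation nested under the loop span, so every draft and critique shows up in the trace tree. Something like this (names and payloads are illustrative, not the repo's exact code):
// Record the Executor call as a generation inside this loop's span
const generation = loopSpan.generation({
  name: "executor-draft",
  model: "mistral",
  input: { tasks: plan, criticFeedback: feedback }
});
generation.end({ output: currentReport });

// Close the span when the iteration finishes, and flush events before the process exits
loopSpan.end();
await langfuse.flushAsync();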
The result? A beautiful graph that looks like this:
- Trace Start
  - Planner (Output: "Task List")
  - Loop 1 (Score: 0.6)
    - Executor -> "Draft 1"
    - Critic -> "You missed section X"
  - Loop 2 (Score: 0.8)
    - Executor -> "Draft 2 (Fixed X)"
    - Critic -> "Tone is too casual"
  - Loop 3 (Score: 0.95 - PASS) ✅
Without Langfuse, you're flying blind. With it, you can pinpoint exactly which model (Critic or Executor) is dropping the ball and adjust your prompts accordingly.
You don't need a massive cluster of H100s to build powerful AI. By chaining smaller, local models together in a self-correcting loop and monitoring them with tools like Langfuse, you can achieve results that rival much larger models.
The code is open source - go clone it, fire up Ollama, and watch your agent teach itself to be better!