DEV Community

Arjun
AI Agent That Actually Learns From Its Mistakes (and It Changed How I Think About AI)

Last month, I got frustrated.
I was using AI agents for web scraping tasks, and they kept making the same mistakes over and over. "Search 3 websites" would return 2. "Save to CSV" would forget the file. Every. Single. Time.

So I asked myself: What if an AI agent could remember its failures and actually learn from them?

That question sent me down a rabbit hole, and I ended up building MOMO — a Memory-Oriented Model Orchestrator that stores mistakes in a graph database and uses them to self-correct.

And honestly? Watching it fail, learn, and succeed on retry feels like magic.


The Problem With Stateless Agents

Here's the thing about most AI agents: they're goldfish.

Every conversation starts fresh. Every task runs in isolation. That "helpful" agent that just helped you debug a React component? It has zero memory of your codebase, your preferences, or the fact that it crashed 5 minutes ago trying the exact same approach.

User: Search 3 e-commerce sites for headphones
Agent: *searches 2 sites, declares victory*
User: You only searched 2 sites
Agent: My apologies! Let me try again
Agent: *searches 2 sites again*

Sound familiar?

The root cause isn't the LLM — it's the architecture. Without persistent memory, agents can't:

  • Remember what worked (and what didn't)
  • Learn your preferences over time
  • Avoid repeating past failures

Enter MOMO: Memory That Actually Works

MOMO is built on a simple premise: treat agent memory like a knowledge graph, not a chat log.

Instead of stuffing conversation history into context, MOMO extracts structured information and stores it in relationships:

User ──PREFERS──> Memory ──ABOUT──> Topic
  │
  └──INTERACTED──> Episode ──LEARNED_FROM──> Memory

Task ──MADE_MISTAKE──> MistakeNode

This means MOMO can:

  • Smart recall: Surface relevant memories based on recency, importance, and keyword matching
  • Contradiction detection: Notice when new info conflicts with existing knowledge
  • Mistake tracking: Remember why tasks failed and how to prevent the same failures next time
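To make the graph idea concrete, here's a minimal std-only sketch of relationship storage. This is my illustration, not MOMO's actual LadybugDB schema — `MemoryGraph` and its methods are hypothetical names:

```rust
use std::collections::HashMap;

// Relationship types from the diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Edge {
    Prefers,
    About,
    Interacted,
    LearnedFrom,
    MadeMistake,
}

// A toy adjacency-list graph: (from_node, edge_type) -> target nodes.
#[derive(Default)]
struct MemoryGraph {
    edges: HashMap<(String, Edge), Vec<String>>,
}

impl MemoryGraph {
    // Record a typed relationship between two nodes.
    fn link(&mut self, from: &str, edge: Edge, to: &str) {
        self.edges
            .entry((from.to_string(), edge))
            .or_default()
            .push(to.to_string());
    }

    // Follow one edge type out of a node.
    fn neighbors(&self, from: &str, edge: Edge) -> &[String] {
        self.edges
            .get(&(from.to_string(), edge))
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }
}
```

The point of typed edges over a flat chat log: "what mistakes did task 42 make?" becomes a single lookup instead of a context-window search.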

The Self-Improvement Loop

Here's where it gets interesting. When you prefix a task with /learn, MOMO activates validation mode:

❯ /learn search 3 tech sites for AI news and save to report.md

Behind the scenes:

  1. Spec Extraction — An LLM parses the task and extracts requirements:
   {
     "numeric_requirements": [{ "field": "sites", "value": 3 }],
     "expected_outputs": ["report.md"]
   }
  2. Execution — MOMO runs the task using browser automation, APIs, whatever tools it needs

  3. Validation — Compares actual output against the spec

  4. Learning — If validation fails, it stores a MistakeNode:

   MistakeNode {
       mistake_type: QuantityMismatch,
       description: "Only searched 2 sites instead of 3",
       prevention_strategy: "Verify count matches requirement before completing",
       severity: Major,
   }
  5. Retry — Automatically retries with the mistake context injected into the prompt
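The spec-then-validate steps are the heart of the loop, and they fit in a few lines of std-only Rust. This is a simplified sketch — `TaskSpec` mirrors the extracted JSON above, but the `validate` helper itself is my illustration, not MOMO's code:

```rust
use std::collections::HashMap;

// A distilled spec, mirroring the extracted JSON above.
struct TaskSpec {
    numeric_requirements: Vec<(String, usize)>, // e.g. ("sites", 3)
    expected_outputs: Vec<String>,              // e.g. "report.md"
}

// What actually happened during execution.
struct TaskResult {
    counts: HashMap<String, usize>, // e.g. "sites" -> 2
    produced_files: Vec<String>,
}

// Every violation becomes a human-readable error string — exactly the
// raw material you need to build a MistakeNode from.
fn validate(result: &TaskResult, spec: &TaskSpec) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();
    for (field, required) in &spec.numeric_requirements {
        let actual = result.counts.get(field).copied().unwrap_or(0);
        if actual < *required {
            errors.push(format!("Only {actual} {field} instead of {required}"));
        }
    }
    for file in &spec.expected_outputs {
        if !result.produced_files.contains(file) {
            errors.push(format!("Expected output {file} was never produced"));
        }
    }
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```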

The result? MOMO's retry prompt looks like this:

## LEARNED FROM PAST MISTAKES:
Apply these lessons to avoid repeating errors:

🚨 [QuantityMismatch] Only searched 2 sites instead of 3
   Prevention: Verify site count matches requirement before completing

Now executing: search 3 tech sites for AI news...

And it works. Tasks that failed on first attempt succeed on retry because the agent has concrete, actionable feedback.
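Rendering that preamble is plain string assembly. A minimal sketch — the `PastMistake` struct here is a trimmed-down stand-in for MOMO's MistakeNode, and the function name is mine:

```rust
// A trimmed-down record of a stored mistake.
struct PastMistake {
    mistake_type: String,
    description: String,
    prevention: String,
}

// Render recalled mistakes into the retry preamble shown above.
fn build_retry_preamble(mistakes: &[PastMistake]) -> String {
    let mut out = String::from(
        "## LEARNED FROM PAST MISTAKES:\nApply these lessons to avoid repeating errors:\n\n",
    );
    for m in mistakes {
        out.push_str(&format!(
            "🚨 [{}] {}\n   Prevention: {}\n",
            m.mistake_type, m.description, m.prevention
        ));
    }
    out
}
```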


The Tech Stack

I built MOMO in Rust because:

  • I wanted it fast (async everything, minimal memory footprint)
  • I'm masochistic enough to enjoy fighting the borrow checker
  • Graph databases + Rust = surprisingly ergonomic with LadybugDB

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                         MOMO Agent                              │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │
│  │   Providers  │  │  Tool Layer  │  │    Memory System     │   │
│  │  ──────────  │  │  ──────────  │  │    ──────────────    │   │
│  │  • Anthropic │  │  • Native    │  │  • GraphBrain        │   │
│  │  • OpenAI    │  │  • MCP       │  │  • Smart Recall      │   │
│  │  • Ollama    │  │  • Browser   │  │  • Mistake Storage   │   │
│  │  • Gemini    │  │  • Skills    │  │  • Deduplication     │   │
│  └──────────────┘  └──────────────┘  └──────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Multi-Provider Support

One config file, any LLM:

{
  "provider_type": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 4096
}

Swap to local models for free experimentation:

{
  "provider_type": "ollama",
  "model": "llama3.2",
  "max_tokens": 4096
}

Supports: Claude, GPT-4, Gemini, Ollama, DeepSeek, Groq, Together, LM Studio, and OpenRouter.
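Internally, dispatching on that `provider_type` string is a one-enum job. A std-only sketch (the variant set is trimmed to the configs shown above; MOMO's real provider layer is more involved):

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
enum Provider {
    Anthropic,
    OpenAi,
    Ollama,
    Gemini,
}

impl FromStr for Provider {
    type Err = String;

    // Map the "provider_type" field from the config onto a variant.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "anthropic" => Ok(Provider::Anthropic),
            "openai" => Ok(Provider::OpenAi),
            "ollama" => Ok(Provider::Ollama),
            "gemini" => Ok(Provider::Gemini),
            other => Err(format!("unknown provider_type: {other}")),
        }
    }
}
```

Failing loudly on an unknown string beats silently falling back to a default provider, especially when the config typo would otherwise burn API credits.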

MCP Integration

MOMO uses the Model Context Protocol for tool extensibility. This means you can plug in any MCP server:

{
  "servers": [
    {
      "name": "filesystem",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user"]
    },
    {
      "name": "playwright",
      "command": "npx",
      "args": ["-y", "@playwright/mcp"]
    }
  ]
}

Browser automation, file access, databases — if there's an MCP server for it, MOMO can use it.
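Each entry in that `servers` array maps straight onto a spawnable process. A sketch of that mapping (the `McpServerConfig` struct is my illustration of the config shape, not MOMO's actual type):

```rust
use std::process::Command;

// One entry from the "servers" array above.
struct McpServerConfig {
    name: String,
    command: String,
    args: Vec<String>,
}

impl McpServerConfig {
    // Build (but don't spawn) the child process for this MCP server.
    // The caller would spawn it and speak MCP over the child's stdio.
    fn to_command(&self) -> Command {
        let mut cmd = Command::new(&self.command);
        cmd.args(&self.args);
        cmd
    }
}
```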


Smart Memory Recall

Not all memories are created equal. When MOMO recalls context, it scores each memory:

score = (importance × 0.25) + (recency × 0.30) +
        (frequency × 0.15) + (keyword_match × 0.30)
  • Recency: memories decay with a 7-day half-life
  • Importance: a 0.0–1.0 weight assigned when the memory is stored
  • Frequency: how often this memory has been accessed
  • Keyword match: how well the content matches the current query

This prevents the context window from filling up with irrelevant old memories while keeping important recent ones front and center.
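The whole scoring function, including the 7-day half-life, fits in a few lines. One caveat: the post doesn't say how raw access counts are normalized into a 0–1 frequency term, so that squashing is my assumption:

```rust
// Composite recall score with the weights from the formula above.
fn recall_score(
    importance: f64,    // 0.0..=1.0, set at storage time
    age_days: f64,      // time since the memory was created
    access_count: u32,  // how often the memory has been recalled
    keyword_match: f64, // 0.0..=1.0 overlap with the current query
) -> f64 {
    // Exponential decay with a 7-day half-life.
    let recency = 0.5_f64.powf(age_days / 7.0);
    // Squash raw access counts into 0.0..1.0 (my assumption; the post
    // doesn't specify how frequency is normalized).
    let frequency = 1.0 - 1.0 / (1.0 + access_count as f64);
    importance * 0.25 + recency * 0.30 + frequency * 0.15 + keyword_match * 0.30
}
```

With these weights a perfect brand-new memory tops out at 0.85 until it has been accessed at least once, which nudges recall toward memories that have proven useful before.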


Show Me The Code

Here's how mistake storage actually works:

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MistakeNode {
    pub id: String,
    pub task_id: String,
    pub mistake_type: MistakeType,
    pub severity: Severity,
    pub description: String,
    pub prevention_strategy: String,
    pub keywords: Vec<String>,
    pub task_fingerprint: String,
    pub was_corrected: bool,
    pub created_at: DateTime<Utc>,
}

impl GraphBrain {
    pub fn store_mistake(&self, mistake: &MistakeNode) -> Result<()> {
        // Store in graph with relationships
        self.db.execute(cypher!(
            "CREATE (m:Mistake $props)
             WITH m
             MATCH (t:Task {id: $task_id})
             CREATE (t)-[:MADE_MISTAKE]->(m)",
            props = mistake,
            task_id = &mistake.task_id
        ))?;

        info!("📚 [MISTAKE] Recorded: {}", mistake.description);
        Ok(())
    }

    pub fn recall_similar_mistakes(&self, keywords: &[String], limit: usize)
        -> Result<Vec<MistakeNode>>
    {
        // Find mistakes with matching keywords for similar tasks
        // Inject into system prompt for retry attempts
    }
}

And the learning module that ties it together:

pub async fn execute_with_learning(
    &mut self,
    task: &str,
    max_retries: usize,
) -> Result<String> {
    // 1. Extract specification
    let spec = self.spec_extractor.extract(task).await?;

    // 2. Recall past mistakes for similar tasks
    let past_mistakes = self.brain.recall_similar_mistakes(&spec.keywords, 3)?;

    // 3. Build context with mistake prevention (mutable: retries append corrections)
    let mut context = self.learning_module.build_context(&past_mistakes);

    // 4. Execute with retry loop
    for attempt in 0..=max_retries {
        let result = self.agent.execute_with_context(task, &context).await?;

        // 5. Validate output
        match self.validator.validate(&result, &spec) {
            Ok(_) => {
                if attempt > 0 {
                    self.brain.mark_mistake_corrected(&past_mistakes)?;
                }
                return Ok(result);
            }
            Err(validation_errors) => {
                // 6. Store new mistakes
                for error in &validation_errors {
                    let mistake = MistakeNode::from_validation_error(error, task);
                    self.brain.store_mistake(&mistake)?;
                }

                // 7. Retry with correction context
                context.add_correction_prompt(&validation_errors);
            }
        }
    }

    Err(AgentError::MaxRetriesExceeded)
}

What I Learned Building This

1. Structured memory beats conversation history

Dumping chat logs into context is lazy and expensive. Extracting facts, preferences, and relationships gives you 10x better recall at 1/10th the tokens.

2. Validation is the key to learning

Without a spec to validate against, "learning" is just vibes. MOMO's spec extraction isn't perfect, but even rough validation catches 80% of failures.

3. Rust is great for AI infrastructure

Hot take: the AI ecosystem is too Python-heavy. Rust's async story, memory safety, and performance make it perfect for long-running agents that need to be reliable.

4. MCP is underrated

Anthropic's Model Context Protocol is quietly becoming the USB-C of AI tools. Write once, use with any agent. More people should be building MCP servers.


Try It Yourself

MOMO is open source:

GitHub: github.com/Arjxm/momo

# Clone and configure
git clone https://github.com/Arjxm/momo
cd momo
cp .env.example .env
# Add your API keys

# Build and run
cargo build --release
./target/release/momo

Basic usage:

❯ search for the latest AI news              # Regular query
❯ /learn find 5 papers on transformers       # With validation
❯ mistakes                                   # View recorded mistakes
❯ stats                                      # Graph statistics
❯ debug                                      # Launch visualization server

What's Next?

I'm actively working on:

  • Task orchestrator: Parallel workers for complex multi-step tasks
  • Skill learning: Auto-generate new tools from successful patterns
  • Memory consolidation: Compress and merge related memories over time

If you're interested in AI agents that actually remember, I'd love feedback. Star the repo, open an issue, or just say hi in the discussions.

And if you build something cool with MOMO, let me know — nothing makes my day like seeing what others create with this stuff.


Thanks for reading! I'm Arjun, and I build things that probably shouldn't be built in Rust but are anyway. Follow me for more AI experiments and occasional hot takes about memory systems.

Tags: #ai #rust #machinelearning #opensource #agents
