Last month, I got frustrated.
I was using AI agents for web scraping tasks, and they kept making the same mistakes over and over. "Search 3 websites" would return 2. "Save to CSV" would forget the file. Every. Single. Time.
So I asked myself: What if an AI agent could remember its failures and actually learn from them?
That question led me down a rabbit hole, and I ended up building MOMO — a Memory-Oriented Model Orchestrator that stores mistakes in a graph database and uses them to self-correct.
And honestly? Watching it fail, learn, and succeed on retry feels like magic.
The Problem With Stateless Agents
Here's the thing about most AI agents: they're goldfish.
Every conversation starts fresh. Every task runs in isolation. That "helpful" agent that just helped you debug a React component? It has zero memory of your codebase, your preferences, or the fact that it crashed 5 minutes ago trying the exact same approach.
User: Search 3 e-commerce sites for headphones
Agent: *searches 2 sites, declares victory*
User: You only searched 2 sites
Agent: My apologies! Let me try again
Agent: *searches 2 sites again*
Sound familiar?
The root cause isn't the LLM — it's the architecture. Without persistent memory, agents can't:
- Remember what worked (and what didn't)
- Learn your preferences over time
- Avoid repeating past failures
Enter MOMO: Memory That Actually Works
MOMO is built on a simple premise: treat agent memory like a knowledge graph, not a chat log.
Instead of stuffing conversation history into context, MOMO extracts structured information and stores it in relationships:
User ──PREFERS──> Memory ──ABOUT──> Topic
  │
  └──INTERACTED──> Episode ──LEARNED_FROM──> Memory

Task ──MADE_MISTAKE──> MistakeNode
This means MOMO can:
- Smart recall: Surface relevant memories based on recency, importance, and keyword matching
- Contradiction detection: Notice when new info conflicts with existing knowledge
- Mistake tracking: Remember why tasks failed and how to prevent the same failures
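To make the graph idea concrete, here is a minimal sketch of what such a schema could look like. The node and edge types mirror the diagram above, but the names and the tiny in-memory triple store are my own illustration, not MOMO's actual implementation:

```rust
// Illustrative graph-memory schema; the node/edge variants follow the
// diagram above, but this is a sketch, not MOMO's real types.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    User(String),                                  // user id
    Memory { content: String, importance: f64 },   // extracted fact/preference
    Topic(String),
    Episode(String),                               // one interaction or task run
    Mistake { description: String },
}

#[derive(Debug, Clone, PartialEq)]
enum Edge {
    Prefers,     // User -> Memory
    About,       // Memory -> Topic
    Interacted,  // User -> Episode
    LearnedFrom, // Episode -> Memory
    MadeMistake, // Episode/Task -> Mistake
}

// A tiny in-memory triple store: edges are (from, edge, to) indices into `nodes`.
struct Graph {
    nodes: Vec<Node>,
    edges: Vec<(usize, Edge, usize)>,
}

impl Graph {
    fn new() -> Self {
        Graph { nodes: Vec::new(), edges: Vec::new() }
    }

    fn add_node(&mut self, n: Node) -> usize {
        self.nodes.push(n);
        self.nodes.len() - 1
    }

    fn link(&mut self, from: usize, edge: Edge, to: usize) {
        self.edges.push((from, edge, to));
    }

    // Follow one edge type out of a node, e.g. "which memories does this user prefer?"
    fn neighbors(&self, from: usize, edge: &Edge) -> Vec<&Node> {
        self.edges
            .iter()
            .filter(|(f, e, _)| *f == from && e == edge)
            .map(|(_, _, t)| &self.nodes[*t])
            .collect()
    }
}
```

The point of the structure is that recall becomes a graph traversal ("follow PREFERS edges from this user") rather than a scan over raw chat history.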
The Self-Improvement Loop
Here's where it gets interesting. When you prefix a task with /learn, MOMO activates validation mode:
❯ /learn search 3 tech sites for AI news and save to report.md
Behind the scenes:
- Spec Extraction — An LLM parses the task and extracts requirements:
{
"numeric_requirements": [{ "field": "sites", "value": 3 }],
"expected_outputs": ["report.md"]
}
- Execution — MOMO runs the task using browser automation, APIs, whatever tools it needs
- Validation — Compares the actual output against the spec
- Learning — If validation fails, it stores a MistakeNode:
MistakeNode {
mistake_type: QuantityMismatch,
description: "Only searched 2 sites instead of 3",
prevention_strategy: "Verify count matches requirement before completing",
severity: Major,
}
- Retry — Automatically retries with the mistake context injected into the prompt
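The validation step can be sketched like this. The field names mirror the extracted spec JSON above, but the structs and checking logic here are my assumptions about the shape of the comparison, not MOMO's actual validator:

```rust
// Hedged sketch of spec validation: numeric requirements and expected output
// files are checked against what the run actually produced.
struct NumericRequirement {
    field: String,
    value: u32,
}

struct Spec {
    numeric_requirements: Vec<NumericRequirement>,
    expected_outputs: Vec<String>,
}

struct RunResult {
    counts: std::collections::HashMap<String, u32>, // e.g. "sites" -> 2
    files_written: Vec<String>,
}

fn validate(result: &RunResult, spec: &Spec) -> Vec<String> {
    let mut errors = Vec::new();
    // Check every "N of X" style requirement against the actual count.
    for req in &spec.numeric_requirements {
        let actual = result.counts.get(&req.field).copied().unwrap_or(0);
        if actual != req.value {
            errors.push(format!(
                "QuantityMismatch: {} was {} instead of {}",
                req.field, actual, req.value
            ));
        }
    }
    // Check that every promised output file was actually written.
    for out in &spec.expected_outputs {
        if !result.files_written.contains(out) {
            errors.push(format!("MissingOutput: {} was never written", out));
        }
    }
    errors
}
```

An empty error list means the task passed; anything else feeds the learning step.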
The result? MOMO's retry prompt looks like this:
## LEARNED FROM PAST MISTAKES:
Apply these lessons to avoid repeating errors:
🚨 [QuantityMismatch] Only searched 2 sites instead of 3
Prevention: Verify site count matches requirement before completing
Now executing: search 3 tech sites for AI news...
And it works. Tasks that failed on first attempt succeed on retry because the agent has concrete, actionable feedback.
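Assembling that retry prompt from stored mistakes is straightforward string building. The fields below follow the MistakeNode shape, but this formatter is an illustrative sketch, not MOMO's actual code:

```rust
// Sketch of assembling the retry prompt from recalled mistakes.
struct PastMistake {
    mistake_type: String,
    description: String,
    prevention_strategy: String,
}

fn build_retry_prompt(task: &str, mistakes: &[PastMistake]) -> String {
    let mut prompt = String::from("## LEARNED FROM PAST MISTAKES:\n");
    prompt.push_str("Apply these lessons to avoid repeating errors:\n");
    for m in mistakes {
        // One bullet per past mistake, with its prevention strategy attached.
        prompt.push_str(&format!(
            "🚨 [{}] {}\n   Prevention: {}\n",
            m.mistake_type, m.description, m.prevention_strategy
        ));
    }
    prompt.push_str(&format!("\nNow executing: {}\n", task));
    prompt
}
```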
The Tech Stack
I built MOMO in Rust because:
- I wanted it fast (async everything, minimal memory footprint)
- I'm masochistic enough to enjoy fighting the borrow checker
- Graph databases + Rust = surprisingly ergonomic with LadybugDB
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ MOMO Agent │
├─────────────────────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Providers │ │ Tool Layer │ │ Memory System │ │
│ │ ────────── │ │ ────────── │ │ ────────────── │ │
│ │ • Anthropic │ │ • Native │ │ • GraphBrain │ │
│ │ • OpenAI │ │ • MCP │ │ • Smart Recall │ │
│ │ • Ollama │ │ • Browser │ │ • Mistake Storage │ │
│ │ • Gemini │ │ • Skills │ │ • Deduplication │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Multi-Provider Support
One config file, any LLM:
{
"provider_type": "anthropic",
"model": "claude-sonnet-4-20250514",
"max_tokens": 4096
}
Swap to local models for free experimentation:
{
"provider_type": "ollama",
"model": "llama3.2",
"max_tokens": 4096
}
Supports: Claude, GPT-4, Gemini, Ollama, DeepSeek, Groq, Together, LM Studio, and OpenRouter.
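One way to model "one config file, any LLM" is a small provider trait plus a factory keyed on provider_type. The trait shape and names below are my assumptions, not MOMO's real interface, and the completions are stubbed out:

```rust
// Hedged sketch of a multi-provider abstraction; this is not MOMO's actual API.
trait Provider {
    fn name(&self) -> &'static str;
    // Real implementations would call the vendor's completion API here.
    fn complete(&self, prompt: &str, max_tokens: u32) -> String;
}

struct Anthropic;
struct Ollama;

impl Provider for Anthropic {
    fn name(&self) -> &'static str { "anthropic" }
    fn complete(&self, _prompt: &str, _max_tokens: u32) -> String {
        unimplemented!("would POST to the Anthropic API")
    }
}

impl Provider for Ollama {
    fn name(&self) -> &'static str { "ollama" }
    fn complete(&self, _prompt: &str, _max_tokens: u32) -> String {
        unimplemented!("would POST to a local Ollama server")
    }
}

// Maps the provider_type string from the JSON config to a concrete backend.
fn provider_from_config(provider_type: &str) -> Option<Box<dyn Provider>> {
    match provider_type {
        "anthropic" => Some(Box::new(Anthropic)),
        "ollama" => Some(Box::new(Ollama)),
        _ => None, // remaining providers elided
    }
}
```

Swapping providers then touches only the config file; the agent loop never changes.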
MCP Integration
MOMO uses the Model Context Protocol for tool extensibility. This means you can plug in any MCP server:
{
"servers": [
{
"name": "filesystem",
"command": "npx",
"args": ["-y", "@anthropic/mcp-server-filesystem", "/home/user"]
},
{
"name": "playwright",
"command": "npx",
"args": ["-y", "@playwright/mcp"]
}
]
}
Browser automation, file access, databases — if there's an MCP server for it, MOMO can use it.
Smart Memory Recall
Not all memories are created equal. When MOMO recalls context, it scores each memory:
score = (importance × 0.25) + (recency × 0.30) +
(frequency × 0.15) + (keyword_match × 0.30)
- Recency: Memories decay with a 7-day half-life
- Importance: Extracted during storage (0.0 to 1.0)
- Frequency: How often has this memory been accessed?
- Keyword match: Does the content match the current query?
This prevents the context window from filling up with irrelevant old memories while keeping important recent ones front and center.
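The scoring formula above translates directly into code. The weights come from the formula as written; the exponential form of the 7-day half-life decay and the assumption that frequency is pre-normalized to 0..1 are my own reading:

```rust
// Sketch of the recall scoring formula; decay details are assumptions.

// 7-day half-life: a fresh memory scores 1.0, a 7-day-old one 0.5,
// a 14-day-old one 0.25, and so on.
fn recency_score(age_days: f64) -> f64 {
    0.5f64.powf(age_days / 7.0)
}

// importance, frequency, and keyword_match are each assumed to be in 0.0..=1.0.
fn score_memory(importance: f64, age_days: f64, frequency: f64, keyword_match: f64) -> f64 {
    importance * 0.25
        + recency_score(age_days) * 0.30
        + frequency * 0.15
        + keyword_match * 0.30
}
```

Note that keyword match and recency carry the most weight, so a stale but perfectly matching memory can still beat a fresh, irrelevant one.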
Show Me The Code
Here's how mistake storage actually works:
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MistakeNode {
pub id: String,
pub task_id: String,
pub mistake_type: MistakeType,
pub severity: Severity,
pub description: String,
pub prevention_strategy: String,
pub keywords: Vec<String>,
pub task_fingerprint: String,
pub was_corrected: bool,
pub created_at: DateTime<Utc>,
}
impl GraphBrain {
pub fn store_mistake(&self, mistake: &MistakeNode) -> Result<()> {
// Store in graph with relationships
self.db.execute(cypher!(
"CREATE (m:Mistake $props)
WITH m
MATCH (t:Task {id: $task_id})
CREATE (t)-[:MADE_MISTAKE]->(m)",
props = mistake,
task_id = &mistake.task_id
))?;
info!("📚 [MISTAKE] Recorded: {}", mistake.description);
Ok(())
}
pub fn recall_similar_mistakes(&self, keywords: &[String], limit: usize)
-> Result<Vec<MistakeNode>>
{
// Find mistakes with matching keywords for similar tasks,
// then inject them into the system prompt for retry attempts.
// (Keyword query elided for brevity.)
todo!()
}
}
And the learning module that ties it together:
pub async fn execute_with_learning(
&mut self,
task: &str,
max_retries: usize,
) -> Result<String> {
// 1. Extract specification
let spec = self.spec_extractor.extract(task).await?;
// 2. Recall past mistakes for similar tasks
let past_mistakes = self.brain.recall_similar_mistakes(&spec.keywords, 3)?;
// 3. Build context with mistake prevention
let mut context = self.learning_module.build_context(&past_mistakes);
// 4. Execute with retry loop
for attempt in 0..=max_retries {
let result = self.agent.execute_with_context(task, &context).await?;
// 5. Validate output
match self.validator.validate(&result, &spec) {
Ok(_) => {
if attempt > 0 {
self.brain.mark_mistake_corrected(&past_mistakes)?;
}
return Ok(result);
}
Err(validation_errors) => {
// 6. Store new mistakes
for error in &validation_errors {
let mistake = MistakeNode::from_validation_error(error, task);
self.brain.store_mistake(&mistake)?;
}
// 7. Retry with correction context
context.add_correction_prompt(&validation_errors);
}
}
}
Err(AgentError::MaxRetriesExceeded)
}
What I Learned Building This
1. Structured memory beats conversation history
Dumping chat logs into context is lazy and expensive. Extracting facts, preferences, and relationships gives you 10x better recall at 1/10th the tokens.
2. Validation is the key to learning
Without a spec to validate against, "learning" is just vibes. MOMO's spec extraction isn't perfect, but even rough validation catches 80% of failures.
3. Rust is great for AI infrastructure
Hot take: the AI ecosystem is too Python-heavy. Rust's async story, memory safety, and performance make it perfect for long-running agents that need to be reliable.
4. MCP is underrated
Anthropic's Model Context Protocol is quietly becoming the USB-C of AI tools. Write once, use with any agent. More people should be building MCP servers.
Try It Yourself
MOMO is open source:
GitHub: github.com/Arjxm/momo
# Clone and configure
git clone https://github.com/Arjxm/momo
cd momo
cp .env.example .env
# Add your API keys
# Build and run
cargo build --release
./target/release/momo
Basic usage:
❯ search for the latest AI news # Regular query
❯ /learn find 5 papers on transformers # With validation
❯ mistakes # View recorded mistakes
❯ stats # Graph statistics
❯ debug # Launch visualization server
What's Next?
I'm actively working on:
- Task orchestrator: Parallel workers for complex multi-step tasks
- Skill learning: Auto-generate new tools from successful patterns
- Memory consolidation: Compress and merge related memories over time
If you're interested in AI agents that actually remember, I'd love feedback. Star the repo, open an issue, or just say hi in the discussions.
And if you build something cool with MOMO, let me know — nothing makes my day like seeing what others create with this stuff.
Thanks for reading! I'm Arjun, and I build things that probably shouldn't be built in Rust but are anyway. Follow me for more AI experiments and occasional hot takes about memory systems.
Tags: #ai #rust #machinelearning #opensource #agents