Honcho: AI Agent Memory Beyond RAG
Honcho is an open-source memory library from Plastic Labs that gives AI agents reasoning-based long-term memory, setting state-of-the-art scores on the BEAM benchmarks while cutting inference costs by 60% or more.
The "First Day on the Job" Problem
Every AI agent developer knows this pain: your agent forgets everything between sessions. Current solutions store conversation history and retrieve it via similarity search (RAG), but there's a fundamental gap — they find what users said, not what users need.
Memory as Reasoning
Honcho by Plastic Labs takes a different approach entirely:
| Aspect | Traditional RAG Memory | Honcho |
|---|---|---|
| Core operation | Embed → similarity search | Custom reasoning model analyzes patterns |
| Understanding | "What did the user say?" | "What does the user actually need?" |
| Over time | Static snapshot accumulation | Evolving Representations |
| Output | Related conversation chunks | Conclusions, patterns, hypotheses |
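The last row of the table can be made concrete with a toy contrast: similarity search hands back the stored chunks themselves, while a reasoning layer emits a conclusion derived from them. This is an illustrative sketch only; the keyword "retriever" stands in for embedding search, and `reason` stands in for Honcho's reasoning model.

```python
history = [
    "I tried the video tutorial but gave up halfway",
    "the written walkthrough finally made it click",
    "can you send docs instead of a video next time?",
]

def rag_retrieve(query: str, docs: list[str]) -> list[str]:
    """Toy similarity search: rank docs by words shared with the query."""
    score = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=score, reverse=True)[:2]

def reason(docs: list[str]) -> str:
    """Stand-in for a reasoning model: returns a conclusion, not a chunk."""
    frustrated_with_video = sum("video" in d for d in docs) >= 2
    return "prefers written material over video" if frustrated_with_video else "no clear pattern"

print(rag_retrieve("what video does the user like", history))  # chunks back, not an answer
print(reason(history))  # a conclusion drawn across messages
```

RAG surfaces the messages that mention "video"; the reasoning step is what turns three scattered remarks into "this user needs written docs."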
The "Dreaming" feature runs background reasoning even between conversations — organizing information, discovering patterns, and drawing conclusions.
BEAM Benchmark Results (SOTA)
- BEAM 100K: 0.630 (previous best: 0.358, a 76% improvement)
- BEAM 500K: 0.649 (SOTA)
- BEAM 1M: 0.631 (reasoning beyond a single context window)
- BEAM 10M: 0.409 (reasoning over 10M tokens, which no LLM can do alone)
- LongMem S: 90.4% at 5% token efficiency
Architecture: Peer Paradigm
```
Workspace (app-level isolation)
└── Peer (any entity: user, agent, NPC, group)
    ├── Session (interaction thread)
    │   └── Message (triggers reasoning)
    └── Collection/Document (RAG vector data)
```
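The hierarchy above can be sketched as plain data types. These classes are an illustrative model for intuition, not Honcho's actual internal types:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    peer_id: str   # which Peer said it
    content: str   # each new Message can trigger reasoning

@dataclass
class Session:
    id: str
    peer_ids: list[str] = field(default_factory=list)   # any mix of users and agents
    messages: list[Message] = field(default_factory=list)

@dataclass
class Peer:
    id: str
    name: str

@dataclass
class Workspace:
    name: str
    peers: dict[str, Peer] = field(default_factory=dict)  # app-level isolation
    sessions: list[Session] = field(default_factory=list)

# Users and agents occupy the same slot in the model:
ws = Workspace(name="my-app")
ws.peers["user-123"] = Peer(id="user-123", name="user-123")
ws.peers["agent-1"] = Peer(id="agent-1", name="agent-1")   # an agent is a Peer too
ws.sessions.append(Session(id="s1", peer_ids=["user-123", "agent-1"]))
```

Because both sides of a conversation are the same type, agent-to-agent memory, group sessions, and game NPCs fall out of the model without special cases.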
The key insight: users and agents are both Peers — symmetric entities that evolve over time. This enables natural agent-to-agent memory sharing, group conversations, and NPC memory in games.
Quick Start
```shell
pip install honcho-ai
```

```python
from honcho import Honcho

honcho = Honcho(api_key="your_key")

# Create workspace and peer
workspace = honcho.workspaces.create(name="my-app")
user = honcho.peers.create(
    workspace_id=workspace.id,
    name="user-123",
)

# Ask about a user using reasoning (not keyword search)
insight = honcho.chat(
    workspace_id=workspace.id,
    peer_id=user.id,
    query="What is this user's learning style?",
)
```
Key APIs
- Chat API (Dialectic): Natural language queries about Peers → reasoning-based answers
- Context API: Token-optimized session context within limits
- Search: Hybrid search at Workspace/Session/Peer levels
- Representation: Low-latency static insight documents
- Dreaming: Background reasoning engine
- Continual Learning: Tracks entity changes over time
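The Context API's job, fitting the most useful session history into a fixed token budget, can be approximated with a simple recency-based trim. This is a sketch of the idea only; Honcho's actual selection is reasoning-based, and the word-count tokenizer below is a stand-in for a real one:

```python
def trim_to_budget(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit within max_tokens.

    Token counts are approximated as whitespace-separated words;
    a real implementation would use the model's tokenizer.
    """
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):        # walk newest -> oldest
        cost = len(msg.split())
        if used + cost > max_tokens:
            break                         # budget exhausted
        kept.append(msg)
        used += cost
    return list(reversed(kept))           # restore chronological order

history = ["hello there", "tell me about rust", "rust is great", "what about lifetimes"]
print(trim_to_budget(history, 6))        # only the newest messages that fit
```

A reasoning-based selector can beat pure recency by keeping an old-but-relevant message over a new-but-irrelevant one, which is where the token-efficiency numbers above come from.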
Cost Efficiency
| Scenario | Direct LLM | With Honcho | Savings |
|---|---|---|---|
| LongMem S (Gemini 3 Pro) | $115 | $47 | 60% |
| 250K token history (GPT-5-Pro) | $3.75 | $0.15 | 96% |
| 10M token inbox (Claude Opus 4.5) | $50+ | $6 | 88% |
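As a sanity check, the savings column follows directly from the two cost figures in each row; here are the two rows with exact prices:

```python
def savings_pct(direct: float, with_honcho: float) -> int:
    """Percent saved versus calling the LLM directly, rounded to a whole percent."""
    return round(100 * (1 - with_honcho / direct))

print(savings_pct(3.75, 0.15))   # 250K-token history row
print(savings_pct(50, 6))        # 10M-token inbox row
```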
Pricing: $0.01-$0.50/query depending on reasoning depth. $100 free credits for new signups.
Links
- GitHub: plastic-labs/honcho (1,800+ stars, AGPL-3.0)
- Docs: docs.honcho.dev
- Benchmarks: Benchmarking Honcho
$5.35M Pre-Seed backed by Variant, White Star Capital, and Mozilla Ventures.
Originally published on qjc.app