Honcho: AI Agent Memory Beyond RAG
Honcho is an open-source memory library from Plastic Labs that gives AI agents reasoning-based long-term memory, setting state-of-the-art scores on the BEAM benchmarks while cutting inference costs by 60% or more.
The "First Day on the Job" Problem
Every AI agent developer knows this pain: your agent forgets everything between sessions. Current solutions store conversation history and retrieve it via similarity search (RAG), but there's a fundamental gap — they find what users said, not what users need.
Memory as Reasoning
Honcho by Plastic Labs takes a different approach entirely:
| Aspect | Traditional RAG Memory | Honcho |
|---|---|---|
| Core operation | Embed → similarity search | Custom reasoning model analyzes patterns |
| Understanding | "What did the user say?" | "What does the user actually need?" |
| Over time | Static snapshot accumulation | Evolving Representations |
| Output | Related conversation chunks | Conclusions, patterns, hypotheses |
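The last row of the table can be made concrete with a toy contrast: similarity search hands back the stored chunks themselves, while a reasoning layer emits a conclusion derived from them. This is an illustrative sketch only; the keyword "retriever" stands in for embedding search, and `reason` stands in for Honcho's reasoning model.

```python
history = [
    "I tried the video tutorial but gave up halfway",
    "the written walkthrough finally made it click",
    "can you send docs instead of a video next time?",
]

def rag_retrieve(query: str, docs: list[str]) -> list[str]:
    """Toy similarity search: rank docs by words shared with the query."""
    score = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=score, reverse=True)[:2]

def reason(docs: list[str]) -> str:
    """Stand-in for a reasoning model: returns a conclusion, not a chunk."""
    frustrated_with_video = sum("video" in d for d in docs) >= 2
    return "prefers written material over video" if frustrated_with_video else "no clear pattern"

print(rag_retrieve("what video does the user like", history))  # chunks back, not an answer
print(reason(history))  # a conclusion drawn across messages
```

RAG surfaces the messages that mention "video"; the reasoning step is what turns three scattered remarks into "this user needs written docs."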
The "Dreaming" feature runs background reasoning even between conversations — organizing information, discovering patterns, and drawing conclusions.
BEAM Benchmark Results (SOTA)
- BEAM 100K: 0.630 (previous best: 0.358, a 76% improvement)
- BEAM 500K: 0.649 (SOTA)
- BEAM 1M: 0.631 (reasoning beyond a single context window)
- BEAM 10M: 0.409 (reasoning over 10M tokens, which no LLM can do alone)
- LongMem S: 90.4% at 5% token efficiency
Architecture: Peer Paradigm
```
Workspace (app-level isolation)
└── Peer (any entity: user, agent, NPC, group)
    ├── Session (interaction thread)
    │   └── Message (triggers reasoning)
    └── Collection/Document (RAG vector data)
```
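The hierarchy above can be sketched as plain data types. These classes are an illustrative model for intuition, not Honcho's actual internal types:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    peer_id: str   # which Peer said it
    content: str   # each new Message can trigger reasoning

@dataclass
class Session:
    id: str
    peer_ids: list[str] = field(default_factory=list)   # any mix of users and agents
    messages: list[Message] = field(default_factory=list)

@dataclass
class Peer:
    id: str
    name: str

@dataclass
class Workspace:
    name: str
    peers: dict[str, Peer] = field(default_factory=dict)  # app-level isolation
    sessions: list[Session] = field(default_factory=list)

# Users and agents occupy the same slot in the model:
ws = Workspace(name="my-app")
ws.peers["user-123"] = Peer(id="user-123", name="user-123")
ws.peers["agent-1"] = Peer(id="agent-1", name="agent-1")   # an agent is a Peer too
ws.sessions.append(Session(id="s1", peer_ids=["user-123", "agent-1"]))
```

Because both sides of a conversation are the same type, agent-to-agent memory, group sessions, and game NPCs fall out of the model without special cases.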
The key insight: users and agents are both Peers — symmetric entities that evolve over time. This enables natural agent-to-agent memory sharing, group conversations, and NPC memory in games.
Quick Start
```shell
pip install honcho-ai
```

```python
from honcho import Honcho

honcho = Honcho(api_key="your_key")

# Create workspace and peer
workspace = honcho.workspaces.create(name="my-app")
user = honcho.peers.create(
    workspace_id=workspace.id,
    name="user-123",
)

# Ask about a user using reasoning (not keyword search)
insight = honcho.chat(
    workspace_id=workspace.id,
    peer_id=user.id,
    query="What is this user's learning style?",
)
```
Key APIs
- Chat API (Dialectic): Natural language queries about Peers → reasoning-based answers
- Context API: Token-optimized session context within limits
- Search: Hybrid search at Workspace/Session/Peer levels
- Representation: Low-latency static insight documents
- Dreaming: Background reasoning engine
- Continual Learning: Tracks entity changes over time
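The Context API's job, fitting the most useful session history into a fixed token budget, can be approximated with a simple recency-based trim. This is a sketch of the idea only; Honcho's actual selection is reasoning-based, and the word-count tokenizer below is a stand-in for a real one:

```python
def trim_to_budget(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit within max_tokens.

    Token counts are approximated as whitespace-separated words;
    a real implementation would use the model's tokenizer.
    """
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):        # walk newest -> oldest
        cost = len(msg.split())
        if used + cost > max_tokens:
            break                         # budget exhausted
        kept.append(msg)
        used += cost
    return list(reversed(kept))           # restore chronological order

history = ["hello there", "tell me about rust", "rust is great", "what about lifetimes"]
print(trim_to_budget(history, 6))        # only the newest messages that fit
```

A reasoning-based selector can beat pure recency by keeping an old-but-relevant message over a new-but-irrelevant one, which is where the token-efficiency numbers above come from.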
Cost Efficiency
| Scenario | Direct LLM | With Honcho | Savings |
|---|---|---|---|
| LongMem S (Gemini 3 Pro) | $115 | $47 | 60% |
| 250K token history (GPT-5-Pro) | $3.75 | $0.15 | 96% |
| 10M token inbox (Claude Opus 4.5) | $50+ | $6 | 88% |
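As a sanity check, the savings column follows directly from the two cost figures in each row; here are the two rows with exact prices:

```python
def savings_pct(direct: float, with_honcho: float) -> int:
    """Percent saved versus calling the LLM directly, rounded to a whole percent."""
    return round(100 * (1 - with_honcho / direct))

print(savings_pct(3.75, 0.15))   # 250K-token history row
print(savings_pct(50, 6))        # 10M-token inbox row
```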
Pricing: $0.01-$0.50/query depending on reasoning depth. $100 free credits for new signups.
Links
- GitHub: plastic-labs/honcho (1,800+ stars, AGPL-3.0)
- Docs: docs.honcho.dev
- Benchmarks: Benchmarking Honcho
$5.35M Pre-Seed backed by Variant, White Star Capital, and Mozilla Ventures.
Originally published on qjc.app