
정상록
Honcho: AI Agent Memory Beyond RAG — Reasoning-Based Memory That Actually Understands Users


Honcho is an open-source memory library that gives AI agents reasoning-based long-term memory, achieving SOTA on BEAM benchmarks while cutting costs by 60%.

The "First Day on the Job" Problem

Every AI agent developer knows this pain: your agent forgets everything between sessions. Current solutions store conversation history and retrieve it via similarity search (RAG), but there's a fundamental gap — they find what users said, not what users need.

Memory as Reasoning

Honcho by Plastic Labs takes a different approach entirely:

| Aspect | Traditional RAG Memory | Honcho |
|---|---|---|
| Core operation | Embed → similarity search | Custom reasoning model analyzes patterns |
| Understanding | "What did the user say?" | "What does the user actually need?" |
| Over time | Static snapshot accumulation | Evolving Representations |
| Output | Related conversation chunks | Conclusions, patterns, hypotheses |
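To make the gap concrete, here is a minimal, self-contained sketch of what similarity-based retrieval actually returns (a toy bag-of-words cosine similarity, not Honcho's reasoning or any production embedding model): the most lexically similar chunk of what the user said, with no inference about what they need.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words term counts (stand-in for a real model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

history = [
    "I keep re-reading the docs before I try anything",
    "Can you show me a diagram of the architecture?",
    "I learn best from worked examples",
]

query = "worked examples help me learn"
best = max(history, key=lambda chunk: cosine(embed(query), embed(chunk)))
print(best)  # → "I learn best from worked examples"
```

The retriever surfaces the closest-matching utterance, but drawing the conclusion "this user prefers example-driven learning" still requires a reasoning step on top.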

The "Dreaming" feature runs background reasoning even between conversations — organizing information, discovering patterns, and drawing conclusions.

BEAM Benchmark Results (SOTA)

BEAM 100K:  0.630 (previous best: 0.358, +76%)
BEAM 500K:  0.649 (SOTA)
BEAM 1M:    0.631 (beyond context window reasoning)
BEAM 10M:   0.409 (10M token reasoning — no LLM can do this alone)
LongMem S:  90.4% at 5% token efficiency

Architecture: Peer Paradigm

Workspace (app-level isolation)
  └── Peer (any entity: user, agent, NPC, group)
        ├── Session (interaction thread)
        │     └── Message (triggers reasoning)
        └── Collection/Document (RAG vector data)

The key insight: users and agents are both Peers — symmetric entities that evolve over time. This enables natural agent-to-agent memory sharing, group conversations, and NPC memory in games.
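The hierarchy above can be sketched as plain data types. This is an illustrative model of the Peer paradigm only (my own dataclasses, not Honcho's actual schema or SDK): the point is that a user and an agent are instances of the same type.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    author: str   # name of the Peer that sent it
    content: str

@dataclass
class Session:
    id: str
    messages: list[Message] = field(default_factory=list)

@dataclass
class Peer:
    # A Peer is any entity: human user, agent, NPC, or group.
    name: str
    sessions: dict[str, Session] = field(default_factory=dict)

@dataclass
class Workspace:
    name: str
    peers: dict[str, Peer] = field(default_factory=dict)

# Users and agents are symmetric Peers in the same Workspace.
ws = Workspace("my-app")
ws.peers["user-123"] = Peer("user-123")
ws.peers["tutor-agent"] = Peer("tutor-agent")

s = Session("s1")
s.messages.append(Message("user-123", "Show me a worked example first"))
ws.peers["user-123"].sessions["s1"] = s
print(len(ws.peers))  # → 2
```

Because both sides are Peers, agent-to-agent memory sharing needs no special case: an agent's Representation is queried exactly like a user's.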

Quick Start

pip install honcho-ai
from honcho import Honcho

honcho = Honcho(api_key="your_key")

# Create workspace and peer
workspace = honcho.workspaces.create(name="my-app")
user = honcho.peers.create(
    workspace_id=workspace.id,
    name="user-123"
)

# Ask about a user using reasoning (not keyword search)
insight = honcho.chat(
    workspace_id=workspace.id,
    peer_id=user.id,
    query="What is this user's learning style?"
)

Key APIs

  • Chat API (Dialectic): Natural language queries about Peers → reasoning-based answers
  • Context API: Token-optimized session context within limits
  • Search: Hybrid search at Workspace/Session/Peer levels
  • Representation: Low-latency static insight documents
  • Dreaming: Background reasoning engine
  • Continual Learning: Tracks entity changes over time
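As an illustration of what token-optimized context assembly (the job the Context API does) involves, here is a sketch in plain Python: keep the most recent messages that fit a token budget, in chronological order. The whitespace token count and function name are my own simplification, not Honcho's implementation.

```python
def fit_context(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined (rough) token
    count stays within max_tokens, preserving original order."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):   # walk newest-first
        cost = len(msg.split())      # crude token estimate
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))      # restore chronological order

history = [
    "hello there",                 # 2 tokens
    "tell me about honcho",        # 4 tokens
    "what did we decide earlier",  # 5 tokens
]
print(fit_context(history, 9))
# → ['tell me about honcho', 'what did we decide earlier']
```

A production version would use a real tokenizer and could also substitute a compact Representation summary for dropped history, which is where the token-efficiency numbers above come from.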

Cost Efficiency

| Scenario | Direct LLM | With Honcho | Savings |
|---|---|---|---|
| LongMem S (Gemini 3 Pro) | $115 | $47 | 60% |
| 250K token history (GPT-5-Pro) | $3.75 | $0.15 | 96% |
| 10M token inbox (Claude Opus 4.5) | $50+ | $6 | 88% |
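The savings column follows directly from the two price columns; a quick arithmetic check using the table's own figures:

```python
def savings_pct(direct: float, with_honcho: float) -> int:
    # Percentage saved relative to calling the LLM directly.
    return round(100 * (direct - with_honcho) / direct)

print(savings_pct(115, 47))     # → 59 (~60%, as quoted)
print(savings_pct(3.75, 0.15))  # → 96
print(savings_pct(50, 6))       # → 88
```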

Pricing: $0.01-$0.50/query depending on reasoning depth. $100 free credits for new signups.

Links

$5.35M Pre-Seed backed by Variant, White Star Capital, and Mozilla Ventures.


Originally published on qjc.app
