TL;DR
OpenViking is an open-source context database for AI agents that replaces flat vector storage with a filesystem paradigm. It organizes context (memories, resources, skills) under viking:// URIs with three layers: L0 (~100 tokens), L1 (~2k tokens), L2 (full content). Benchmarks show 91% token cost reduction and 43% better task completion versus traditional RAG.
Introduction
Your AI agent keeps forgetting things. It asks for the same API endpoint twice, ignores your staging environment preference, and loses track of past test results.
This is common in agent development. Most teams cobble together RAG pipelines, vector databases, and custom memory systems—leading to fragmented context, high token costs, and unreliable retrieval.
Benchmarks using the LoCoMo10 dataset show traditional RAG systems achieve only 35-44% task completion rates while consuming 24-51 million input tokens.
OpenViking takes a different path. Built by ByteDance’s OpenViking team, it uses a filesystem approach where all context is organized under viking:// URIs with hierarchical L0/L1/L2 loading. Result: 52% task completion with 91% fewer tokens.
💡 Apidog users building API testing agents can integrate OpenViking to maintain conversation context across test runs, remember user environment preferences, and store API documentation for semantic retrieval.
In this guide, you'll learn how OpenViking addresses context fragmentation, see the L0/L1/L2 model in action, and deploy your first server in under 15 minutes.
The Agent Context Problem
AI agents face unique context management challenges:
An API testing assistant, for example, must track:
- User preferences (“staging environment”, “curl over Python”)
- Project context (endpoints, auth methods, past test results)
- Tool patterns (which endpoints fail, common schema errors)
- Task history (what was tested, which bugs surfaced)
Traditional RAG stores all this as flat chunks in a vector database. You query it and get top-K similar fragments, lacking structure and hierarchy, with no insight into what was missed.
Five Core Challenges
OpenViking targets five core context management issues:
| Challenge | Traditional RAG | OpenViking Solution |
|---|---|---|
| Fragmented Context | Memories, resources, skills stored separately | Unified filesystem paradigm under viking:// |
| Surging Demand | Long tasks generate massive context | L0/L1/L2 hierarchical loading reduces tokens 91% |
| Poor Retrieval | Flat vector search lacks global view | Directory recursive retrieval with intent analysis |
| Unobservable | Black box retrieval chains | Visualized search trajectories for debugging |
| Limited Iteration | Only user interaction history | Automatic session management with 6 memory categories |
OpenViking shifts from “store everything, retrieve vaguely” to “structure everything, retrieve precisely.”
What Is OpenViking?
OpenViking is an open-source context database for AI agents, licensed under Apache 2.0.
It unifies all context into a virtual filesystem. Memories, resources, and skills are mapped to directories under viking://, each with a URI.
```
viking://
├── resources/          # External knowledge: docs, code, web pages
│   ├── my_project/
│   │   ├── docs/
│   │   │   ├── api/
│   │   │   └── tutorials/
│   │   └── src/
│   └── ...
├── user/               # User-specific: preferences, habits
│   └── memories/
│       ├── preferences/
│       │   ├── writing_style
│       │   └── coding_habits
│       └── ...
└── agent/              # Agent capabilities: skills, task memories
    ├── skills/
    │   ├── search_code
    │   ├── analyze_data
    │   └── ...
    ├── memories/
    └── instructions/
```
Agents can:
- List directories: `ls viking://resources/my_project/docs/`
- Semantic search: `find "authentication methods"`
- Read content: `read viking://resources/docs/auth.md`
- Get summaries: `abstract viking://resources/docs/`
Core Feature 1: Filesystem Management Paradigm
OpenViking solves context fragmentation by unifying all context types under one model.
Three Context Types
| Type | Purpose | Lifecycle | Initiative |
|---|---|---|---|
| Resource | External knowledge (docs, code, FAQs) | Long-term, static | User adds |
| Memory | Agent’s cognition (preferences, experiences) | Long-term, dynamic | Agent extracts |
| Skill | Callable capabilities (tools, MCP) | Long-term, static | Agent invokes |
Each type lives in its own directory:
- `viking://resources/`: Product manuals, code repositories, docs
- `viking://user/memories/`: Preferences, entity memories, events
- `viking://agent/skills/`: Tool definitions, MCP configs
- `viking://agent/memories/`: Learned patterns, case studies
Unix-like API
OpenViking provides familiar command-line operations:
```python
from openviking import OpenViking

client = OpenViking(path="./data")

# Semantic search
results = client.find("user authentication")

# List directory contents
contents = client.ls("viking://resources/")

# Read full content
doc = client.read("viking://resources/docs/auth.md")

# Get L0 summary
abstract = client.abstract("viking://resources/docs/")

# Get L1 overview
overview = client.overview("viking://resources/docs/")
```
The API works via Python SDK or HTTP server, compatible with any agent framework.
Core Feature 2: L0/L1/L2 Hierarchical Context Loading
Stuffing all context into prompts is inefficient. OpenViking processes context into three layers:
| Layer | Name | File | Token Limit | Purpose |
|---|---|---|---|---|
| L0 | Abstract | `.abstract.md` | ~100 tokens | Vector search, quick filtering |
| L1 | Overview | `.overview.md` | ~2k tokens | Rerank, content navigation |
| L2 | Detail | Original files | Unlimited | Full content, on-demand loading |
How It Works
When adding a resource (e.g., PDF):
1. Parse the document into text
2. Build the directory tree in AGFS storage
3. Queue semantic processing
4. Generate L0 abstracts and L1 overviews bottom-up
Example structure:
```
viking://resources/my_project/
├── .abstract.md        # L0
├── .overview.md        # L1
├── docs/
│   ├── .abstract.md
│   ├── .overview.md
│   ├── auth.md         # L2
│   ├── endpoints.md
│   └── rate-limits.md
└── src/
    └── ...
```
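The bottom-up pass (step 4 above) can be sketched roughly as follows. This is a toy illustration, not OpenViking's implementation: `summarize_l0` and `summarize_l1` stand in for LLM calls, and the directory tree is a plain dict.

```python
# Sketch: generate L0/L1 summaries bottom-up over a directory tree.
# Leaves (files) feed their parent directory; each directory's L0/L1
# are derived from its children's content/overviews.

def summarize_l0(text: str) -> str:
    """Stand-in for an LLM call producing a ~100-token abstract."""
    return text[:60]

def summarize_l1(text: str) -> str:
    """Stand-in for an LLM call producing a ~2k-token overview."""
    return text[:400]

def build_summaries(tree: dict) -> dict:
    """Recursively attach .abstract.md (L0) and .overview.md (L1) to each dir."""
    combined = []
    for name, node in list(tree.items()):
        if isinstance(node, dict):              # subdirectory: summarize it first
            tree[name] = build_summaries(node)
            combined.append(tree[name][".overview.md"])
        else:                                   # file: its content feeds the parent
            combined.append(node)
    text = "\n".join(combined)
    tree[".overview.md"] = summarize_l1(text)   # L1 from children, bottom-up
    tree[".abstract.md"] = summarize_l0(text)   # L0 from the same material
    return tree

docs = {"auth.md": "OAuth2 and JWT flows...", "api": {"endpoints.md": "GET /users..."}}
result = build_summaries(docs)
print(sorted(result.keys()))
```

Real summaries are generated by the configured VLM; the key point is the order — leaf content first, directory abstracts last.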
Token Budget Impact
This approach saves tokens:
```python
# Traditional RAG: load all (50k tokens)
full_docs = retrieve_all("authentication")

# OpenViking: L1 (2k tokens), L2 only if needed
overview = client.overview("viking://resources/docs/auth/")
if needs_more_detail(overview):
    content = client.read("viking://resources/docs/auth/oauth.md")
```
Benchmarks: 91% lower input token cost, 43% better task completion.
Core Feature 3: Directory Recursive Retrieval
Single vector search struggles with complex queries. OpenViking uses directory recursive retrieval:
Five-Step Process
1. Intent Analysis: identifies the query type, key entities, and expected content.
2. Initial Positioning: vector search locates high-scoring directories.
3. Refined Exploration: searches within the top directories for relevant files.
4. Recursive Descent: repeats the process in subdirectories.
5. Result Aggregation: aggregates and ranks results, preserving retrieval traces.
This approach increases accuracy by leveraging context hierarchy.
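The descent logic can be sketched as follows. This is illustrative only: the scores are invented toy values standing in for real vector similarity, and the threshold is arbitrary.

```python
# Sketch: directory-recursive retrieval over a scored tree.
# Directories clearing the threshold are explored; files clearing it
# are returned; everything else is skipped.

THRESHOLD = 0.6

def retrieve(node: dict, path: str = "viking://") -> list:
    """Walk the tree, descending only into high-scoring directories."""
    hits = []
    for name, (score, children) in node.items():
        uri = path + name
        if score < THRESHOLD:
            continue                                    # skipped: low relevance
        if children is None:
            hits.append((uri, score))                   # file: returned
        else:
            hits.extend(retrieve(children, uri + "/"))  # directory: recurse
    return sorted(hits, key=lambda h: -h[1])            # aggregate and rank

tree = {
    "docs": (0.89, {
        "oauth.md": (0.92, None),
        "jwt.md": (0.34, None),
        "providers": (0.78, {"google.md": (0.85, None)}),
    }),
    "src": (0.45, {}),
}
print(retrieve(tree))
```

Because low-scoring branches (`src`, `jwt.md`) are pruned early, only a small slice of the tree is ever scored in depth.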
Core Feature 4: Visualized Retrieval Traces
Traditional RAG is a black box. OpenViking provides observable retrieval traces:
```
Retrieval Trace for query: "OAuth token refresh"
├── viking://resources/docs/
│   ├── [SCORE: 0.45] .abstract.md: skipped
│   └── [SCORE: 0.89] auth/: selected
│       ├── [SCORE: 0.92] oauth.md: RETURNED
│       ├── [SCORE: 0.34] jwt.md: skipped
│       └── [SCORE: 0.78] providers/
│           └── [SCORE: 0.85] google.md: RETURNED
```
This enables debugging by showing which directories/files were visited and why.
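A trace like this is cheap to emit during the descent — each node is logged with its score and decision. A toy sketch (the node values are invented, not OpenViking's trace format):

```python
# Sketch: render a retrieval trace as an indented tree.
# Each node is (name, score, decision, children).

def render(node, depth=0):
    name, score, decision, children = node
    lines = [f"{'  ' * depth}[SCORE: {score:.2f}] {name}: {decision}"]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

trace = ("auth/", 0.89, "selected", [
    ("oauth.md", 0.92, "RETURNED", []),
    ("jwt.md", 0.34, "skipped", []),
])
print("\n".join(render(trace)))
```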
Core Feature 5: Automatic Session Management
OpenViking includes a memory self-iteration loop. At session end, it extracts and updates agent knowledge automatically.
Six Memory Categories
| Category | Owner | Location | Description | Update Strategy |
|---|---|---|---|---|
| profile | user | `user/memories/.overview.md` | Basic user info | Appendable |
| preferences | user | `user/memories/preferences/` | Preferences by topic | Appendable |
| entities | user | `user/memories/entities/` | People, projects | Appendable |
| events | user | `user/memories/events/` | Decisions, milestones | No update |
| cases | agent | `agent/memories/cases/` | Learned cases | No update |
| patterns | agent | `agent/memories/patterns/` | Learned patterns | No update |
How Memory Extraction Works
```python
# Inside an async context
session = client.session()

# Add conversation messages
await session.add_message("user", [{"type": "text", "text": "I prefer dark mode in the UI"}])
await session.add_message("assistant", [{"type": "text", "text": "Got it. I'll use dark mode for all future screenshots."}])

# Record tool usage
await session.add_usage({"tool": "screenshot", "parameters": {"theme": "dark"}, "result": "success"})

# Commit triggers memory extraction
await session.commit()
```
On commit, OpenViking compresses the session, extracts memories via LLM, updates directories, and generates new L0/L1 summaries—enabling agents to learn and adapt.
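Conceptually, commit routes each extracted fact into one of the six category directories from the table above. A toy sketch — `extract_memories` stands in for OpenViking's LLM extraction pass, and the in-memory `store` stands in for the filesystem:

```python
# Sketch: route extracted memories to their category paths on commit.

CATEGORY_PATHS = {
    "profile": "viking://user/memories/.overview.md",
    "preferences": "viking://user/memories/preferences/",
    "entities": "viking://user/memories/entities/",
    "events": "viking://user/memories/events/",
    "cases": "viking://agent/memories/cases/",
    "patterns": "viking://agent/memories/patterns/",
}

def extract_memories(messages):
    """Stand-in for LLM extraction: returns (category, fact) pairs."""
    return [("preferences", "prefers dark mode")]

def commit(messages, store):
    for category, fact in extract_memories(messages):
        store.setdefault(CATEGORY_PATHS[category], []).append(fact)
    return store

store = commit(["I prefer dark mode in the UI"], {})
print(store)
```

In the real system the "Update Strategy" column governs this step: appendable categories accumulate, while `events`, `cases`, and `patterns` are written once and not modified.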
Architecture Overview
OpenViking separates concerns across multiple layers.
Dual-Layer Storage
| Layer | Technology | Stores |
|---|---|---|
| AGFS | Custom filesystem | L0/L1/L2 content, multimedia, relations |
| Vector Index | Vector DB | URIs, embeddings, metadata |
- Content reads come from AGFS.
- Vector index only stores references, not content.
- No large text duplication in vector storage.
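The separation amounts to two maps: the vector index resolves a query to URIs, and AGFS resolves URIs to content. A toy sketch with 2-d embeddings (real embeddings come from the configured model; these numbers are made up):

```python
# Sketch: dual-layer lookup. The "vector index" stores only URI + embedding;
# full text lives in a separate store keyed by URI (standing in for AGFS).
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Vector layer: URIs and embeddings, no document text duplicated here.
vector_index = {
    "viking://resources/docs/auth.md": [0.9, 0.1],
    "viking://resources/docs/rate-limits.md": [0.1, 0.9],
}
# Content layer: read only after the index has picked a URI.
agfs = {
    "viking://resources/docs/auth.md": "OAuth2 flows, JWT validation...",
    "viking://resources/docs/rate-limits.md": "429 handling, backoff...",
}

def find(query_embedding):
    uri = max(vector_index, key=lambda u: cosine(vector_index[u], query_embedding))
    return uri, agfs[uri]   # content read comes from AGFS, not the index

uri, text = find([1.0, 0.0])
print(uri)
```

Keeping text out of the index is what avoids the large-text duplication noted above.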
Quick Start: Deploy Your First OpenViking Server
Prerequisites
- Python: 3.10+
- Go: 1.22+ (for AGFS)
- C++ Compiler: GCC 9+ or Clang 11+
- OS: Linux, macOS, Windows
Step 1: Install OpenViking
```shell
pip install openviking --upgrade --force-reinstall
```
Optional: Install Rust CLI
```shell
curl -fsSL https://raw.githubusercontent.com/volcengine/OpenViking/main/crates/ov_cli/install.sh | bash
```
Step 2: Configure Models
Create ~/.openviking/ov.conf:
```json
{
  "storage": {
    "workspace": "/home/your-name/openviking_workspace"
  },
  "log": {
    "level": "INFO",
    "output": "stdout"
  },
  "embedding": {
    "dense": {
      "api_base": "https://api.openai.com/v1",
      "api_key": "your-openai-api-key",
      "provider": "openai",
      "dimension": 3072,
      "model": "text-embedding-3-large"
    },
    "max_concurrent": 10
  },
  "vlm": {
    "api_base": "https://api.openai.com/v1",
    "api_key": "your-openai-api-key",
    "provider": "openai",
    "model": "gpt-4o",
    "max_concurrent": 100
  }
}
```
| Provider | Embedding Models | VLM Models |
|---|---|---|
| volcengine | doubao-embedding-vision | doubao-seed-2.0-pro |
| openai | text-embedding-3-large | gpt-4o, gpt-4-vision |
| litellm | Via LiteLLM proxy | Claude, Gemini, DeepSeek, etc. |
LiteLLM enables support for Anthropic, Google, local Ollama, or OpenAI-compatible endpoints.
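As a rough illustration, the same config shape can point at a LiteLLM proxy. This is a hypothetical sketch mirroring the sample above — the proxy URL, model names, and whether the `litellm` provider accepts these exact fields should be verified against the OpenViking and LiteLLM docs:

```json
{
  "embedding": {
    "dense": {
      "api_base": "http://localhost:4000/v1",
      "api_key": "your-litellm-key",
      "provider": "litellm",
      "dimension": 1536,
      "model": "text-embedding-3-small"
    }
  },
  "vlm": {
    "api_base": "http://localhost:4000/v1",
    "api_key": "your-litellm-key",
    "provider": "litellm",
    "model": "claude-3-5-sonnet"
  }
}
```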
Step 3: Start the Server
```shell
openviking-server
```
Or background mode:
```shell
nohup openviking-server > /data/log/openviking.log 2>&1 &
```
Step 4: Add Your First Resource
```shell
# Rust CLI
ov add-resource https://docs.example.com/api-guide.pdf
```
```python
# Python SDK
from openviking import OpenViking

client = OpenViking(path="./data")
client.add_resource("https://docs.example.com/api-guide.pdf")
```
Step 5: Search and Retrieve
```shell
# Semantic search
ov find "authentication methods"

# List contents
ov ls viking://resources/

# View tree
ov tree viking://resources/docs -L 2

# Grep for content
ov grep "OAuth" --uri viking://resources/docs/
```
Step 6: Enable VikingBot (Optional)
```shell
pip install "openviking[bot]"

# Start server with bot enabled
openviking-server --with-bot

# Start chat in another terminal
ov chat
```
Performance Benchmarks
OpenViking was benchmarked vs. traditional RAG (LanceDB) and native memory systems on the LoCoMo10 dataset (1,540 long-range dialogue cases).
Task Completion Rates
| System | Completion Rate | Input Tokens |
|---|---|---|
| OpenClaw (native memory) | 35.65% | 24.6M |
| OpenClaw + LanceDB | 44.55% | 51.6M |
| OpenClaw + OpenViking | 52.08% | 4.3M |
- 43% improvement over native memory, 91% token reduction
- 17% improvement over LanceDB, 92% token reduction
- Hierarchical retrieval increases relevance and reduces cost
Integrating OpenViking with Apidog
Apidog users can leverage OpenViking to maintain context, store API docs, and remember preferences.
Step 1: Set Up OpenViking Server
Deploy OpenViking as outlined above with your preferred model providers.
Step 2: Import Apidog API Documentation
```shell
ov add-resource https://docs.apidog.com/overview
ov add-resource https://docs.apidog.com/api-testing
```
Imports Apidog docs into viking://resources/ with L0/L1/L2 processing.
Step 3: Store User Preferences
```python
from openviking import OpenViking

client = OpenViking(path="./apidog-agent-data")
session = client.session()

# Record user's default environment (inside an async context)
await session.add_message("user", [{
    "type": "text",
    "text": "Always use the staging environment for API tests"
}])
await session.commit()  # Extracts preference memory
```
Step 4: Query Context During Testing
```python
# Find API endpoints
results = client.find("authentication endpoints")
for ctx in results.resources:
    print(f"Found: {ctx.uri}")

# Retrieve user environment preference
prefs = client.find("staging environment preference", target_uri="viking://user/memories/")
```
Step 5: Connect to Your Agent Framework
OpenViking exposes Python SDK and HTTP API:
```python
# Python SDK
from openviking import OpenViking

client = OpenViking(path="./data")

# HTTP API
import httpx

response = httpx.post(
    "http://localhost:1933/api/v1/search/find",
    json={"query": "authentication endpoints"},
    headers={"X-API-Key": "your-api-key"},
)
```
Advanced Techniques & Best Practices
Pro Tips for Production Deployments
1. Pre-warm Frequently Accessed Context
```shell
ov add-resource https://docs.example.com --wait
```
2. Implement Context Expiration
```python
await session.archive(max_age_days=7)
```
3. Monitor Vector Index Health
```shell
ov debug stats
```
Common Mistakes to Avoid
- Loading L2 content prematurely—start with L0/L1.
- Skipping session commits—memory extraction only happens on commit.
- Overloading directories—split large resources into subdirectories.
- Ignoring retrieval traces—use traces to debug results.
Performance Optimization
| Scenario | Recommendation |
|---|---|
| High query volume | Run as HTTP server, use connection pooling |
| Large documents | Split into topic-based chunks before import |
| Low latency needs | Pre-generate L0/L1 for hot content |
| Multi-tenant setup | Separate workspaces per tenant |
Security Best Practices
- Store API keys in environment variables or secret managers.
- Enable HTTPS for all HTTP deployments.
- Implement rate limiting on public endpoints.
- Use separate API keys for dev and prod.
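For example, rather than hard-coding the key as in the config snippets above, read it from the environment. Stdlib only; `OPENVIKING_API_KEY` is a name chosen for this sketch, not an official variable:

```python
# Sketch: load the API key from an environment variable instead of
# embedding it in config files or source code.
import os

def load_api_key(var: str = "OPENVIKING_API_KEY") -> str:
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before starting the client")
    return key

os.environ["OPENVIKING_API_KEY"] = "demo-key"   # for illustration only
print(load_api_key())
```

The same pattern applies to the `X-API-Key` header in the HTTP examples.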
Real-World Use Cases
1. AI Coding Assistants
- Navigates project structure via `viking://resources/my_project/src/`
- Remembers user coding preferences
- Retrieves API docs during code generation

Result: 67% reduction in forgetful behaviors, 43% token cost savings.

2. Customer Support Agents
- Product documentation in `viking://resources/product/`
- Conversation history in `viking://user/memories/past_issues/`
- Support playbooks as skills

Result: First-contact resolution up from 52% to 71%.
3. Research Assistants
- Papers categorized by topic
- Research methods stored as skills
- Key findings extracted into memory
Result: Finding relevant papers 3x faster.
Alternatives & Comparisons
OpenViking vs. Traditional Vector Databases
| Aspect | Traditional RAG (Pinecone, LanceDB) | OpenViking |
|---|---|---|
| Storage Model | Flat vector chunks | Hierarchical filesystem |
| Retrieval | Top-K similarity | Directory recursive + intent |
| Observability | Black box | Visualized search traces |
| Token Efficiency | Load all or truncate | L0/L1/L2 progressive loading |
| Memory Iteration | Manual or none | Automatic session management |
| Context Types | Documents only | Resources, memories, skills |
| Debugging | Guesswork | Directory traversal logs |
OpenViking vs. LangChain Memory
| Aspect | LangChain Memory | OpenViking |
|---|---|---|
| Persistence | Conversation buffer only | Full filesystem, L0/L1/L2 |
| Scalability | Limited by context window | Hierarchical loading, no cap |
| Retrieval | Linear search | Directory recursive + semantic |
| Memory Types | Single buffer | 6 categories |
When to Consider Alternatives
Use traditional vector DBs if:
- You need sub-100ms retrieval latency
- Use case is simple keyword search
- Existing RAG pipeline works fine
Use OpenViking if:
- Building long-running agent conversations
- Need multi-type context (docs + preferences + tools)
- Token cost optimization matters
- Require observable, debuggable retrieval
Production Deployment
For production, run OpenViking as a standalone HTTP service.
Recommended Infrastructure
- Cloud: Volcengine ECS (or similar)
- OS: veLinux or Ubuntu 22.04+
- Storage: SSD-backed AGFS volume
- Network: Low-latency to model APIs
Security Considerations
- Store API keys in env vars or secret manager
- Enable authentication for HTTP endpoints
- Use HTTPS for all communication
- Implement rate limiting
Monitoring
Configure logging:
```json
{
  "log": {
    "level": "INFO",
    "output": "file",
    "path": "/var/log/openviking/server.log"
  }
}
```
Monitor:
- Semantic processing queue depth
- Vector search latency
- AGFS read/write operations
- Memory extraction success rates
Limitations and Considerations
Current Limitations
- Python-centric: Primary SDK is Python; others use HTTP API.
- Model dependencies: Requires external VLM and embedding models.
- Learning curve: Filesystem paradigm differs from traditional DBs.
- Early stage: The project is under active development; APIs may change.
When to Use OpenViking
Good fit:
- Long-running agent conversations
- Multi-type context needs
- Need observable, debuggable retrieval
- Token cost matters
Consider alternatives:
- Simple Q&A apps
- No pain points in current RAG setup
- Need sub-100ms retrieval latency
The Road Ahead
OpenViking is early-stage (v0.1.x, early 2025). Planned roadmap:
- Multi-tenant support
- Advanced analytics and dashboards
- Plugin ecosystem for agent frameworks
- Edge deployment (local-first)
- Enhanced MCP protocol integration
The team is seeking community contributors—project is open source under Apache 2.0.
Conclusion
OpenViking redefines AI agent context management. By organizing information as a filesystem, it eliminates fragmentation, token waste, and black-box retrieval common in traditional RAG.
Key Takeaways
- Filesystem paradigm unifies context: All memories, resources, and skills live under `viking://` URIs.
- L0/L1/L2 loading cuts tokens 91%: Progressive loading, not dumping everything into prompts.
- Directory recursive retrieval boosts accuracy: Focus on high-score directories, then drill down.
- Visualized traces enable debugging: See exactly which retrieval paths were taken.
- Automatic session management enables learning: Agents extract and update memories continuously.


