TL;DR
OpenViking is an open-source context database for AI agents that replaces flat vector storage with a filesystem paradigm. It organizes context (memories, resources, skills) under viking:// URIs with three layers: L0 (~100 tokens), L1 (~2k tokens), L2 (full content). Benchmarks show 91% token cost reduction and 43% better task completion versus traditional RAG.
Introduction
Your AI agent keeps forgetting things. It asks for the same API endpoint twice, ignores your staging environment preference, and loses track of past test results.
This is common in agent development. Most teams cobble together RAG pipelines, vector databases, and custom memory systems—leading to fragmented context, high token costs, and unreliable retrieval.
Benchmarks using the LoCoMo10 dataset show traditional RAG systems achieve only 35-44% task completion rates while consuming 24-51 million input tokens.
OpenViking takes a different path. Built by ByteDance’s OpenViking team, it uses a filesystem approach where all context is organized under viking:// URIs with hierarchical L0/L1/L2 loading. Result: 52% task completion with 91% fewer tokens.
💡 Apidog users building API testing agents can integrate OpenViking to maintain conversation context across test runs, remember user environment preferences, and store API documentation for semantic retrieval.
In this guide, you'll learn how OpenViking addresses context fragmentation, see the L0/L1/L2 model in action, and deploy your first server in under 15 minutes.
The Agent Context Problem
AI agents face unique context management challenges:
An API testing assistant, for example, must track:
- User preferences (“staging environment”, “curl over Python”)
- Project context (endpoints, auth methods, past test results)
- Tool patterns (which endpoints fail, common schema errors)
- Task history (what was tested, which bugs surfaced)
Traditional RAG stores all this as flat chunks in a vector database. You query it and get top-K similar fragments, lacking structure and hierarchy, with no insight into what was missed.
Five Core Challenges
OpenViking targets five core context management issues:
| Challenge | Traditional RAG | OpenViking Solution |
|---|---|---|
| Fragmented Context | Memories, resources, skills stored separately | Unified filesystem paradigm under viking:// |
| Surging Demand | Long tasks generate massive context | L0/L1/L2 hierarchical loading reduces tokens 91% |
| Poor Retrieval | Flat vector search lacks global view | Directory recursive retrieval with intent analysis |
| Unobservable | Black box retrieval chains | Visualized search trajectories for debugging |
| Limited Iteration | Only user interaction history | Automatic session management with 6 memory categories |
OpenViking shifts from “store everything, retrieve vaguely” to “structure everything, retrieve precisely.”
What Is OpenViking?
OpenViking is an open-source context database for AI agents, licensed under Apache 2.0.
It unifies all context into a virtual filesystem. Memories, resources, and skills are mapped to directories under viking://, each with a URI.
```
viking://
├── resources/          # External knowledge: docs, code, web pages
│   ├── my_project/
│   │   ├── docs/
│   │   │   ├── api/
│   │   │   └── tutorials/
│   │   └── src/
│   └── ...
├── user/               # User-specific: preferences, habits
│   └── memories/
│       ├── preferences/
│       │   ├── writing_style
│       │   └── coding_habits
│       └── ...
└── agent/              # Agent capabilities: skills, task memories
    ├── skills/
    │   ├── search_code
    │   ├── analyze_data
    │   └── ...
    ├── memories/
    └── instructions/
```
Agents can:
- List directories: `ls viking://resources/my_project/docs/`
- Semantic search: `find "authentication methods"`
- Read content: `read viking://resources/docs/auth.md`
- Get summaries: `abstract viking://resources/docs/`
Core Feature 1: Filesystem Management Paradigm
OpenViking solves context fragmentation by unifying all context types under one model.
Three Context Types
| Type | Purpose | Lifecycle | Initiative |
|---|---|---|---|
| Resource | External knowledge (docs, code, FAQs) | Long-term, static | User adds |
| Memory | Agent’s cognition (preferences, experiences) | Long-term, dynamic | Agent extracts |
| Skill | Callable capabilities (tools, MCP) | Long-term, static | Agent invokes |
Each type lives in its own directory:
- `viking://resources/`: Product manuals, code repositories, docs
- `viking://user/memories/`: Preferences, entity memories, events
- `viking://agent/skills/`: Tool definitions, MCP configs
- `viking://agent/memories/`: Learned patterns, case studies
Unix-like API
OpenViking provides familiar command-line operations:
```python
from openviking import OpenViking

client = OpenViking(path="./data")

# Semantic search
results = client.find("user authentication")

# List directory contents
contents = client.ls("viking://resources/")

# Read full content
doc = client.read("viking://resources/docs/auth.md")

# Get L0 summary
abstract = client.abstract("viking://resources/docs/")

# Get L1 overview
overview = client.overview("viking://resources/docs/")
```
The API works via Python SDK or HTTP server, compatible with any agent framework.
Core Feature 2: L0/L1/L2 Hierarchical Context Loading
Stuffing all context into prompts is inefficient. OpenViking processes context into three layers:
| Layer | Name | File | Token Limit | Purpose |
|---|---|---|---|---|
| L0 | Abstract | `.abstract.md` | ~100 tokens | Vector search, quick filtering |
| L1 | Overview | `.overview.md` | ~2k tokens | Rerank, content navigation |
| L2 | Detail | Original files | Unlimited | Full content, on-demand loading |
How It Works
When adding a resource (e.g., PDF):
1. Parse the document into text
2. Build the directory tree in AGFS storage
3. Queue semantic processing
4. Generate L0 abstracts and L1 overviews bottom-up
Example structure:
```
viking://resources/my_project/
├── .abstract.md        # L0
├── .overview.md        # L1
├── docs/
│   ├── .abstract.md
│   ├── .overview.md
│   ├── auth.md         # L2
│   ├── endpoints.md
│   └── rate-limits.md
└── src/
    └── ...
```
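The bottom-up pass (step 4 above) can be sketched roughly as follows. This is a toy illustration, not OpenViking's implementation: `summarize_l0` and `summarize_l1` stand in for LLM calls, and the directory tree is a plain dict.

```python
# Sketch: generate L0/L1 summaries bottom-up over a directory tree.
# Leaves (files) feed their parent directory; each directory's L0/L1
# are derived from its children's content/overviews.

def summarize_l0(text: str) -> str:
    """Stand-in for an LLM call producing a ~100-token abstract."""
    return text[:60]

def summarize_l1(text: str) -> str:
    """Stand-in for an LLM call producing a ~2k-token overview."""
    return text[:400]

def build_summaries(tree: dict) -> dict:
    """Recursively attach .abstract.md (L0) and .overview.md (L1) to each dir."""
    combined = []
    for name, node in list(tree.items()):
        if isinstance(node, dict):              # subdirectory: summarize it first
            tree[name] = build_summaries(node)
            combined.append(tree[name][".overview.md"])
        else:                                   # file: its content feeds the parent
            combined.append(node)
    text = "\n".join(combined)
    tree[".overview.md"] = summarize_l1(text)   # L1 from children, bottom-up
    tree[".abstract.md"] = summarize_l0(text)   # L0 from the same material
    return tree

docs = {"auth.md": "OAuth2 and JWT flows...", "api": {"endpoints.md": "GET /users..."}}
result = build_summaries(docs)
print(sorted(result.keys()))
```

Real summaries are generated by the configured VLM; the key point is the order — leaf content first, directory abstracts last.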
Token Budget Impact
This approach saves tokens:
```python
# Traditional RAG: load all (50k tokens)
full_docs = retrieve_all("authentication")

# OpenViking: L1 (2k tokens), L2 only if needed
overview = client.overview("viking://resources/docs/auth/")
if needs_more_detail(overview):
    content = client.read("viking://resources/docs/auth/oauth.md")
```
Benchmarks: 91% lower input token cost, 43% better task completion.
Core Feature 3: Directory Recursive Retrieval
Single vector search struggles with complex queries. OpenViking uses directory recursive retrieval:
Five-Step Process
1. Intent Analysis: identifies the query type, key entities, and expected content.
2. Initial Positioning: vector search locates high-scoring directories.
3. Refined Exploration: searches within the top directories for relevant files.
4. Recursive Descent: repeats the process in subdirectories.
5. Result Aggregation: aggregates and ranks results, preserving retrieval traces.
This approach increases accuracy by leveraging context hierarchy.
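The descent logic can be sketched as follows. This is illustrative only: the scores are invented toy values standing in for real vector similarity, and the threshold is arbitrary.

```python
# Sketch: directory-recursive retrieval over a scored tree.
# Directories clearing the threshold are explored; files clearing it
# are returned; everything else is skipped.

THRESHOLD = 0.6

def retrieve(node: dict, path: str = "viking://") -> list:
    """Walk the tree, descending only into high-scoring directories."""
    hits = []
    for name, (score, children) in node.items():
        uri = path + name
        if score < THRESHOLD:
            continue                                    # skipped: low relevance
        if children is None:
            hits.append((uri, score))                   # file: returned
        else:
            hits.extend(retrieve(children, uri + "/"))  # directory: recurse
    return sorted(hits, key=lambda h: -h[1])            # aggregate and rank

tree = {
    "docs": (0.89, {
        "oauth.md": (0.92, None),
        "jwt.md": (0.34, None),
        "providers": (0.78, {"google.md": (0.85, None)}),
    }),
    "src": (0.45, {}),
}
print(retrieve(tree))
```

Because low-scoring branches (`src`, `jwt.md`) are pruned early, only a small slice of the tree is ever scored in depth.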
Core Feature 4: Visualized Retrieval Traces
Traditional RAG is a black box. OpenViking provides observable retrieval traces:
```
Retrieval Trace for query: "OAuth token refresh"
├── viking://resources/docs/
│   ├── [SCORE: 0.45] .abstract.md: skipped
│   └── [SCORE: 0.89] auth/: selected
│       ├── [SCORE: 0.92] oauth.md: RETURNED
│       ├── [SCORE: 0.34] jwt.md: skipped
│       └── [SCORE: 0.78] providers/
│           └── [SCORE: 0.85] google.md: RETURNED
```
This enables debugging by showing which directories/files were visited and why.
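A trace like this is cheap to emit during the descent — each node is logged with its score and decision. A toy sketch (the node values are invented, not OpenViking's trace format):

```python
# Sketch: render a retrieval trace as an indented tree.
# Each node is (name, score, decision, children).

def render(node, depth=0):
    name, score, decision, children = node
    lines = [f"{'  ' * depth}[SCORE: {score:.2f}] {name}: {decision}"]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

trace = ("auth/", 0.89, "selected", [
    ("oauth.md", 0.92, "RETURNED", []),
    ("jwt.md", 0.34, "skipped", []),
])
print("\n".join(render(trace)))
```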
Core Feature 5: Automatic Session Management
OpenViking includes a memory self-iteration loop. At session end, it extracts and updates agent knowledge automatically.
Six Memory Categories
| Category | Owner | Location | Description | Update Strategy |
|---|---|---|---|---|
| profile | user | `user/memories/.overview.md` | Basic user info | Appendable |
| preferences | user | `user/memories/preferences/` | Preferences by topic | Appendable |
| entities | user | `user/memories/entities/` | People, projects | Appendable |
| events | user | `user/memories/events/` | Decisions, milestones | No update |
| cases | agent | `agent/memories/cases/` | Learned cases | No update |
| patterns | agent | `agent/memories/patterns/` | Learned patterns | No update |
How Memory Extraction Works
```python
# Inside an async context
session = client.session()

# Add conversation messages
await session.add_message("user", [{"type": "text", "text": "I prefer dark mode in the UI"}])
await session.add_message("assistant", [{"type": "text", "text": "Got it. I'll use dark mode for all future screenshots."}])

# Record tool usage
await session.add_usage({"tool": "screenshot", "parameters": {"theme": "dark"}, "result": "success"})

# Commit triggers memory extraction
await session.commit()
```
On commit, OpenViking compresses the session, extracts memories via LLM, updates directories, and generates new L0/L1 summaries—enabling agents to learn and adapt.
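Conceptually, commit routes each extracted fact into one of the six category directories from the table above. A toy sketch — `extract_memories` stands in for OpenViking's LLM extraction pass, and the in-memory `store` stands in for the filesystem:

```python
# Sketch: route extracted memories to their category paths on commit.

CATEGORY_PATHS = {
    "profile": "viking://user/memories/.overview.md",
    "preferences": "viking://user/memories/preferences/",
    "entities": "viking://user/memories/entities/",
    "events": "viking://user/memories/events/",
    "cases": "viking://agent/memories/cases/",
    "patterns": "viking://agent/memories/patterns/",
}

def extract_memories(messages):
    """Stand-in for LLM extraction: returns (category, fact) pairs."""
    return [("preferences", "prefers dark mode")]

def commit(messages, store):
    for category, fact in extract_memories(messages):
        store.setdefault(CATEGORY_PATHS[category], []).append(fact)
    return store

store = commit(["I prefer dark mode in the UI"], {})
print(store)
```

In the real system the "Update Strategy" column governs this step: appendable categories accumulate, while `events`, `cases`, and `patterns` are written once and not modified.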
Architecture Overview
OpenViking separates concerns across multiple layers.
Dual-Layer Storage
| Layer | Technology | Stores |
|---|---|---|
| AGFS | Custom filesystem | L0/L1/L2 content, multimedia, relations |
| Vector Index | Vector DB | URIs, embeddings, metadata |
- Content reads come from AGFS.
- Vector index only stores references, not content.
- No large text duplication in vector storage.
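The separation amounts to two maps: the vector index resolves a query to URIs, and AGFS resolves URIs to content. A toy sketch with 2-d embeddings (real embeddings come from the configured model; these numbers are made up):

```python
# Sketch: dual-layer lookup. The "vector index" stores only URI + embedding;
# full text lives in a separate store keyed by URI (standing in for AGFS).
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Vector layer: URIs and embeddings, no document text duplicated here.
vector_index = {
    "viking://resources/docs/auth.md": [0.9, 0.1],
    "viking://resources/docs/rate-limits.md": [0.1, 0.9],
}
# Content layer: read only after the index has picked a URI.
agfs = {
    "viking://resources/docs/auth.md": "OAuth2 flows, JWT validation...",
    "viking://resources/docs/rate-limits.md": "429 handling, backoff...",
}

def find(query_embedding):
    uri = max(vector_index, key=lambda u: cosine(vector_index[u], query_embedding))
    return uri, agfs[uri]   # content read comes from AGFS, not the index

uri, text = find([1.0, 0.0])
print(uri)
```

Keeping text out of the index is what avoids the large-text duplication noted above.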
Quick Start: Deploy Your First OpenViking Server
Prerequisites
- Python: 3.10+
- Go: 1.22+ (for AGFS)
- C++ Compiler: GCC 9+ or Clang 11+
- OS: Linux, macOS, Windows
Step 1: Install OpenViking
```shell
pip install openviking --upgrade --force-reinstall
```
Optional: Install Rust CLI
```shell
curl -fsSL https://raw.githubusercontent.com/volcengine/OpenViking/main/crates/ov_cli/install.sh | bash
```
Step 2: Configure Models
Create ~/.openviking/ov.conf:
```json
{
  "storage": {
    "workspace": "/home/your-name/openviking_workspace"
  },
  "log": {
    "level": "INFO",
    "output": "stdout"
  },
  "embedding": {
    "dense": {
      "api_base": "https://api.openai.com/v1",
      "api_key": "your-openai-api-key",
      "provider": "openai",
      "dimension": 3072,
      "model": "text-embedding-3-large"
    },
    "max_concurrent": 10
  },
  "vlm": {
    "api_base": "https://api.openai.com/v1",
    "api_key": "your-openai-api-key",
    "provider": "openai",
    "model": "gpt-4o",
    "max_concurrent": 100
  }
}
```
| Provider | Embedding Models | VLM Models |
|---|---|---|
| volcengine | doubao-embedding-vision | doubao-seed-2.0-pro |
| openai | text-embedding-3-large | gpt-4o, gpt-4-vision |
| litellm | Via LiteLLM proxy | Claude, Gemini, DeepSeek, etc. |
LiteLLM enables support for Anthropic, Google, local Ollama, or OpenAI-compatible endpoints.
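As a rough illustration, the same config shape can point at a LiteLLM proxy. This is a hypothetical sketch mirroring the sample above — the proxy URL, model names, and whether the `litellm` provider accepts these exact fields should be verified against the OpenViking and LiteLLM docs:

```json
{
  "embedding": {
    "dense": {
      "api_base": "http://localhost:4000/v1",
      "api_key": "your-litellm-key",
      "provider": "litellm",
      "dimension": 1536,
      "model": "text-embedding-3-small"
    }
  },
  "vlm": {
    "api_base": "http://localhost:4000/v1",
    "api_key": "your-litellm-key",
    "provider": "litellm",
    "model": "claude-3-5-sonnet"
  }
}
```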
Step 3: Start the Server
```shell
openviking-server
```
Or background mode:
```shell
nohup openviking-server > /data/log/openviking.log 2>&1 &
```
Step 4: Add Your First Resource
```shell
# Rust CLI
ov add-resource https://docs.example.com/api-guide.pdf
```
```python
# Python SDK
from openviking import OpenViking

client = OpenViking(path="./data")
client.add_resource("https://docs.example.com/api-guide.pdf")
```
Step 5: Search and Retrieve
```shell
# Semantic search
ov find "authentication methods"

# List contents
ov ls viking://resources/

# View tree
ov tree viking://resources/docs -L 2

# Grep for content
ov grep "OAuth" --uri viking://resources/docs/
```
Step 6: Enable VikingBot (Optional)
```shell
pip install "openviking[bot]"

# Start server with bot enabled
openviking-server --with-bot

# Start chat in another terminal
ov chat
```
Performance Benchmarks
OpenViking was benchmarked vs. traditional RAG (LanceDB) and native memory systems on the LoCoMo10 dataset (1,540 long-range dialogue cases).
Task Completion Rates
| System | Completion Rate | Input Tokens |
|---|---|---|
| OpenClaw (native memory) | 35.65% | 24.6M |
| OpenClaw + LanceDB | 44.55% | 51.6M |
| OpenClaw + OpenViking | 52.08% | 4.3M |
- 43% improvement over native memory, 91% token reduction
- 17% improvement over LanceDB, 92% token reduction
- Hierarchical retrieval increases relevance and reduces cost
Integrating OpenViking with Apidog
Apidog users can leverage OpenViking to maintain context, store API docs, and remember preferences.
Step 1: Set Up OpenViking Server
Deploy OpenViking as outlined above with your preferred model providers.
Step 2: Import Apidog API Documentation
```shell
ov add-resource https://docs.apidog.com/overview
ov add-resource https://docs.apidog.com/api-testing
```
Imports Apidog docs into viking://resources/ with L0/L1/L2 processing.
Step 3: Store User Preferences
```python
from openviking import OpenViking

client = OpenViking(path="./apidog-agent-data")
session = client.session()

# Record user's default environment (inside an async context)
await session.add_message("user", [{
    "type": "text",
    "text": "Always use the staging environment for API tests"
}])
await session.commit()  # Extracts preference memory
```
Step 4: Query Context During Testing
```python
# Find API endpoints
results = client.find("authentication endpoints")
for ctx in results.resources:
    print(f"Found: {ctx.uri}")

# Retrieve user environment preference
prefs = client.find("staging environment preference", target_uri="viking://user/memories/")
```
Step 5: Connect to Your Agent Framework
OpenViking exposes Python SDK and HTTP API:
```python
# Python SDK
from openviking import OpenViking

client = OpenViking(path="./data")

# HTTP API
import httpx

response = httpx.post(
    "http://localhost:1933/api/v1/search/find",
    json={"query": "authentication endpoints"},
    headers={"X-API-Key": "your-api-key"},
)
```
Advanced Techniques & Best Practices
Pro Tips for Production Deployments
1. Pre-warm Frequently Accessed Context
```shell
ov add-resource https://docs.example.com --wait
```
2. Implement Context Expiration
```python
await session.archive(max_age_days=7)
```
3. Monitor Vector Index Health
```shell
ov debug stats
```
Common Mistakes to Avoid
- Loading L2 content prematurely—start with L0/L1.
- Skipping session commits—memory extraction only happens on commit.
- Overloading directories—split large resources into subdirectories.
- Ignoring retrieval traces—use traces to debug results.
Performance Optimization
| Scenario | Recommendation |
|---|---|
| High query volume | Run as HTTP server, use connection pooling |
| Large documents | Split into topic-based chunks before import |
| Low latency needs | Pre-generate L0/L1 for hot content |
| Multi-tenant setup | Separate workspaces per tenant |
Security Best Practices
- Store API keys in environment variables or secret managers.
- Enable HTTPS for all HTTP deployments.
- Implement rate limiting on public endpoints.
- Use separate API keys for dev and prod.
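For example, rather than hard-coding the key as in the config snippets above, read it from the environment. Stdlib only; `OPENVIKING_API_KEY` is a name chosen for this sketch, not an official variable:

```python
# Sketch: load the API key from an environment variable instead of
# embedding it in config files or source code.
import os

def load_api_key(var: str = "OPENVIKING_API_KEY") -> str:
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before starting the client")
    return key

os.environ["OPENVIKING_API_KEY"] = "demo-key"   # for illustration only
print(load_api_key())
```

The same pattern applies to the `X-API-Key` header in the HTTP examples.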
Real-World Use Cases
1. AI Coding Assistants
- Navigates project structure via `viking://resources/my_project/src/`
- Remembers user coding preferences
- Retrieves API docs during code generation

Result: 67% reduction in forgetful behaviors, 43% token cost savings.

2. Customer Support Agents
- Product documentation in `viking://resources/product/`
- Conversation history in `viking://user/memories/past_issues/`
- Support playbooks as skills

Result: First-contact resolution up from 52% to 71%.
3. Research Assistants
- Papers categorized by topic
- Research methods stored as skills
- Key findings extracted into memory
Result: Finding relevant papers 3x faster.
Alternatives & Comparisons
OpenViking vs. Traditional Vector Databases
| Aspect | Traditional RAG (Pinecone, LanceDB) | OpenViking |
|---|---|---|
| Storage Model | Flat vector chunks | Hierarchical filesystem |
| Retrieval | Top-K similarity | Directory recursive + intent |
| Observability | Black box | Visualized search traces |
| Token Efficiency | Load all or truncate | L0/L1/L2 progressive loading |
| Memory Iteration | Manual or none | Automatic session management |
| Context Types | Documents only | Resources, memories, skills |
| Debugging | Guesswork | Directory traversal logs |
OpenViking vs. LangChain Memory
| Aspect | LangChain Memory | OpenViking |
|---|---|---|
| Persistence | Conversation buffer only | Full filesystem, L0/L1/L2 |
| Scalability | Limited by context window | Hierarchical loading, no cap |
| Retrieval | Linear search | Directory recursive + semantic |
| Memory Types | Single buffer | 6 categories |
When to Consider Alternatives
Use traditional vector DBs if:
- You need sub-100ms retrieval latency
- Use case is simple keyword search
- Existing RAG pipeline works fine
Use OpenViking if:
- Building long-running agent conversations
- Need multi-type context (docs + preferences + tools)
- Token cost optimization matters
- Require observable, debuggable retrieval
Production Deployment
For production, run OpenViking as a standalone HTTP service.
Recommended Infrastructure
- Cloud: Volcengine ECS (or similar)
- OS: veLinux or Ubuntu 22.04+
- Storage: SSD-backed AGFS volume
- Network: Low-latency to model APIs
Security Considerations
- Store API keys in env vars or secret manager
- Enable authentication for HTTP endpoints
- Use HTTPS for all communication
- Implement rate limiting
Monitoring
Configure logging:
```json
{
  "log": {
    "level": "INFO",
    "output": "file",
    "path": "/var/log/openviking/server.log"
  }
}
```
Monitor:
- Semantic processing queue depth
- Vector search latency
- AGFS read/write operations
- Memory extraction success rates
Limitations and Considerations
Current Limitations
- Python-centric: Primary SDK is Python; others use HTTP API.
- Model dependencies: Requires external VLM and embedding models.
- Learning curve: Filesystem paradigm differs from traditional DBs.
- Early stage: The project is under active development; APIs may change.
When to Use OpenViking
Good fit:
- Long-running agent conversations
- Multi-type context needs
- Need observable, debuggable retrieval
- Token cost matters
Consider alternatives:
- Simple Q&A apps
- No pain points in current RAG setup
- Need sub-100ms retrieval latency
The Road Ahead
OpenViking is early-stage (v0.1.x, early 2025). Planned roadmap:
- Multi-tenant support
- Advanced analytics and dashboards
- Plugin ecosystem for agent frameworks
- Edge deployment (local-first)
- Enhanced MCP protocol integration
The team is seeking community contributors—project is open source under Apache 2.0.
Conclusion
OpenViking redefines AI agent context management. By organizing information as a filesystem, it eliminates fragmentation, token waste, and black-box retrieval common in traditional RAG.
Key Takeaways
- Filesystem paradigm unifies context: All memories, resources, and skills live under `viking://` URIs.
- L0/L1/L2 loading cuts tokens 91%: Progressive loading, not dumping everything into prompts.
- Directory recursive retrieval boosts accuracy: Focus on high-score directories, then drill down.
- Visualized traces enable debugging: See exactly which retrieval paths were taken.
- Automatic session management enables learning: Agents extract and update memories continuously.


