DEV Community

Wanda

Posted on • Originally published at apidog.com

What is OpenViking?

TL;DR

OpenViking is an open-source context database for AI agents that replaces flat vector storage with a filesystem paradigm. It organizes context (memories, resources, skills) under viking:// URIs with three layers: L0 (~100 tokens), L1 (~2k tokens), L2 (full content). Benchmarks show 91% token cost reduction and 43% better task completion versus traditional RAG.


Introduction

Your AI agent keeps forgetting things. It asks for the same API endpoint twice, ignores your staging environment preference, and loses track of past test results.

This is common in agent development. Most teams cobble together RAG pipelines, vector databases, and custom memory systems—leading to fragmented context, high token costs, and unreliable retrieval.

Benchmarks using the LoCoMo10 dataset show traditional RAG systems achieve only 35-44% task completion rates while consuming 24-51 million input tokens.

OpenViking takes a different path. Built by ByteDance’s OpenViking team, it uses a filesystem approach where all context is organized under viking:// URIs with hierarchical L0/L1/L2 loading. Result: 52% task completion with 91% fewer tokens.

💡 Apidog users building API testing agents can integrate OpenViking to maintain conversation context across test runs, remember user environment preferences, and store API documentation for semantic retrieval.

In this guide, you'll learn how OpenViking addresses context fragmentation, see the L0/L1/L2 model in action, and deploy your first server in under 15 minutes.

The Agent Context Problem

AI agents face context management challenges that conventional applications don't. An API testing assistant, for example, must track:

  • User preferences (“staging environment”, “curl over Python”)
  • Project context (endpoints, auth methods, past test results)
  • Tool patterns (which endpoints fail, common schema errors)
  • Task history (what was tested, which bugs surfaced)

Traditional RAG stores all of this as flat chunks in a vector database. You query it and get back the top-K most similar fragments: no structure, no hierarchy, and no insight into what was missed.
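To make that concrete, here is a toy sketch of flat top-K retrieval. The chunk names and vectors are hand-made stand-ins for real embeddings, not anything from a real vector database:

```python
# Toy illustration of flat top-K retrieval: each chunk is scored
# independently, so the result carries no hierarchy and no record of
# what was skipped.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

chunks = {
    "auth doc fragment":       [0.9, 0.1, 0.0],
    "rate-limit doc fragment": [0.1, 0.9, 0.0],
    "old test result":         [0.7, 0.2, 0.1],
}

query = [1.0, 0.0, 0.0]  # e.g. "authentication"
top_k = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)[:2]
# The agent sees two bare fragments; which directory they came from,
# and what was never considered, are both invisible.
```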

Five Core Challenges

OpenViking targets five core context management issues:

| Challenge | Traditional RAG | OpenViking Solution |
| --- | --- | --- |
| Fragmented Context | Memories, resources, skills stored separately | Unified filesystem paradigm under viking:// |
| Surging Demand | Long tasks generate massive context | L0/L1/L2 hierarchical loading reduces tokens 91% |
| Poor Retrieval | Flat vector search lacks global view | Directory recursive retrieval with intent analysis |
| Unobservable | Black box retrieval chains | Visualized search trajectories for debugging |
| Limited Iteration | Only user interaction history | Automatic session management with 6 memory categories |

OpenViking shifts from “store everything, retrieve vaguely” to “structure everything, retrieve precisely.”

What Is OpenViking?

OpenViking is an open-source context database for AI agents, licensed under Apache 2.0.

OpenViking System Overview

It unifies all context into a virtual filesystem. Memories, resources, and skills are mapped to directories under viking://, each with a URI.

viking://
├── resources/              # External knowledge: docs, code, web pages
│   ├── my_project/
│   │   ├── docs/
│   │   │   ├── api/
│   │   │   └── tutorials/
│   │   └── src/
│   └── ...
├── user/                   # User-specific: preferences, habits
│   └── memories/
│       ├── preferences/
│       │   ├── writing_style
│       │   └── coding_habits
│       └── ...
└── agent/                  # Agent capabilities: skills, task memories
    ├── skills/
    │   ├── search_code
    │   ├── analyze_data
    │   └── ...
    ├── memories/
    └── instructions/

Agents can:

  • List directories: ls viking://resources/my_project/docs/
  • Semantic search: find "authentication methods"
  • Read content: read viking://resources/docs/auth.md
  • Get summaries: abstract viking://resources/docs/

Core Feature 1: Filesystem Management Paradigm

OpenViking solves context fragmentation by unifying all context types under one model.

Three Context Types

| Type | Purpose | Lifecycle | Initiative |
| --- | --- | --- | --- |
| Resource | External knowledge (docs, code, FAQs) | Long-term, static | User adds |
| Memory | Agent's cognition (preferences, experiences) | Long-term, dynamic | Agent extracts |
| Skill | Callable capabilities (tools, MCP) | Long-term, static | Agent invokes |

Each type lives in its own directory:

  • viking://resources/: Product manuals, code repositories, docs
  • viking://user/memories/: Preferences, entity memories, events
  • viking://agent/skills/: Tool definitions, MCP configs
  • viking://agent/memories/: Learned patterns, case studies

Unix-like API

OpenViking provides familiar command-line operations:

from openviking import OpenViking

client = OpenViking(path="./data")

# Semantic search
results = client.find("user authentication")

# List directory contents
contents = client.ls("viking://resources/")

# Read full content
doc = client.read("viking://resources/docs/auth.md")

# Get L0 summary
abstract = client.abstract("viking://resources/docs/")

# Get L1 overview
overview = client.overview("viking://resources/docs/")

The API works via Python SDK or HTTP server, compatible with any agent framework.

Core Feature 2: L0/L1/L2 Hierarchical Context Loading

Stuffing all context into prompts is inefficient. OpenViking processes context into three layers:

| Layer | Name | File | Token Limit | Purpose |
| --- | --- | --- | --- | --- |
| L0 | Abstract | .abstract.md | ~100 tokens | Vector search, quick filtering |
| L1 | Overview | .overview.md | ~2k tokens | Rerank, content navigation |
| L2 | Detail | Original files | Unlimited | Full content, on-demand loading |

How It Works

When adding a resource (e.g., PDF):

  1. Parse document into text
  2. Build directory tree in AGFS storage
  3. Queue semantic processing
  4. Generate L0 abstracts and L1 overviews bottom-up

Example structure:

viking://resources/my_project/
├── .abstract.md          # L0
├── .overview.md          # L1
├── docs/
│   ├── .abstract.md
│   ├── .overview.md
│   ├── auth.md           # L2
│   ├── endpoints.md
│   └── rate-limits.md
└── src/
    └── ...
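The bottom-up generation in step 4 can be sketched roughly like this (illustrative only: `summarize()` stands in for the real VLM call, and the tree is a plain dict rather than AGFS):

```python
# Rough sketch of step 4: L0 abstracts are generated bottom-up, so a
# directory's summary is derived from its children's summaries.

def summarize(texts, limit=100):
    return " | ".join(texts)[:limit]   # stand-in for an LLM summary

def build_abstracts(tree, root):
    """tree: {name: subtree-dict or file-content-str} -> {uri: abstract}."""
    abstracts = {}

    def walk(node, uri):
        if isinstance(node, str):                       # L2 leaf file
            abstracts[uri] = summarize([node])
        else:                                           # directory
            children = [walk(child, f"{uri}/{name}")
                        for name, child in node.items()]
            abstracts[uri] = summarize(children)        # built from children
        return abstracts[uri]

    walk(tree, root)
    return abstracts

project = {"docs": {"auth.md": "OAuth flows", "endpoints.md": "REST endpoints"}}
abstracts = build_abstracts(project, "viking://resources/my_project")
```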

Token Budget Impact

This approach saves tokens:

# Traditional RAG: load all (50k tokens)
full_docs = retrieve_all("authentication")

# OpenViking: L1 (2k tokens), L2 only if needed
overview = client.overview("viking://resources/docs/auth/")
if needs_more_detail(overview):
    content = client.read("viking://resources/docs/auth/oauth.md")

Benchmarks: 91% lower input token cost, 43% better task completion.

Core Feature 3: Directory Recursive Retrieval

Single vector search struggles with complex queries. OpenViking uses directory recursive retrieval:

Five-Step Process

1. Intent Analysis
2. Initial Positioning (high-score directories)
3. Refined Exploration (within directories)
4. Recursive Descent (subdirectories)
5. Result Aggregation (ranked contexts)
  • Intent Analysis: Finds query type, key entities, expected content.
  • Initial Positioning: Vector search locates high-score directories.
  • Refined Exploration: Searches within top directories for files.
  • Recursive Descent: Repeats process in subdirectories.
  • Result Aggregation: Aggregates and ranks results, preserves retrieval traces.

This approach increases accuracy by leveraging context hierarchy.
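The five steps can be sketched as a recursive walk. This is a toy illustration, not OpenViking's implementation; the scores are hard-coded stand-ins for embedding similarity:

```python
# Descend only into high-scoring directories, collect files above a
# threshold, and keep a trace of every decision along the way.

THRESHOLD = 0.6

def retrieve(node, uri, score, trace, results):
    trace.append((uri, score))                  # preserve the trajectory
    if isinstance(node, dict):                  # directory: explore children
        for name, (child_score, child) in node.items():
            if child_score >= THRESHOLD:        # steps 2-4: recursive descent
                retrieve(child, f"{uri}/{name}", child_score, trace, results)
            else:
                trace.append((f"{uri}/{name}", child_score))  # skipped, but logged
    else:
        results.append((uri, score))            # file hit

# Toy index: {name: (score, subtree-or-leaf)}
docs = {"auth": (0.89, {"oauth.md": (0.92, "leaf"), "jwt.md": (0.34, "leaf")}),
        "billing": (0.20, {})}

trace, results = [], []
retrieve(docs, "viking://resources/docs", 1.0, trace, results)
results.sort(key=lambda r: r[1], reverse=True)  # step 5: rank aggregated results
```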

Core Feature 4: Visualized Retrieval Traces

Traditional RAG is a black box. OpenViking provides observable retrieval traces:

Retrieval Trace for query: "OAuth token refresh"

├── viking://resources/docs/
│   ├── [SCORE: 0.45] .abstract.md: skipped
│   └── [SCORE: 0.89] auth/: selected
│       ├── [SCORE: 0.92] oauth.md: RETURNED
│       ├── [SCORE: 0.34] jwt.md: skipped
│       └── [SCORE: 0.78] providers/
│           └── [SCORE: 0.85] google.md: RETURNED

This enables debugging by showing which directories/files were visited and why.

Core Feature 5: Automatic Session Management

OpenViking includes a memory self-iteration loop. At session end, it extracts and updates agent knowledge automatically.

Six Memory Categories

| Category | Owner | Location | Description | Update Strategy |
| --- | --- | --- | --- | --- |
| profile | user | user/memories/.overview.md | Basic user info | Appendable |
| preferences | user | user/memories/preferences/ | Preferences by topic | Appendable |
| entities | user | user/memories/entities/ | People, projects | Appendable |
| events | user | user/memories/events/ | Decisions, milestones | No update |
| cases | agent | agent/memories/cases/ | Learned cases | No update |
| patterns | agent | agent/memories/patterns/ | Learned patterns | No update |

How Memory Extraction Works

session = client.session()

# Add conversation messages
await session.add_message("user", [{"type": "text", "text": "I prefer dark mode in the UI"}])
await session.add_message("assistant", [{"type": "text", "text": "Got it. I'll use dark mode for all future screenshots."}])

# Record tool usage
await session.add_usage({"tool": "screenshot", "parameters": {"theme": "dark"}, "result": "success"})

# Commit triggers memory extraction
await session.commit()

On commit, OpenViking compresses the session, extracts memories via LLM, updates directories, and generates new L0/L1 summaries—enabling agents to learn and adapt.
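The update-strategy half of that pipeline can be sketched as follows. This is our own simplification (the real extraction is LLM-driven): route each extracted fact into a memory category and honor the update strategies from the table above.

```python
# Appendable categories may be rewritten; events/cases/patterns are
# write-once and never updated after creation.

APPENDABLE = {"profile", "preferences", "entities"}

def apply_extraction(store, extracted):
    """store: {category: {key: fact}}; extracted: [(category, key, fact)]."""
    for category, key, fact in extracted:
        bucket = store.setdefault(category, {})
        if category in APPENDABLE or key not in bucket:
            bucket[key] = fact          # new entry, or appendable update
        # existing write-once entries are left untouched
    return store

store = {}
apply_extraction(store, [("preferences", "theme", "dark mode"),
                         ("events", "launch", "v1 launch decided")])
# A later session may revise a preference, but never a recorded event.
apply_extraction(store, [("preferences", "theme", "light mode"),
                         ("events", "launch", "date changed")])
```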

Architecture Overview

OpenViking separates concerns across multiple layers.

System Architecture

Dual-Layer Storage

| Layer | Technology | Stores |
| --- | --- | --- |
| AGFS | Custom filesystem | L0/L1/L2 content, multimedia, relations |
| Vector Index | Vector DB | URIs, embeddings, metadata |

  • Content reads come from AGFS.
  • Vector index only stores references, not content.
  • No large text duplication in vector storage.
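A minimal sketch of that split (toy 2-D embeddings; not the real AGFS or index API): the vector index resolves a query to a URI, and the content read goes to AGFS.

```python
# The index holds only URIs and embeddings; bodies live in AGFS.

vector_index = {                      # URI -> embedding (no content!)
    "viking://resources/docs/auth.md":   [1.0, 0.0],
    "viking://resources/docs/limits.md": [0.0, 1.0],
}
agfs = {                              # URI -> actual content
    "viking://resources/docs/auth.md":   "OAuth 2.0 flows and token refresh",
    "viking://resources/docs/limits.md": "Rate limits: 100 req/min",
}

def find(query_vec):
    # Index search returns a reference, never a document body.
    return max(vector_index, key=lambda uri: sum(
        q * e for q, e in zip(query_vec, vector_index[uri])))

uri = find([0.9, 0.1])
content = agfs[uri]                   # the read goes to AGFS
```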

Quick Start: Deploy Your First OpenViking Server

Prerequisites

  • Python: 3.10+
  • Go: 1.22+ (for AGFS)
  • C++ Compiler: GCC 9+ or Clang 11+
  • OS: Linux, macOS, Windows

Step 1: Install OpenViking

pip install openviking --upgrade --force-reinstall

Optional: Install Rust CLI

curl -fsSL https://raw.githubusercontent.com/volcengine/OpenViking/main/crates/ov_cli/install.sh | bash

Step 2: Configure Models

Create ~/.openviking/ov.conf:

{
  "storage": {
    "workspace": "/home/your-name/openviking_workspace"
  },
  "log": {
    "level": "INFO",
    "output": "stdout"
  },
  "embedding": {
    "dense": {
      "api_base": "https://api.openai.com/v1",
      "api_key": "your-openai-api-key",
      "provider": "openai",
      "dimension": 3072,
      "model": "text-embedding-3-large"
    },
    "max_concurrent": 10
  },
  "vlm": {
    "api_base": "https://api.openai.com/v1",
    "api_key": "your-openai-api-key",
    "provider": "openai",
    "model": "gpt-4o",
    "max_concurrent": 100
  }
}

| Provider | Embedding Models | VLM Models |
| --- | --- | --- |
| volcengine | doubao-embedding-vision | doubao-seed-2.0-pro |
| openai | text-embedding-3-large | gpt-4o, gpt-4-vision |
| litellm | Via LiteLLM proxy | Claude, Gemini, DeepSeek, etc. |

LiteLLM enables support for Anthropic, Google, local Ollama, or OpenAI-compatible endpoints.
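For example, a litellm-backed ov.conf fragment might look like the following. This is an assumption based on the field layout in Step 2; the proxy URL (port 4000 is LiteLLM's default) and the model names are placeholders, so check them against your proxy's configuration:

```json
{
  "embedding": {
    "dense": {
      "api_base": "http://localhost:4000",
      "api_key": "your-litellm-key",
      "provider": "litellm",
      "dimension": 3072,
      "model": "text-embedding-3-large"
    },
    "max_concurrent": 10
  },
  "vlm": {
    "api_base": "http://localhost:4000",
    "api_key": "your-litellm-key",
    "provider": "litellm",
    "model": "claude-3-5-sonnet"
  }
}
```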

Step 3: Start the Server

openviking-server

Or background mode:

nohup openviking-server > /data/log/openviking.log 2>&1 &

Step 4: Add Your First Resource

# Rust CLI
ov add-resource https://docs.example.com/api-guide.pdf

# Python SDK
from openviking import OpenViking
client = OpenViking(path="./data")
client.add_resource("https://docs.example.com/api-guide.pdf")

Step 5: Search and Retrieve

# Semantic search
ov find "authentication methods"

# List contents
ov ls viking://resources/

# View tree
ov tree viking://resources/docs -L 2

# Grep for content
ov grep "OAuth" --uri viking://resources/docs/

Step 6: Enable VikingBot (Optional)

pip install "openviking[bot]"

# Start server with bot enabled
openviking-server --with-bot

# Start chat in another terminal
ov chat

Performance Benchmarks

OpenViking was benchmarked vs. traditional RAG (LanceDB) and native memory systems on the LoCoMo10 dataset (1,540 long-range dialogue cases).

Task Completion Rates

| System | Completion Rate | Input Tokens |
| --- | --- | --- |
| OpenClaw (native memory) | 35.65% | 24.6M |
| OpenClaw + LanceDB | 44.55% | 51.6M |
| OpenClaw + OpenViking | 52.08% | 4.3M |

  • 43% improvement over native memory, 91% token reduction
  • 17% improvement over LanceDB, 92% token reduction
  • Hierarchical retrieval increases relevance and reduces cost

Integrating OpenViking with Apidog

Apidog users can leverage OpenViking to maintain context, store API docs, and remember preferences.

Apidog Integration

Step 1: Set Up OpenViking Server

Deploy OpenViking as outlined above with your preferred model providers.

Step 2: Import Apidog API Documentation

ov add-resource https://docs.apidog.com/overview?utm_source=dev.to&utm_medium=wanda&utm_content=n8n-post-automation
ov add-resource https://docs.apidog.com/api-testing?utm_source=dev.to&utm_medium=wanda&utm_content=n8n-post-automation

Imports Apidog docs into viking://resources/ with L0/L1/L2 processing.

Step 3: Store User Preferences

from openviking import OpenViking

client = OpenViking(path="./apidog-agent-data")
session = client.session()

# Record user's default environment
await session.add_message("user", [{
    "type": "text",
    "text": "Always use the staging environment for API tests"
}])
await session.commit()  # Extracts preference memory

Step 4: Query Context During Testing

# Find API endpoints
results = client.find("authentication endpoints")
for ctx in results.resources:
    print(f"Found: {ctx.uri}")

# Retrieve user environment preference
prefs = client.find("staging environment preference", target_uri="viking://user/memories/")

Step 5: Connect to Your Agent Framework

OpenViking exposes Python SDK and HTTP API:

# Python SDK
from openviking import OpenViking
client = OpenViking(path="./data")

# HTTP API
import httpx
response = httpx.post(
    "http://localhost:1933/api/v1/search/find",
    json={"query": "authentication endpoints"},
    headers={"X-API-Key": "your-api-key"}
)

Advanced Techniques & Best Practices

Pro Tips for Production Deployments

1. Pre-warm Frequently Accessed Context

ov add-resource https://docs.example.com --wait

2. Implement Context Expiration

await session.archive(max_age_days=7)

3. Monitor Vector Index Health

ov debug stats

Common Mistakes to Avoid

  1. Loading L2 content prematurely—start with L0/L1.
  2. Skipping session commits—memory extraction only happens on commit.
  3. Overloading directories—split large resources into subdirectories.
  4. Ignoring retrieval traces—use traces to debug results.

Performance Optimization

| Scenario | Recommendation |
| --- | --- |
| High query volume | Run as HTTP server, use connection pooling |
| Large documents | Split into topic-based chunks before import |
| Low latency needs | Pre-generate L0/L1 for hot content |
| Multi-tenant setup | Separate workspaces per tenant |
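The "split into topic-based chunks" recommendation can be sketched as a simple heading-based splitter. This is one possible approach on our side, not an OpenViking feature:

```python
# Break a markdown document on top-level headings so each chunk can be
# imported as its own resource.

def split_by_heading(markdown):
    chunks, current = {}, None
    for line in markdown.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {title: "\n".join(body).strip() for title, body in chunks.items()}

doc = "# Auth\nOAuth flows.\n# Rate Limits\n100 req/min."
chunks = split_by_heading(doc)
# Each chunk could then be imported individually, e.g. one
# add_resource call per topic.
```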

Security Best Practices

  • Store API keys in environment variables or secret managers.
  • Enable HTTPS for all HTTP deployments.
  • Implement rate limiting on public endpoints.
  • Use separate API keys for dev and prod.
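One way to apply the first point without hard-coding keys in ov.conf is to inject them from the environment at deploy time. The layout mirrors the Step 2 example; the OPENVIKING_* variable names are our own convention, not part of OpenViking:

```python
import os

def inject_keys(conf, env=os.environ):
    """Fill api_key fields from the environment instead of committing
    them to ov.conf on disk."""
    conf["embedding"]["dense"]["api_key"] = env["OPENVIKING_EMBEDDING_KEY"]
    conf["vlm"]["api_key"] = env["OPENVIKING_VLM_KEY"]
    return conf

# Template kept on disk with the key fields left blank.
base = {"embedding": {"dense": {"api_key": ""}}, "vlm": {"api_key": ""}}
conf = inject_keys(base, {"OPENVIKING_EMBEDDING_KEY": "emb-123",
                          "OPENVIKING_VLM_KEY": "vlm-456"})
```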

Real-World Use Cases

1. AI Coding Assistants

  • Navigates project structure via viking://resources/my_project/src/
  • Remembers user coding preferences
  • Retrieves API docs during code generation

Result: 67% reduction in forgetful behaviors, 43% token cost savings.

2. Customer Support Agents

  • Product documentation in viking://resources/product/
  • Conversation history in viking://user/memories/past_issues/
  • Support playbooks as skills

Result: First-contact resolution up from 52% to 71%.

3. Research Assistants

  • Papers categorized by topic
  • Research methods stored as skills
  • Key findings extracted into memory

Result: Finding relevant papers 3x faster.

Alternatives & Comparisons

OpenViking vs. Traditional Vector Databases

| Aspect | Traditional RAG (Pinecone, LanceDB) | OpenViking |
| --- | --- | --- |
| Storage Model | Flat vector chunks | Hierarchical filesystem |
| Retrieval | Top-K similarity | Directory recursive + intent |
| Observability | Black box | Visualized search traces |
| Token Efficiency | Load all or truncate | L0/L1/L2 progressive loading |
| Memory Iteration | Manual or none | Automatic session management |
| Context Types | Documents only | Resources, memories, skills |
| Debugging | Guesswork | Directory traversal logs |

OpenViking vs. LangChain Memory

| Aspect | LangChain Memory | OpenViking |
| --- | --- | --- |
| Persistence | Conversation buffer only | Full filesystem, L0/L1/L2 |
| Scalability | Limited by context window | Hierarchical loading, no cap |
| Retrieval | Linear search | Directory recursive + semantic |
| Memory Types | Single buffer | 6 categories |

When to Consider Alternatives

Use traditional vector DBs if:

  • You need sub-100ms retrieval latency
  • Use case is simple keyword search
  • Existing RAG pipeline works fine

Use OpenViking if:

  • Building long-running agent conversations
  • Need multi-type context (docs + preferences + tools)
  • Token cost optimization matters
  • Require observable, debuggable retrieval


Production Deployment

For production, run OpenViking as a standalone HTTP service.

Recommended Infrastructure

  • Cloud: Volcengine ECS (or similar)
  • OS: veLinux or Ubuntu 22.04+
  • Storage: SSD-backed AGFS volume
  • Network: Low-latency to model APIs

Security Considerations

  • Store API keys in env vars or secret manager
  • Enable authentication for HTTP endpoints
  • Use HTTPS for all communication
  • Implement rate limiting

Monitoring

Configure logging:

{
  "log": {
    "level": "INFO",
    "output": "file",
    "path": "/var/log/openviking/server.log"
  }
}

Monitor:

  • Semantic processing queue depth
  • Vector search latency
  • AGFS read/write operations
  • Memory extraction success rates

Limitations and Considerations

Current Limitations

  • Python-centric: Primary SDK is Python; others use HTTP API.
  • Model dependencies: Requires external VLM and embedding models.
  • Learning curve: Filesystem paradigm differs from traditional DBs.
  • Early stage: The project is under active development; APIs may change.

When to Use OpenViking

Good fit:

  • Long-running agent conversations
  • Multi-type context needs
  • Need observable, debuggable retrieval
  • Token cost matters

Consider alternatives:

  • Simple Q&A apps
  • No pain points in current RAG setup
  • Need sub-100ms retrieval latency

The Road Ahead

OpenViking is early-stage (v0.1.x, early 2025). Planned roadmap:

  • Multi-tenant support
  • Advanced analytics and dashboards
  • Plugin ecosystem for agent frameworks
  • Edge deployment (local-first)
  • Enhanced MCP protocol integration

The team is seeking community contributors; the project is open source under Apache 2.0.

Conclusion

OpenViking redefines AI agent context management. By organizing information as a filesystem, it eliminates fragmentation, token waste, and black-box retrieval common in traditional RAG.

Key Takeaways

  • Filesystem paradigm unifies context: All memories, resources, skills under viking:// URIs.
  • L0/L1/L2 loading cuts tokens 91%: Progressive loading, not dumping everything into prompts.
  • Directory recursive retrieval boosts accuracy: Focus on high-score directories, then drill down.
  • Visualized traces enable debugging: See exactly which retrieval paths were taken.
  • Automatic session management enables learning: Agents extract and update memories continuously.
