DEV Community

Hagicode

Originally published at docs.hagicode.com

Building a Cross-Project Knowledge Base for the AI Era with the Vault System

Learning by imitating existing projects is becoming mainstream, but scattered learning materials and fragmented context keep AI assistants from delivering their full value. This article introduces the Vault system from the HagiCode project: a unified storage abstraction layer that lets AI assistants understand and access all of your learning resources, making cross-project knowledge reuse practical.

Background

In the AI era, the way we learn new technologies is quietly changing. Reading and watching videos remain important, but "project imitation" (studying the code, architecture, and design patterns of excellent open-source projects in depth) is becoming an increasingly efficient approach. Running and modifying a high-quality open-source project is often the fastest way to understand real-world engineering practices.

But this approach also brings new challenges.

Learning materials are too scattered. Notes might live in Obsidian, code repositories are spread across various folders, and AI assistant conversation history is yet another isolated silo. Each time you want AI help analyzing a project, you have to copy code snippets and assemble context by hand, which is tedious.

Context frequently gets lost. AI assistants cannot directly access local learning resources, so background information must be provided again in every conversation. The repositories being studied change quickly, and manual synchronization is error-prone. Worse, knowledge is hard to share between learning projects: design patterns learned in project A are unknown to the AI when it works on project B.

At their core, these are "data silo" problems. A unified storage abstraction layer that lets AI assistants understand and access all learning resources would address them.

To address these pain points, we made a key design decision while developing HagiCode: build a Vault system as a unified knowledge storage abstraction layer. The impact of this decision may be greater than imagined—more on this shortly.

About HagiCode

The solution shared in this article comes from practical experience in the HagiCode project. HagiCode is an AI code assistant based on the OpenSpec workflow, with its core philosophy being that AI should not only "speak" but also "do"—directly manipulating code repositories, executing commands, and running tests. GitHub: github.com/HagiCode-org/site

During development, we found that AI assistants need frequent access to users' various learning resources: code repositories, note documents, configuration files, etc. If users had to manually provide these each time, the experience would be terrible. This prompted the design of the Vault system.

Core Design

Multi-Type Support

HagiCode's Vault system supports four types, each corresponding to different use cases:

| Type | Purpose | Typical scenarios |
| --- | --- | --- |
| folder | General folder type | Temporary learning materials, drafts |
| coderef | Specialized for imitating code projects | Systematically learning an open-source project |
| obsidian | Integration with Obsidian note-taking software | Reusing existing note libraries |
| system-managed | Managed automatically by the system | Project configuration, prompt templates, etc. |

The coderef type is the most commonly used in HagiCode. It provides a standardized directory structure and AI-readable metadata for code projects you are studying. Why design a dedicated type? Because imitating an open-source project is not simply "downloading code": it means managing the code itself, learning notes, configuration files, and other content together. coderef standardizes all of this.

Persistent Storage Mechanism

The Vault registry is persisted to the file system in JSON format:

_registryFilePath = Path.Combine(absoluteDataDir, "personal-data", "vaults", "registry.json");

This design appears simple but is actually well-considered:

Simple and reliable. JSON format is human-readable, facilitating debugging and manual modification. When system issues arise, you can directly open the file to check status, or even manually fix it—particularly useful during development.

Reduced dependencies. File system storage avoids database complexity. No need to additionally install and configure database services, reducing system complexity and maintenance costs.

Concurrency safe. Uses SemaphoreSlim to ensure multi-thread safety. In the AI code assistant scenario, multiple operations may simultaneously access the vault registry, requiring proper concurrency control.
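As a rough illustration of the persistence and locking ideas above, here is a minimal TypeScript sketch. It is not HagiCode's implementation (which is C# and uses SemaphoreSlim); the names `VaultEntry`, `RegistryStore`, `MemoryStore`, `AsyncLock`, and `VaultRegistry` are all hypothetical, and storage is injected so the example runs without touching the disk. A file-backed store would read and write registry.json behind the same two methods.

```typescript
// Hypothetical entry shape; the article does not show the real registry schema.
interface VaultEntry {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
}

// Storage is injected so the sketch stays runnable anywhere.
// A file-backed implementation would read and write registry.json instead.
interface RegistryStore {
  read(): Promise<string | null>;
  write(text: string): Promise<void>;
}

class MemoryStore implements RegistryStore {
  private text: string | null = null;
  async read(): Promise<string | null> {
    return this.text;
  }
  async write(text: string): Promise<void> {
    this.text = text;
  }
}

// Minimal async lock: each task starts only after the previous one settles.
class AsyncLock {
  private tail: Promise<void> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if a task rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

class VaultRegistry {
  private readonly lock = new AsyncLock();
  private readonly store: RegistryStore;

  constructor(store: RegistryStore) {
    this.store = store;
  }

  // Parses the stored JSON, treating a missing registry as an empty list.
  private async load(): Promise<VaultEntry[]> {
    const text = await this.store.read();
    return text === null ? [] : (JSON.parse(text) as VaultEntry[]);
  }

  list(): Promise<VaultEntry[]> {
    return this.lock.run(() => this.load());
  }

  // The read-modify-write cycle runs under the lock, so concurrent adds
  // cannot overwrite each other's changes.
  add(entry: VaultEntry): Promise<void> {
    return this.lock.run(async () => {
      const entries = await this.load();
      entries.push(entry);
      await this.store.write(JSON.stringify(entries, null, 2));
    });
  }
}
```

Without the lock, two overlapping `add` calls could both load the same snapshot and the later write would drop the earlier entry; serializing the whole read-modify-write cycle avoids that.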

AI Context Integration

The system's core capability lies in automatically injecting vault information into AI proposal context:

export function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = DEFAULT_VAULT_PROMPT_TEMPLATE,
): string {
  // Split vaults by access type so each group gets its own instructions.
  const readOnlyVaults = vaults.filter((vault) => vault.accessType === 'read');
  const editableVaults = vaults.filter((vault) => vault.accessType === 'write');

  // Render one section per group; filter(Boolean) drops any section that comes back empty.
  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);

  // Prepend the heading and join the remaining sections into one prompt fragment.
  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}

This way, AI assistants can automatically see which learning resources are available without users providing context by hand each time. It makes the HagiCode experience feel natural: tell the AI "help me analyze React's concurrent rendering," and it can find the previously registered React learning vault on its own, so you do not have to paste code repeatedly.
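The article does not show `buildVaultSection` or the types it relies on. The following is one possible sketch, assuming each section is an intro line followed by one bullet per vault; the field names and the output format are assumptions for illustration, not HagiCode's actual definitions.

```typescript
// Hypothetical shapes: the article names these types but does not show them.
interface VaultForText {
  name: string;
  type: string;
  physicalPath: string;
  accessType: "read" | "write";
}

interface VaultPromptTemplate {
  heading: string;
  reference: string; // intro line for read-only vaults
  editable: string; // intro line for writable vaults
}

// Renders one group of vaults as an intro line plus a bullet per vault.
// Returns "" for an empty group so callers can drop it with filter(Boolean).
function buildVaultSection(vaults: VaultForText[], intro: string): string {
  if (vaults.length === 0) {
    return "";
  }
  const lines = vaults.map(
    (vault) => `- ${vault.name} (${vault.type}): ${vault.physicalPath}`,
  );
  return [intro, ...lines].join("\n");
}
```

Returning an empty string for an empty group is what lets the `filter(Boolean)` call skip a section entirely when, say, no editable vaults are attached.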

Access Control Mechanism

The system divides vaults into two access types:

  • reference (read-only, accessType "read" in the code above): the AI uses the content only for analysis and understanding and cannot modify it
  • editable (accessType "write"): the AI can modify the content as the task requires

This distinction tells the AI which content is read-only reference material and which is safe to modify, reducing the risk of accidental changes. For example, if you register an open-source project vault as learning material, you do not want the AI casually modifying the code inside, so mark it as reference. If it is your own project vault, you can mark it as editable and let the AI help modify the code.
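How the access type is enforced is not shown in the article. One simple way to picture it is a guard that runs before any write, sketched below; `VaultRef`, `WriteDecision`, and `checkWrite` are hypothetical names, and the real system may enforce this differently (for example, only through prompt instructions).

```typescript
type AccessType = "read" | "write";

// Hypothetical minimal view of a vault for access checks.
interface VaultRef {
  id: string;
  accessType: AccessType;
}

interface WriteDecision {
  allowed: boolean;
  reason: string;
}

// Decides whether a write into the given vault should proceed.
function checkWrite(vault: VaultRef): WriteDecision {
  if (vault.accessType === "write") {
    return { allowed: true, reason: `vault "${vault.id}" is editable` };
  }
  return { allowed: false, reason: `vault "${vault.id}" is reference-only` };
}
```

Returning a reason alongside the decision makes it easy to surface a clear message when a task tries to modify a reference vault.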

Practice Guide

CodeRef Vault's Standardized Structure

For coderef-type vaults, the system provides a standardized directory structure:

my-coderef-vault/
├── index.yaml          # vault metadata description
├── AGENTS.md           # AI assistant operation guide
├── docs/               # store learning notes and documents
└── repos/              # manage imitated code repositories via Git submodules

What's the design philosophy of this structure?

docs/ stores learning notes, recording code understanding, architecture analysis, and pitfall experiences in Markdown format. These notes aren't just for yourself—AI can also read them, automatically referencing them when handling related tasks.

repos/ manages the repositories being studied via Git submodules rather than direct copies of the code. This has two benefits. First, it stays in sync with upstream: one git submodule update --remote pulls the latest upstream code. Second, it saves space: the vault's own repository records only a commit pointer rather than a copy of the code, and different vaults can pin different versions of the same repository.

index.yaml contains vault metadata, letting AI assistants quickly understand purpose and content. It's like writing a "self-introduction" for the vault, so AI knows what it's for when first encountering it.

AGENTS.md is a guide specifically written for AI assistants, explaining how to handle content in the vault. You can tell AI here: "when analyzing this project, focus on performance optimization related code" or "don't modify test files."
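To make the layout concrete, the sketch below lists what a scaffold for a coderef vault would contain and which Git commands would manage the submodule. `planCodeRefVault` and `ScaffoldPlan` are hypothetical helpers written for this example; they only build a plan and do not touch the file system or run Git.

```typescript
// Hypothetical planner: describes a coderef vault scaffold without creating it.
interface ScaffoldPlan {
  directories: string[];
  files: string[];
  gitCommands: string[];
}

function planCodeRefVault(root: string, repoName: string, gitUrl: string): ScaffoldPlan {
  const join = (...parts: string[]): string => parts.join("/");
  return {
    // Folders from the standardized structure.
    directories: [join(root, "docs"), join(root, "repos")],
    // Metadata and the AI operation guide.
    files: [join(root, "index.yaml"), join(root, "AGENTS.md")],
    // Commands to run from the vault root.
    gitCommands: [
      // Register the upstream repository as a submodule under repos/.
      `git submodule add ${gitUrl} repos/${repoName}`,
      // Later, move the submodule to the latest upstream commit.
      `git submodule update --remote repos/${repoName}`,
    ],
  };
}
```

For the React example used later in the article, the plan would place the submodule at repos/react inside the vault root.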

Creating and Using Vault

Creating a CodeRef vault is simple:

const createCodeRefVault = async () => {
  const response = await VaultService.postApiVaults({
    requestBody: {
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/Users/developer/vaults/react-learning",
      gitUrl: "https://github.com/facebook/react.git"
    }
  });

  // System will automatically:
  // 1. Clone React repository to vault/repos/react
  // 2. Create docs/ directory for notes
  // 3. Generate index.yaml metadata
  // 4. Create AGENTS.md guide file

  return response;
};

Then reference this vault in an AI proposal:

const proposal = composeProposalChiefComplaint({
  chiefComplaint: "Help me analyze React's concurrent rendering mechanism",
  repositories: [
    { id: "react", gitUrl: "https://github.com/facebook/react.git" }
  ],
  vaults: [
    {
      id: "react-learning",
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/vaults/react-learning",
      accessType: "read"  // AI can only read, not modify
    }
  ],
  quickRequestText: "Focus on fiber architecture and scheduler implementation"
});

Typical Use Scenarios

Scenario 1: Systematically learning open-source projects

Create a CodeRef vault, manage target repositories via Git submodules, and record learning notes in the docs/ directory. AI can simultaneously access code and notes, providing more precise analysis. Notes written while learning a module are automatically referenced by AI in subsequent related code analysis—like having an "assistant" remember previous thinking.

Scenario 2: Reusing Obsidian note libraries

If already using Obsidian for note management, simply register existing vaults to HagiCode. AI can directly access the knowledge base without manual copy-paste. This feature is particularly practical—many people have accumulated note libraries over years, and after integration AI can "read" and understand the knowledge system.

Scenario 3: Cross-project knowledge reuse

Multiple AI proposals can reference the same vault, achieving cross-project knowledge reuse. For example, create a "design patterns learning vault" containing notes and code examples of various design patterns. Regardless of which project is being analyzed, AI can reference content from this vault—knowledge doesn't need to be accumulated repeatedly.

Path Security Mechanism

The system strictly validates paths to prevent path traversal attacks:

private static string ResolveFilePath(string vaultRoot, string relativePath)
{
    var rootPath = EnsureTrailingSeparator(Path.GetFullPath(vaultRoot));
    var combinedPath = Path.GetFullPath(Path.Combine(rootPath, relativePath));
    if (!combinedPath.StartsWith(rootPath, StringComparison.OrdinalIgnoreCase))
    {
        throw new BusinessException(VaultRelativePathTraversalCode,
            "Vault file paths must stay inside the registered vault root.");
    }
    return combinedPath;
}

This ensures all file operations stay within the vault's root directory scope, preventing malicious path access. Security can't be careless—when AI assistants operate on the file system, boundaries must be clearly defined.
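The same containment check can be sketched in TypeScript with Node's built-in path module. This is an illustrative port rather than HagiCode code: `resolveVaultPath` is a hypothetical name, and it uses `path.relative` instead of a prefix comparison, which avoids having to manage trailing separators.

```typescript
import * as path from "node:path";

// Resolves relativePath against vaultRoot and rejects anything that escapes the root.
function resolveVaultPath(vaultRoot: string, relativePath: string): string {
  const root = path.resolve(vaultRoot);
  const target = path.resolve(root, relativePath);
  const rel = path.relative(root, target);

  // A result of "..", anything starting with "../", or an absolute path
  // (for example another drive on Windows) means the target is outside the root.
  const escapesRoot =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapesRoot) {
    throw new Error("Vault file paths must stay inside the registered vault root.");
  }
  return target;
}
```

Checking for `".." + path.sep` rather than a bare `".."` prefix keeps legitimate names such as `..notes.md` working while still rejecting traversal.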

Considerations

When using the HagiCode Vault system, several points need special attention:

  1. Path safety: Make sure custom paths fall within the allowed ranges; otherwise the system refuses the operation. This guards against both mistakes and potential security risks.

  2. Git submodule management: For CodeRef vaults, Git submodules are recommended over copying code directly, for the reasons given earlier (staying in sync with upstream and saving space). Submodules have their own workflow, though, so first-time users may need some time to get familiar with them.

  3. File preview limitations: For performance reasons, the system limits file size (256KB) and file count (500 files). Files over the limit need to be processed in batches, split manually, or handled another way.

  4. Diagnostic information: Creating a vault returns diagnostic information that can be used for debugging when creation fails. When something goes wrong, check it first; in most cases it provides clues.
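The preview limits in point 3 can be pictured as a simple filter over a file listing. The sketch below is only an assumption about how such limits might be applied: the two numbers come from the article, while `selectPreviewFiles`, the `FileInfo` shape, and the reading of 256KB as 256 * 1024 bytes are hypothetical.

```typescript
// Limits from the article; the byte interpretation of "256KB" is an assumption.
const MAX_PREVIEW_BYTES = 256 * 1024;
const MAX_PREVIEW_FILES = 500;

interface FileInfo {
  relativePath: string;
  sizeBytes: number;
}

interface PreviewSelection {
  included: FileInfo[];
  skippedTooLarge: FileInfo[];
  skippedOverCount: FileInfo[];
}

// Splits a listing into files that fit the limits and files that need
// separate handling, preserving the original order.
function selectPreviewFiles(files: FileInfo[]): PreviewSelection {
  const included: FileInfo[] = [];
  const skippedTooLarge: FileInfo[] = [];
  const skippedOverCount: FileInfo[] = [];

  for (const file of files) {
    if (file.sizeBytes > MAX_PREVIEW_BYTES) {
      skippedTooLarge.push(file);
    } else if (included.length >= MAX_PREVIEW_FILES) {
      skippedOverCount.push(file);
    } else {
      included.push(file);
    }
  }
  return { included, skippedTooLarge, skippedOverCount };
}
```

Keeping the skipped files in separate lists makes it straightforward to report which ones need splitting and which ones simply belong in a later batch.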

Summary

HagiCode's Vault system essentially addresses a simple but profound problem: how to enable AI assistants to understand and use local knowledge resources.

Through a unified storage abstraction layer, a standardized directory structure, and automated context injection, it achieves a "register once, reuse everywhere" approach to knowledge management. Once a vault is created, the AI can automatically access and understand its contents, whether they are learning notes, code repositories, or documentation.

The experience improvement from this design is obvious. No more manually copying code snippets or repeatedly explaining background information—AI assistants are like colleagues who truly understand project context, able to provide more valuable help based on existing knowledge.

The Vault system described in this article grew out of real pitfalls encountered and fixed during HagiCode development. If you find this design valuable, that reflects the engineering behind it, and HagiCode itself is also worth a look.

References

If this article helped you: the HagiCode public beta has begun, and you are welcome to install it and try it out.

Original Article & License

Thanks for reading. If this article helped, consider liking, bookmarking, or sharing it.
This article was created with AI assistance and reviewed by the author before publication.
