Building a Cross-Project Knowledge Base for the AI Era with the Vault System
Learning by imitating real projects is becoming mainstream, but scattered learning materials and fragmented context keep AI assistants from delivering their full value. This article introduces the Vault system design from the HagiCode project: a unified storage abstraction layer that lets AI assistants understand and access all of your learning resources, so knowledge can be reused across projects.
Background
In the AI era, the way we learn new technologies is quietly changing. Reading and watching videos remain important, but "project imitation" (studying the code, architecture, and design patterns of excellent open-source projects in depth) is becoming an increasingly efficient path. Running and modifying a high-quality open-source project directly is often the fastest way to understand real-world engineering practice.
But this approach also brings new challenges.
Learning materials are too scattered. Notes might be in Obsidian, code repositories scattered across various folders, and AI assistant conversation history is yet another isolated data silo. Each time you need AI help analyzing a project, you have to manually copy code snippets and organize context—quite a tedious process.
Context frequently gets lost. AI assistants cannot directly access local learning resources, so background information must be re-provided for each conversation. Imitated code repositories update quickly, and manual synchronization is error-prone. Worse still, knowledge is difficult to share between multiple learning projects—design patterns learned in project A are completely unknown to AI when processing project B.
The essence of these problems is "data silos." If there could be a unified storage abstraction layer enabling AI assistants to understand and access all learning resources, the problem would be solved.
To address these pain points, we made a key design decision while developing HagiCode: build a Vault system as a unified knowledge storage abstraction layer. The rest of this article explains how that decision plays out in practice.
About HagiCode
The solution shared in this article comes from practical experience in the HagiCode project. HagiCode is an AI code assistant based on the OpenSpec workflow, with its core philosophy being that AI should not only "speak" but also "do"—directly manipulating code repositories, executing commands, and running tests. GitHub: github.com/HagiCode-org/site
During development, we found that AI assistants need frequent access to users' various learning resources: code repositories, note documents, configuration files, etc. If users had to manually provide these each time, the experience would be terrible. This prompted the design of the Vault system.
Core Design
Multi-Type Support
HagiCode's Vault system supports four types, each corresponding to different use cases:
| Type | Purpose | Typical Scenarios |
|---|---|---|
| folder | General folder type | Temporary learning materials, drafts |
| coderef | Specialized for imitating code projects | Systematically learning an open-source project |
| obsidian | Integration with Obsidian note-taking software | Reusing existing note libraries |
| system-managed | System automatic management | Project configuration, prompt templates, etc. |
The coderef type is the one most commonly used in HagiCode. It provides a standardized directory structure and AI-readable metadata for imitating code projects. Why design a dedicated type? Because imitating an open-source project is more than "downloading code": it means managing the code itself, learning notes, configuration files, and other content together. coderef standardizes all of this.
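One way to picture the four types is as a string union on a registry entry. The sketch below is hypothetical: the field names are borrowed from the request examples in this article, while `VaultEntry` and `isCodeRef` are illustrative names rather than HagiCode's actual type definitions.

```typescript
// Hypothetical model of a vault registry entry. Field names mirror the
// request examples in this article, not HagiCode's real type definitions.
type VaultType = "folder" | "coderef" | "obsidian" | "system-managed";

interface VaultEntry {
  id: string;
  name: string;
  type: VaultType;
  physicalPath: string;
  gitUrl?: string; // assumed to matter only for coderef vaults
}

// Helper: true when the entry is a coderef vault.
function isCodeRef(vault: VaultEntry): boolean {
  return vault.type === "coderef";
}

const reactVault: VaultEntry = {
  id: "react-learning",
  name: "React Learning Vault",
  type: "coderef",
  physicalPath: "/vaults/react-learning",
  gitUrl: "https://github.com/facebook/react.git",
};

console.log(isCodeRef(reactVault)); // true
```

Modeling the type as a closed union means a misspelled vault type is rejected at compile time instead of surfacing later as a runtime surprise.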
Persistent Storage Mechanism
The Vault registry is persisted to the file system in JSON format:
_registryFilePath = Path.Combine(absoluteDataDir, "personal-data", "vaults", "registry.json");
This design appears simple but is actually well-considered:
Simple and reliable. JSON format is human-readable, facilitating debugging and manual modification. When system issues arise, you can directly open the file to check status, or even manually fix it—particularly useful during development.
Reduced dependencies. File system storage avoids database complexity. No need to additionally install and configure database services, reducing system complexity and maintenance costs.
Concurrency safe. A SemaphoreSlim guards access to the registry. In an AI code assistant, several operations may touch the vault registry at the same time, so proper concurrency control is required.
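To illustrate why the lock matters, here is a minimal TypeScript sketch of the same idea. It is not HagiCode's C# code: the `Mutex` class, the in-memory `registryJson` string standing in for registry.json, and `registerVault` are all assumptions made so the example runs anywhere.

```typescript
// Minimal promise-chain mutex, playing the role SemaphoreSlim plays in C#.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  // Runs `task` only after every previously queued task has settled.
  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain alive even if `task` rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Hypothetical in-memory stand-in for registry.json.
let registryJson = JSON.stringify({ vaults: [] as string[] });
const registryLock = new Mutex();

// Read-modify-write under the lock so concurrent registrations are not lost.
function registerVault(id: string): Promise<void> {
  return registryLock.runExclusive(async () => {
    const registry = JSON.parse(registryJson) as { vaults: string[] };
    await Promise.resolve(); // simulated I/O gap where a race could occur
    registry.vaults.push(id);
    registryJson = JSON.stringify(registry, null, 2);
  });
}

// Usage: both registrations survive even though they start together.
Promise.all([registerVault("react-learning"), registerVault("design-patterns")])
  .then(() => console.log(registryJson));
```

Without the lock, both calls could read the same snapshot and the second write would silently overwrite the first.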
AI Context Integration
The system's core capability lies in automatically injecting vault information into AI proposal context:
export function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = DEFAULT_VAULT_PROMPT_TEMPLATE,
): string {
  const readOnlyVaults = vaults.filter((vault) => vault.accessType === 'read');
  const editableVaults = vaults.filter((vault) => vault.accessType === 'write');
  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);
  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}
This way, AI assistants automatically know which learning resources are available, and users no longer have to supply context by hand each time. It makes the HagiCode experience feel natural: tell the AI "help me analyze React's concurrent rendering," and it can locate the previously registered React learning vault on its own, with no need to paste code again.
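The snippet above calls `buildVaultSection` without showing it. The following is a guess at what such a helper might look like, not HagiCode's implementation: only `accessType`, `heading`, `reference`, and `editable` appear in the original code, so the remaining fields, the template strings, and the bullet format are assumptions.

```typescript
// Hypothetical shapes: only `accessType`, `heading`, `reference`, and
// `editable` appear in the article's snippet; everything else is assumed.
interface VaultForText {
  name: string;
  physicalPath: string;
  accessType: "read" | "write";
}

interface VaultPromptTemplate {
  heading: string;
  reference: string; // intro line for read-only vaults
  editable: string; // intro line for writable vaults
}

// Assumed behaviour: an empty group yields "" so `.filter(Boolean)` drops it.
function buildVaultSection(vaults: VaultForText[], intro: string): string {
  if (vaults.length === 0) return "";
  const items = vaults.map((v) => `- ${v.name} (${v.physicalPath})`);
  return [intro, ...items].join("\n");
}

const template: VaultPromptTemplate = {
  heading: "Target Vaults",
  reference: "Read-only reference vaults:",
  editable: "Editable vaults:",
};

const text = buildVaultSection(
  [{ name: "React Learning Vault", physicalPath: "/vaults/react-learning", accessType: "read" }],
  template.reference,
);
console.log(text);
// Read-only reference vaults:
// - React Learning Vault (/vaults/react-learning)
```

Returning an empty string for an empty group is what lets the caller's `.filter(Boolean)` skip sections that have nothing to list.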
Access Control Mechanism
The system divides vaults into two access types:
- reference (read-only, accessType "read" in the code above): the AI uses the content only for analysis and understanding and must not modify it
- editable (accessType "write" in the code above): the AI may modify the content as the task requires
This distinction lets AI know which content is "read-only reference" and which is "okay to modify," avoiding misoperation risks. For example, if you register an open-source project vault as learning material, you certainly don't want AI casually modifying code inside—mark it as reference. But if it's your own project vault, you can mark it as editable to let AI help modify code.
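The article describes this distinction as prompt context for the AI. If you also wanted a code-level safeguard, a guard like the one below could run before any write. This is a hypothetical addition: `VaultAccess` and `assertWritable` are illustrative names, and nothing here is confirmed to exist in HagiCode.

```typescript
type AccessType = "read" | "write";

interface VaultAccess {
  name: string;
  accessType: AccessType;
}

// Throws before any write is attempted against a read-only (reference) vault.
function assertWritable(vault: VaultAccess): void {
  if (vault.accessType !== "write") {
    throw new Error(`Vault "${vault.name}" is read-only reference material.`);
  }
}

// Usage: the open-source vault is rejected, your own project passes.
const upstream: VaultAccess = { name: "React Learning Vault", accessType: "read" };
const ownProject: VaultAccess = { name: "My Project", accessType: "write" };

assertWritable(ownProject); // passes silently
try {
  assertWritable(upstream);
} catch (error) {
  console.log((error as Error).message);
}
```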
Practice Guide
CodeRef Vault's Standardized Structure
For coderef-type vaults, the system provides a standardized directory structure:
my-coderef-vault/
├── index.yaml # vault metadata description
├── AGENTS.md # AI assistant operation guide
├── docs/ # store learning notes and documents
└── repos/ # manage imitated code repositories via Git submodules
What's the design philosophy of this structure?
docs/ stores learning notes, recording code understanding, architecture analysis, and pitfall experiences in Markdown format. These notes aren't just for yourself—AI can also read them, automatically referencing them when handling related tasks.
repos/ manages imitated repositories via Git submodules rather than by copying code directly. This has two benefits. First, it stays in sync with upstream: a single git submodule update --remote pulls the latest code. Second, it keeps the vault's own repository small, because only a commit pointer is recorded, and it lets multiple vaults pin different versions of the same repository.
index.yaml contains vault metadata, letting AI assistants quickly understand purpose and content. It's like writing a "self-introduction" for the vault, so AI knows what it's for when first encountering it.
AGENTS.md is a guide specifically written for AI assistants, explaining how to handle content in the vault. You can tell AI here: "when analyzing this project, focus on performance optimization related code" or "don't modify test files."
Creating and Using Vault
Creating a CodeRef vault is simple:
const createCodeRefVault = async () => {
  const response = await VaultService.postApiVaults({
    requestBody: {
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/Users/developer/vaults/react-learning",
      gitUrl: "https://github.com/facebook/react.git"
    }
  });
  // System will automatically:
  // 1. Clone React repository to vault/repos/react
  // 2. Create docs/ directory for notes
  // 3. Generate index.yaml metadata
  // 4. Create AGENTS.md guide file
  return response;
};
Then reference this vault in an AI proposal:
const proposal = composeProposalChiefComplaint({
  chiefComplaint: "Help me analyze React's concurrent rendering mechanism",
  repositories: [
    { id: "react", gitUrl: "https://github.com/facebook/react.git" }
  ],
  vaults: [
    {
      id: "react-learning",
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/vaults/react-learning",
      accessType: "read" // AI can only read, not modify
    }
  ],
  quickRequestText: "Focus on fiber architecture and scheduler implementation"
});
Typical Use Scenarios
Scenario 1: Systematically learning open-source projects
Create a CodeRef vault, manage target repositories via Git submodules, and record learning notes in the docs/ directory. AI can simultaneously access code and notes, providing more precise analysis. Notes written while learning a module are automatically referenced by AI in subsequent related code analysis—like having an "assistant" remember previous thinking.
Scenario 2: Reusing Obsidian note libraries
If already using Obsidian for note management, simply register existing vaults to HagiCode. AI can directly access the knowledge base without manual copy-paste. This feature is particularly practical—many people have accumulated note libraries over years, and after integration AI can "read" and understand the knowledge system.
Scenario 3: Cross-project knowledge reuse
Multiple AI proposals can reference the same vault, achieving cross-project knowledge reuse. For example, create a "design patterns learning vault" containing notes and code examples of various design patterns. Regardless of which project is being analyzed, AI can reference content from this vault—knowledge doesn't need to be accumulated repeatedly.
Path Security Mechanism
The system strictly validates paths to prevent path traversal attacks:
private static string ResolveFilePath(string vaultRoot, string relativePath)
{
    var rootPath = EnsureTrailingSeparator(Path.GetFullPath(vaultRoot));
    var combinedPath = Path.GetFullPath(Path.Combine(rootPath, relativePath));
    if (!combinedPath.StartsWith(rootPath, StringComparison.OrdinalIgnoreCase))
    {
        throw new BusinessException(VaultRelativePathTraversalCode,
            "Vault file paths must stay inside the registered vault root.");
    }
    return combinedPath;
}
This ensures all file operations stay within the vault's root directory scope, preventing malicious path access. Security can't be careless—when AI assistants operate on the file system, boundaries must be clearly defined.
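For readers working on the TypeScript side, the same containment check can be sketched with Node's built-in path module. This is an illustrative port, not code from HagiCode: `resolveInsideVault` is a hypothetical name, and it uses `path.relative` instead of the prefix comparison in the C# version.

```typescript
import * as path from "node:path";

// Resolves `relativePath` under `vaultRoot` and rejects anything that escapes it.
function resolveInsideVault(vaultRoot: string, relativePath: string): string {
  const root = path.resolve(vaultRoot);
  const target = path.resolve(root, relativePath);
  // path.relative yields a path starting with ".." (or an absolute path on
  // another Windows drive) when `target` lies outside `root`.
  const rel = path.relative(root, target);
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error("Vault file paths must stay inside the registered vault root.");
  }
  return target;
}

// Usage: a normal note path resolves, a traversal attempt is rejected.
console.log(resolveInsideVault("/vaults/react-learning", "docs/notes.md"));
try {
  resolveInsideVault("/vaults/react-learning", "../../etc/passwd");
} catch (error) {
  console.log((error as Error).message);
}
```

Comparing via `path.relative` avoids a classic pitfall of plain prefix checks, where a sibling directory such as /vaults/react-learning-evil would pass a check against /vaults/react-learning. The C# version guards against the same issue by appending a trailing separator to the root.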
Considerations
When using the HagiCode Vault system, several points need special attention:
Path safety: Make sure custom paths fall within the allowed range; otherwise the system refuses the operation. This guards against both mistakes and potential security risks.
Git submodule management: For CodeRef vaults, Git submodules are recommended over copying code directly. The benefits were covered earlier: staying in sync with upstream and keeping the vault small. Submodules do have their own workflow, though, and first-time users may need some time to get familiar with it.
File preview limitations: The system limits previews to 256KB per file and 500 files; anything larger has to be processed in batches. The limits exist for performance reasons, so if you hit an oversized file, split it manually or handle it another way.
Diagnostic information: Creating a vault returns diagnostic information that helps with debugging when creation fails. When something goes wrong, check it first; in most cases it offers a clue.
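The file preview limits mentioned above can be pictured as a simple budget check. The sketch below is an assumption about how such limits might be applied, not HagiCode's code: `selectPreviewable` and `FileInfo` are hypothetical names, and 256KB is interpreted here as 256 × 1024 bytes.

```typescript
// Limits quoted in the article; how HagiCode applies them internally is assumed.
const MAX_PREVIEW_BYTES = 256 * 1024;
const MAX_PREVIEW_FILES = 500;

interface FileInfo {
  path: string;
  sizeBytes: number;
}

// Splits candidates into files that fit the preview budget and files that do not.
function selectPreviewable(files: FileInfo[]): { included: FileInfo[]; skipped: FileInfo[] } {
  const included: FileInfo[] = [];
  const skipped: FileInfo[] = [];
  for (const file of files) {
    const fits = file.sizeBytes <= MAX_PREVIEW_BYTES && included.length < MAX_PREVIEW_FILES;
    (fits ? included : skipped).push(file);
  }
  return { included, skipped };
}

// Usage: the small note is previewed, the large bundle is left for batch handling.
const selection = selectPreviewable([
  { path: "docs/fiber-notes.md", sizeBytes: 4_096 },
  { path: "repos/react/build/bundle.js", sizeBytes: 1_500_000 },
]);
console.log(selection.included.length, selection.skipped.length); // 1 1
```

Returning the skipped files explicitly makes it easy to tell the user which files need to be split or processed separately.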
Summary
HagiCode's Vault system essentially addresses a simple but profound problem: how to enable AI assistants to understand and use local knowledge resources.
Through a unified storage abstraction layer, a standardized directory structure, and automated context injection, it achieves a "register once, reuse everywhere" approach to knowledge management. Once a vault is created, the AI can automatically access and understand its contents, whether they are learning notes, code repositories, or documentation.
The experience improvement from this design is obvious. No more manually copying code snippets or repeatedly explaining background information—AI assistants are like colleagues who truly understand project context, able to provide more valuable help based on existing knowledge.
The Vault system described here was built and refined through real pitfalls encountered during HagiCode development. If you find the design valuable, HagiCode itself may also be worth a look.
References
- HagiCode GitHub: github.com/HagiCode-org/site
- HagiCode official site: hagicode.com
- 30-minute practical demo: www.bilibili.com/video/BV1pirZBuEzq/
- Docker Compose installation guide: docs.hagicode.com/installation/docker-compose
- Desktop quick installation: hagicode.com/desktop/
Original Article & License
This article was created with AI assistance and reviewed by the author before publication.
- Author: newbe36524
- Original URL: https://docs.hagicode.com/go?platform=devto&target=%2Fblog%2F2026-04-10-vault-system-ai-knowledge-base%2F
- License: Unless otherwise stated, this article is licensed under CC BY-NC-SA. Please retain attribution when sharing.