In the era of AI-assisted development, how can we help AI assistants better understand our learning resources? The HagiCode project implements a unified, AI-comprehensible knowledge storage abstraction layer through the Vault system, significantly improving learning efficiency when studying projects.
Background
In the AI era, the way developers learn new technologies and architectures is undergoing profound changes. "Learning from projects"—deeply studying and learning from excellent open-source projects' code, architecture, and design patterns—has become an efficient learning method. Compared to traditional reading books or watching videos, directly reading and running high-quality open-source projects helps you understand real-world engineering practices faster.
However, this learning approach also faces several challenges.
Learning materials are too scattered. Your notes might live in Obsidian, your code repositories are spread across various folders, and your AI assistant's conversation history is yet another isolated data silo. When you want AI to help you analyze a project, you have to copy code snippets and organize context by hand, which is tedious.
What's even more troublesome is context fragmentation. AI assistants cannot directly access your local learning resources, so you need to provide background information anew for each conversation. Plus, the code repositories you're studying update quickly, manual synchronization is error-prone, and it's difficult to share knowledge across multiple learning projects.
These problems are fundamentally caused by "data silos." If there were a unified storage abstraction layer through which AI assistants could understand and access all your learning resources, most of these problems would go away.
About HagiCode
The Vault system shared in this article is a solution developed during our work on HagiCode. HagiCode is an AI code assistant project. In our daily development, we often need to learn from and reference various open-source projects. To help AI assistants better understand these learning resources, we designed the Vault cross-project persistent storage system.
This solution has been validated in practice in HagiCode. If you're facing similar knowledge management challenges, I hope these experiences provide some inspiration. After all, once you've hit a few pitfalls yourself, it's worth leaving notes for those who come after.
Vault System Design Philosophy
The core idea of the Vault system is simple: create a unified, AI-comprehensible knowledge storage abstraction layer. From an implementation perspective, the system has several key features.
Multi-Type Support
The system supports four vault types, each corresponding to different use cases:
// folder: General folder type
export const DEFAULT_VAULT_TYPE = 'folder';
// coderef: Type specifically for studying code projects
export const CODEREF_VAULT_TYPE = 'coderef';
// obsidian: Integration with Obsidian note-taking software
export const OBSIDIAN_VAULT_TYPE = 'obsidian';
// system-managed: Vault created and managed automatically by the system
export const SYSTEM_MANAGED_VAULT_TYPE = 'system-managed';
The coderef type is the most commonly used in HagiCode. It's designed specifically for studying code projects, providing a standardized directory structure and AI-readable metadata descriptions.
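As a minimal sketch, the four constants above could be tied together with a union type and a type guard, so that a string arriving from user input or JSON is narrowed before it is used. The names `ALL_VAULT_TYPES`, `VaultType`, `isVaultType`, and `normalizeVaultType` are hypothetical and not taken from the HagiCode source. Falling back to the default type for unrecognized input is also an assumption.

```typescript
// Constants as shown in the article.
const DEFAULT_VAULT_TYPE = 'folder';
const CODEREF_VAULT_TYPE = 'coderef';
const OBSIDIAN_VAULT_TYPE = 'obsidian';
const SYSTEM_MANAGED_VAULT_TYPE = 'system-managed';

// Everything below is a hypothetical helper layer, not HagiCode source.
const ALL_VAULT_TYPES = [
  DEFAULT_VAULT_TYPE,
  CODEREF_VAULT_TYPE,
  OBSIDIAN_VAULT_TYPE,
  SYSTEM_MANAGED_VAULT_TYPE,
] as const;

type VaultType = (typeof ALL_VAULT_TYPES)[number];

// Type guard: narrows an unknown value to one of the four known vault types.
function isVaultType(value: unknown): value is VaultType {
  return (
    typeof value === 'string' &&
    (ALL_VAULT_TYPES as readonly string[]).indexOf(value) !== -1
  );
}

// Assumed policy: trim and lowercase the input, fall back to 'folder' when unrecognized.
function normalizeVaultType(value: unknown): VaultType {
  const candidate = typeof value === 'string' ? value.trim().toLowerCase() : value;
  return isVaultType(candidate) ? candidate : DEFAULT_VAULT_TYPE;
}
```

With this in place, code that branches on the vault type can use a `switch` over `VaultType` and let the compiler flag a missing case.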
Persistent Storage Mechanism
The vault registry is stored persistently in JSON format, ensuring configuration remains available after application restarts:
public class VaultRegistryStore : IVaultRegistryStore
{
    private readonly string _registryFilePath;
    public VaultRegistryStore(IConfiguration configuration, ILogger<VaultRegistryStore> logger)
    {
        var dataDir = configuration["DataDir"] ?? "./data";
        var absoluteDataDir = Path.IsPathRooted(dataDir)
            ? dataDir
            : Path.GetFullPath(Path.Combine(Directory.GetCurrentDirectory(), dataDir));
        _registryFilePath = Path.Combine(absoluteDataDir, "personal-data", "vaults", "registry.json");
    }
}
The benefit of this design is simplicity and reliability. JSON format is human-readable, facilitating debugging and manual modification; file system storage avoids database complexity, reducing system dependencies. After all, sometimes simple is best.
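To make the JSON registry idea concrete, here is a minimal sketch of serializing and parsing such a file. The article does not show the real `registry.json` schema, so the `RegistryEntry` and `RegistryFile` shapes, the `version` field, and the choice to drop malformed entries are all assumptions. File I/O is left out so that the sketch stays focused on the format.

```typescript
// Hypothetical entry shape; the real registry.json schema is not shown in the article.
interface RegistryEntry {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
}

// Hypothetical file wrapper with a version field for future migrations.
interface RegistryFile {
  version: number;
  vaults: RegistryEntry[];
}

function serializeRegistry(vaults: RegistryEntry[]): string {
  const file: RegistryFile = { version: 1, vaults };
  // Pretty-print so the file stays easy to read and edit by hand.
  return JSON.stringify(file, null, 2) + '\n';
}

function isRegistryEntry(value: unknown): value is RegistryEntry {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === 'string' &&
    typeof v.name === 'string' &&
    typeof v.type === 'string' &&
    typeof v.physicalPath === 'string'
  );
}

function parseRegistry(text: string): RegistryEntry[] {
  // An empty file is treated as an empty registry.
  if (text.trim() === '') return [];
  const parsed: unknown = JSON.parse(text);
  if (typeof parsed !== 'object' || parsed === null) return [];
  const vaults = (parsed as { vaults?: unknown }).vaults;
  if (!Array.isArray(vaults)) return [];
  // Drop malformed entries instead of failing the whole load.
  return vaults.filter(isRegistryEntry);
}
```

Because people may edit the file by hand, tolerating a partially broken file on load is one reasonable choice. A stricter store could instead reject the file and report which entry is invalid.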
AI Context Integration
Most importantly, the system can automatically inject vault information into AI proposal context:
export function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = DEFAULT_VAULT_PROMPT_TEMPLATE,
): string {
  const readOnlyVaults = vaults.filter((vault) => vault.accessType === 'read');
  const editableVaults = vaults.filter((vault) => vault.accessType === 'write');
  if (readOnlyVaults.length === 0 && editableVaults.length === 0) {
    return '';
  }
  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);
  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}
This achieves an important feature: AI assistants can automatically understand available learning resources without users manually providing context. It's a form of tacit understanding, I suppose.
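The function above relies on a `buildVaultSection` helper and a prompt template that the article does not show. The sketch below fills them in so the whole thing can run end to end. The `VaultPromptTemplate` shape, the default template text, and the line format produced by `buildVaultSection` are all assumptions made for illustration, not HagiCode's actual prompt.

```typescript
interface VaultForText {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
  accessType: 'read' | 'write';
}

// Assumed template shape: one heading plus one label per access group.
interface VaultPromptTemplate {
  heading: string;
  reference: string;
  editable: string;
}

// Hypothetical wording; the real default template is not shown in the article.
const DEFAULT_VAULT_PROMPT_TEMPLATE: VaultPromptTemplate = {
  heading: 'Target Vaults',
  reference: 'Reference vaults (read-only):',
  editable: 'Editable vaults:',
};

// Hypothetical helper: returns '' for an empty group so .filter(Boolean) drops it.
function buildVaultSection(vaults: VaultForText[], label: string): string {
  if (vaults.length === 0) return '';
  const lines = vaults.map((v) => `- ${v.name} [${v.type}] at ${v.physicalPath}`);
  return [label, ...lines].join('\n');
}

// Same logic as the function shown in the article.
function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = DEFAULT_VAULT_PROMPT_TEMPLATE,
): string {
  const readOnlyVaults = vaults.filter((vault) => vault.accessType === 'read');
  const editableVaults = vaults.filter((vault) => vault.accessType === 'write');
  if (readOnlyVaults.length === 0 && editableVaults.length === 0) {
    return '';
  }
  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);
  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}

const promptSuffix = buildTargetVaultsText([
  {
    id: 'react-learning',
    name: 'React Learning Vault',
    type: 'coderef',
    physicalPath: '/vaults/react-learning',
    accessType: 'read',
  },
]);
console.log(promptSuffix);
```

Returning an empty string when there are no vaults means the caller can append the result to a prompt unconditionally without leaving a dangling heading.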
CodeRef Vault Standardized Structure
For coderef-type vaults, HagiCode provides a standardized directory structure:
my-coderef-vault/
├── index.yaml # Vault metadata description
├── AGENTS.md # AI assistant operation guide
├── docs/ # Store learning notes and documentation
└── repos/ # Manage studied code repositories via Git submodules
When creating a vault, the system automatically initializes this structure:
private async Task EnsureCodeRefStructureAsync(
    string vaultName,
    string physicalPath,
    ICollection<VaultBootstrapDiagnosticDto> diagnostics,
    CancellationToken cancellationToken)
{
    Directory.CreateDirectory(physicalPath);
    var indexPath = Path.Combine(physicalPath, CodeRefIndexFileName);
    var docsPath = Path.Combine(physicalPath, CodeRefDocsDirectoryName);
    var reposPath = Path.Combine(physicalPath, CodeRefReposDirectoryName);
    // Create standard directory structure
    if (!Directory.Exists(docsPath))
    {
        Directory.CreateDirectory(docsPath);
    }
    if (!Directory.Exists(reposPath))
    {
        Directory.CreateDirectory(reposPath);
    }
    // Create AGENTS.md guide
    await EnsureCodeRefAgentsDocumentAsync(physicalPath, cancellationToken);
    // Create index.yaml metadata
    // (mergedDocument is built in code omitted from this excerpt)
    await WriteCodeRefIndexDocumentAsync(indexPath, mergedDocument, cancellationToken);
}
This structure design is intentional:
- docs/ directory stores your learning notes, which can record code understanding, architecture analysis, pitfalls, and experiences in Markdown format
- repos/ directory manages studied repositories via Git submodules rather than directly copying code. This keeps code synchronized while saving space
- index.yaml contains vault metadata, letting AI assistants quickly understand the vault's purpose and content
- AGENTS.md is a guide written specifically for AI assistants, explaining how to handle content in this vault
Organized this way, perhaps AI can more easily understand your ideas.
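One way to use the standardized layout is to check an existing directory listing against it. The sketch below takes a list of entries and reports which of the four standard items are missing. The `DirEntry` shape, the `kind` field, and the `findMissingCodeRefEntries` name are hypothetical, and the function deliberately takes a listing as input instead of touching the file system.

```typescript
// The four standard entries named in the article; `kind` is an illustrative addition.
const CODEREF_LAYOUT = [
  { name: 'index.yaml', kind: 'file' },
  { name: 'AGENTS.md', kind: 'file' },
  { name: 'docs', kind: 'directory' },
  { name: 'repos', kind: 'directory' },
] as const;

// Hypothetical minimal view of a directory listing.
interface DirEntry {
  name: string;
  isDirectory: boolean;
}

// Returns the names of standard entries that are absent or have the wrong kind.
function findMissingCodeRefEntries(entries: DirEntry[]): string[] {
  const missing: string[] = [];
  for (const expected of CODEREF_LAYOUT) {
    const wantDirectory = expected.kind === 'directory';
    const found = entries.some(
      (e) => e.name === expected.name && e.isDirectory === wantDirectory,
    );
    if (!found) missing.push(expected.name);
  }
  return missing;
}

const missingEntries = findMissingCodeRefEntries([
  { name: 'index.yaml', isDirectory: false },
  { name: 'docs', isDirectory: true },
]);
console.log(missingEntries);
```

A check like this could run before a vault is handed to an AI assistant, so that a half-initialized vault is reported early.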
System-Managed Auto-Initialization
In addition to manually created vaults, HagiCode also supports system-managed vaults that are created and maintained automatically:
public async Task<IReadOnlyList<VaultRegistryEntry>> EnsureAllSystemManagedVaultsAsync(
    CancellationToken cancellationToken = default)
{
    var definitions = GetAllResolvedDefinitions();
    var entries = new List<VaultRegistryEntry>(definitions.Count);
    foreach (var definition in definitions)
    {
        entries.Add(await EnsureResolvedSystemManagedVaultAsync(definition, cancellationToken));
    }
    return entries;
}
The system automatically creates and manages the following vaults:
- hagiprojectdata: Project data storage for saving project configuration and state
- personaldata: Personal data storage for saving user preferences
- hbsprompt: Prompt template library for managing commonly used AI prompts
These vaults are automatically initialized at system startup without manual user configuration. After all, some things are best left to the system—why should humans worry about them?
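Because this initialization runs at every startup, the "ensure" step has to be idempotent: running it again must not create duplicates. The following sketch shows that property on an in-memory registry. The `SystemVaultDefinition` and `VaultEntry` shapes and the `ensureSystemManagedVaults` function are hypothetical; only the three vault names and their purposes come from the list above.

```typescript
// Hypothetical shapes for illustration only.
interface SystemVaultDefinition {
  id: string;
  description: string;
}

interface VaultEntry {
  id: string;
  type: string;
}

// Vault names and purposes as listed in the article.
const SYSTEM_VAULTS: SystemVaultDefinition[] = [
  { id: 'hagiprojectdata', description: 'Project configuration and state' },
  { id: 'personaldata', description: 'User preferences' },
  { id: 'hbsprompt', description: 'Prompt template library' },
];

// Adds any missing system-managed vault and leaves existing entries untouched.
function ensureSystemManagedVaults(registry: VaultEntry[]): VaultEntry[] {
  const result = registry.slice();
  for (const definition of SYSTEM_VAULTS) {
    const exists = result.some((entry) => entry.id === definition.id);
    if (!exists) {
      result.push({ id: definition.id, type: 'system-managed' });
    }
  }
  return result;
}

const firstRun = ensureSystemManagedVaults([{ id: 'react-learning', type: 'coderef' }]);
const secondRun = ensureSystemManagedVaults(firstRun);
console.log(firstRun.length, secondRun.length);
```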
Access Control Mechanism
An important design is access control. The system divides vaults into two access types:
export interface VaultForText {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
  accessType: 'read' | 'write'; // Key: distinguishes read-only from editable
}
- reference (accessType 'read'): the AI uses the content only for analysis and understanding and does not modify it. Suitable for open-source projects you are studying, documentation, and similar reference material.
- editable (accessType 'write'): the AI may modify the content as a task requires. Suitable for your own notes, drafts, and similar material.
This distinction is important. It tells the AI which content is read-only reference and which can be modified, reducing the risk of accidental changes. After all, no one wants their hard work inadvertently erased.
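The article describes the access type as information given to the AI. Whether HagiCode also enforces it in application code is not stated. As a sketch of what such enforcement could look like, the hypothetical `assertWritable` guard below refuses writes to a vault whose `accessType` is `'read'`. The `writeToVault` wrapper and its `write` callback are likewise illustrative stand-ins.

```typescript
type AccessType = 'read' | 'write';

interface VaultAccess {
  id: string;
  physicalPath: string;
  accessType: AccessType;
}

// Hypothetical guard: call before any operation that would change files in a vault.
function assertWritable(vault: VaultAccess): void {
  if (vault.accessType !== 'write') {
    throw new Error(`Vault "${vault.id}" is read-only and cannot be modified.`);
  }
}

// Hypothetical wrapper: `write` stands in for whatever actually persists the change.
function writeToVault(
  vault: VaultAccess,
  relativePath: string,
  content: string,
  write: (fullPath: string, content: string) => void,
): void {
  assertWritable(vault);
  write(`${vault.physicalPath}/${relativePath}`, content);
}

// Usage: record writes in memory instead of touching the disk.
const written: string[] = [];
const recordWrite = (fullPath: string): void => {
  written.push(fullPath);
};

const notesVault: VaultAccess = { id: 'notes', physicalPath: '/vaults/notes', accessType: 'write' };
const reactVault: VaultAccess = { id: 'react', physicalPath: '/vaults/react', accessType: 'read' };

writeToVault(notesVault, 'docs/fiber.md', '# Fiber notes', recordWrite);

let rejected = false;
try {
  writeToVault(reactVault, 'docs/fiber.md', '# Fiber notes', recordWrite);
} catch (error) {
  rejected = true;
}
console.log(written, rejected);
```

Checking at the write boundary means the protection holds even if a prompt is ignored or misread.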
Practice: Creating and Using Vaults
Having covered the principles, let's look at practical usage.
Creating a CodeRef Vault
Here's a complete frontend call example:
const createCodeRefVault = async () => {
  const response = await VaultService.postApiVaults({
    requestBody: {
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/Users/developer/vaults/react-learning",
      gitUrl: "https://github.com/facebook/react.git"
    }
  });
  // The system will automatically:
  // 1. Clone React repository to vault/repos/react
  // 2. Create docs/ directory for notes
  // 3. Generate index.yaml metadata
  // 4. Create AGENTS.md guide file
  return response;
};
This API call completes a series of operations: creating directory structure, initializing Git submodules, generating metadata files, and more. You only need to provide basic information, and the system handles the rest. It's actually quite worry-free.
Using Vaults in AI Proposals
Once a vault is created, you can reference it in AI proposals:
const proposal = composeProposalChiefComplaint({
  chiefComplaint: "Help me analyze React's concurrent rendering mechanism",
  repositories: [
    { id: "react", gitUrl: "https://github.com/facebook/react.git" }
  ],
  vaults: [
    {
      id: "react-learning",
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/vaults/react-learning",
      accessType: "read" // AI can only read, not modify
    }
  ],
  quickRequestText: "Focus on fiber architecture and scheduler implementation"
});
The system automatically injects vault information into the AI's context, so the AI knows which learning resources you have available. An AI that understands what you mean is, I suppose, a rare kind of tacit understanding.
Best Practices and Considerations
While using the Vault system, we've collected a few lessons worth sharing.
Path Safety
The system strictly validates paths to prevent path traversal attacks:
private static string ResolveFilePath(string vaultRoot, string relativePath)
{
    // Normalize the root and give it a trailing separator so the prefix check cannot match sibling directories.
    var rootPath = EnsureTrailingSeparator(Path.GetFullPath(vaultRoot));
    var combinedPath = Path.GetFullPath(Path.Combine(rootPath, relativePath));
    // Note: OrdinalIgnoreCase is lenient on case-sensitive file systems, where Ordinal would be the stricter check.
    if (!combinedPath.StartsWith(rootPath, StringComparison.OrdinalIgnoreCase))
    {
        throw new BusinessException(VaultRelativePathTraversalCode,
            "Vault file paths must stay inside the registered vault root.");
    }
    return combinedPath;
}
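This matters when you customize vault paths: any file path that resolves outside the registered vault root is rejected. The trailing separator added to the root also keeps a sibling directory such as /vaults/react-evil from passing the prefix check for /vaults/react. When it comes to security, it's hard to overemphasize.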
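For readers working on the TypeScript side, the same check can be sketched with Node's `path` module. This is an illustrative port, not HagiCode code: the function name `resolveInsideVault` is hypothetical, and `path.posix` is used only so the example behaves the same on every platform. Real code would normally use the platform-specific `path`.

```typescript
import * as path from 'node:path';

// path.posix keeps the example deterministic; real code would use the platform `path`.
const p = path.posix;

// Resolves relativePath under vaultRoot and rejects anything that escapes the root.
function resolveInsideVault(vaultRoot: string, relativePath: string): string {
  const root = p.resolve(vaultRoot);
  // The trailing separator stops "/vaults/react-evil" from matching "/vaults/react".
  const rootWithSeparator = root.endsWith(p.sep) ? root : root + p.sep;
  const combined = p.resolve(root, relativePath);
  if (combined !== root && !combined.startsWith(rootWithSeparator)) {
    throw new Error('Vault file paths must stay inside the registered vault root.');
  }
  return combined;
}

// Small helper for the usage lines below.
function isRejected(relativePath: string): boolean {
  try {
    resolveInsideVault('/vaults/react', relativePath);
    return false;
  } catch (error) {
    return true;
  }
}

console.log(resolveInsideVault('/vaults/react', 'docs/notes.md'));
console.log(isRejected('../react-evil/secret.txt'));
console.log(isRejected('/etc/passwd'));
```

Note that a check like this only looks at path strings. It does not follow symbolic links, so a link inside the vault that points elsewhere would need separate handling.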
Git Submodule Management
For CodeRef vaults, we recommend managing code with Git submodules rather than copying it directly:
private static string BuildCodeRefAgentsContent()
{
    return """
        # CodeRef Vault Guide
        Repositories under `repos/` should be maintained through Git submodules
        rather than copied directly into the vault root.
        Keep this structure stable so assistants and tools can understand the vault quickly.
        """ + Environment.NewLine;
}
This approach has several benefits: keeping code synchronized with upstream, saving disk space, and facilitating management of multiple code versions. After all, who wants to repeatedly download the same things?
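As a rough sketch of the submodule workflow, the helper below turns a Git URL into the arguments for `git submodule add <url> repos/<name>`, which you would run from the vault root. How HagiCode actually derives the directory name is not shown in the article, so `repoNameFromGitUrl` and `buildSubmoduleAddArgs` are hypothetical, and the naming rule (last URL segment without `.git`) is an assumption.

```typescript
// Hypothetical rule: use the last URL segment, without a trailing ".git", as the directory name.
function repoNameFromGitUrl(gitUrl: string): string {
  const trimmed = gitUrl.trim().replace(/\/+$/, '').replace(/\.git$/, '');
  // Handle both "https://host/org/repo" and "git@host:org/repo" forms.
  const lastSeparator = Math.max(trimmed.lastIndexOf('/'), trimmed.lastIndexOf(':'));
  const name = trimmed.slice(lastSeparator + 1);
  if (name === '' || name === '.' || name === '..') {
    throw new Error(`Cannot derive a repository name from "${gitUrl}".`);
  }
  return name;
}

// Builds the argument list for: git submodule add <url> repos/<name>
function buildSubmoduleAddArgs(gitUrl: string): string[] {
  return ['submodule', 'add', gitUrl, `repos/${repoNameFromGitUrl(gitUrl)}`];
}

const submoduleArgs = buildSubmoduleAddArgs('https://github.com/facebook/react.git');
console.log(['git', ...submoduleArgs].join(' '));
```

Passing the arguments as an array to a process launcher, instead of building one shell string, avoids quoting problems when a URL contains unusual characters.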
File Preview Limitations
To prevent performance issues, the system caps how many files it enumerates and how many bytes of a file it previews:
private const int FileEnumerationLimit = 500;
private const int PreviewByteLimit = 256 * 1024; // 256KB
If your vault contains large numbers of files or very large files, preview function performance may be affected. In such cases, consider batch processing or using specialized search tools. After all, some things are too big and become difficult to handle.
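The two limits can be pictured with a small sketch. The constants mirror the values above, but the function names `limitEntries` and `previewBytes` are hypothetical. The article also does not say whether HagiCode truncates oversized input or refuses it, so the truncating behavior shown here is an assumption.

```typescript
// Values taken from the article; names adapted to TypeScript conventions.
const FILE_ENUMERATION_LIMIT = 500;
const PREVIEW_BYTE_LIMIT = 256 * 1024; // 256 KB

interface Limited<T> {
  items: T[];
  truncated: boolean;
}

// Keeps at most `limit` entries and reports whether anything was cut off.
function limitEntries<T>(entries: T[], limit: number = FILE_ENUMERATION_LIMIT): Limited<T> {
  return { items: entries.slice(0, limit), truncated: entries.length > limit };
}

interface Preview {
  text: string;
  truncated: boolean;
}

// Decodes at most `limit` bytes of a file as UTF-8 text.
function previewBytes(content: Uint8Array, limit: number = PREVIEW_BYTE_LIMIT): Preview {
  const truncated = content.length > limit;
  const head = truncated ? content.subarray(0, limit) : content;
  // A cut in the middle of a multi-byte character decodes to a replacement character at the end.
  return { text: new TextDecoder('utf-8').decode(head), truncated };
}

const helloBytes = new Uint8Array([104, 101, 108, 108, 111]); // "hello"
console.log(previewBytes(helloBytes, 4));
console.log(previewBytes(helloBytes));
```

Returning a `truncated` flag alongside the data lets the UI or the AI prompt state clearly that it is seeing only part of the vault.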
Diagnostic Information
When creating a vault, diagnostic information is returned to help with debugging:
List<VaultBootstrapDiagnosticDto> bootstrapDiagnostics = [];
if (IsCodeRefVaultType(normalizedType))
{
    bootstrapDiagnostics = await EnsureCodeRefBootstrapAsync(
        normalizedName,
        normalizedPhysicalPath,
        normalizedGitUrl,
        cancellationToken);
}
If creation fails, you can check diagnostic information to understand the specific cause. When something goes wrong, check the diagnostics—that's also a way to solve problems.
Summary
The Vault system addresses core pain points of learning projects in the AI era through a unified storage abstraction layer:
- Centralized Knowledge Management: All learning resources are centralized in one place, no longer scattered
- Automatic AI Context Injection: AI assistants can automatically understand available learning resources without manually providing context
- Cross-Project Knowledge Reuse: Multiple learning projects can share and reuse knowledge
- Standardized Directory Structure: Provides consistent directory structure, reducing learning costs
This solution has been validated in practice in the HagiCode project. If you're also working on AI-assisted development tools or facing similar knowledge management challenges, I hope these experiences provide some reference.
Actually, the value of a technical solution lies not in how complex it is, but in whether it can solve real problems. The Vault system's core idea is simple—establishing a unified, AI-comprehensible knowledge storage layer. But it's precisely this simple abstraction that has significantly improved our development efficiency.
Sometimes, simple is best. After all, complex things often hide more pitfalls...
References
- HagiCode Project: github.com/HagiCode-org/site
- HagiCode Official Site: hagicode.com
- HagiCode Installation Documentation: docs.hagicode.com/installation/docker-compose
- Obsidian Official Site: obsidian.md
- Git Submodules Documentation: git-scm.com/docs/gitsubmodules
If this article helps you, feel free to give a Star on GitHub, or visit the official site to learn more about HagiCode. Public beta has begun—install now to experience complete AI code assistant functionality.
Perhaps, you can also give it a try...
Original Article & License
Thanks for reading. If this article helped, consider liking, bookmarking, or sharing it.
This article was created with AI assistance and reviewed by the author before publication.
- Author: newbe36524
- Original URL: https://docs.hagicode.com/go?platform=devto&target=%2Fblog%2F2026-04-06-vault-persistent-storage-for-ai-era%2F
- License: Unless otherwise stated, this article is licensed under CC BY-NC-SA. Please retain attribution when sharing.