DEV Community

Hagicode

Originally published at docs.hagicode.com

How to Learn from Projects in the AI Era: Vault Cross-Project Persistent Storage System

In the era of AI-assisted development, how can we help AI assistants make better use of our learning resources? The HagiCode project answers this with the Vault system, a unified knowledge storage abstraction layer that AI assistants can read directly, so less time goes into assembling context by hand when studying a project.

Background

In the AI era, the way developers learn new technologies and architectures is changing profoundly. "Learning from projects" (studying the code, architecture, and design patterns of excellent open-source projects in depth) has become an efficient way to learn. Compared with reading books or watching videos, reading and running high-quality open-source projects directly helps you understand real-world engineering practices faster.

However, this learning approach also faces several challenges.

Learning materials are too scattered. Your notes might live in Obsidian, your code repositories are spread across various folders, and your AI assistant's conversation history is yet another isolated silo. When you want AI to help analyze a project, you have to copy code snippets and assemble context by hand, which is tedious.

Context fragmentation is even more troublesome. AI assistants cannot directly access your local learning resources, so you have to supply the background again in every conversation. On top of that, the repositories you are studying change quickly, manual synchronization is error-prone, and knowledge is hard to share across multiple learning projects.

These problems all come down to "data silos." A unified storage abstraction layer that lets AI assistants understand and access all of your learning resources would solve them.

About HagiCode

The Vault system shared in this article is a solution developed during our work on HagiCode. HagiCode is an AI code assistant project. In our daily development, we often need to learn from and reference various open-source projects. To help AI assistants better understand these learning resources, we designed the Vault cross-project persistent storage system.

This solution has been validated in practice in HagiCode. If you're facing similar knowledge management challenges, I hope our experience offers some inspiration. After all, once you've fallen into a few pitfalls, it's good to leave markers for those who come after.

Vault System Design Philosophy

The core idea of the Vault system is simple: create a unified, AI-comprehensible knowledge storage abstraction layer. From an implementation perspective, the system has several key features.

Multi-Type Support

The system supports four vault types, each corresponding to different use cases:

// folder: General folder type
export const DEFAULT_VAULT_TYPE = 'folder';

// coderef: Type specifically for studying code projects
export const CODEREF_VAULT_TYPE = 'coderef';

// obsidian: Integration with Obsidian note-taking software
export const OBSIDIAN_VAULT_TYPE = 'obsidian';

// system-managed: System automatically managed vault
export const SYSTEM_MANAGED_VAULT_TYPE = 'system-managed';

The coderef type is the most commonly used in HagiCode. It's designed specifically for studying code projects, providing a standardized directory structure and AI-readable metadata descriptions.
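
As a minimal sketch, the four type strings can be folded into one union type so that a value read from user input or from the registry is checked before it is used. The names `VAULT_TYPES`, `isVaultType`, and `normalizeVaultType` are hypothetical and not taken from HagiCode; the fallback to `folder` is an assumption suggested by the `DEFAULT_VAULT_TYPE` constant name.

```typescript
// The four type strings come from the constants above; everything else is illustrative.
const VAULT_TYPES = ['folder', 'coderef', 'obsidian', 'system-managed'] as const;
type VaultType = (typeof VAULT_TYPES)[number];

// Type guard: narrows an arbitrary string to one of the known vault types.
function isVaultType(value: string): value is VaultType {
  return (VAULT_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Assumption: unknown or missing values fall back to the default 'folder' type.
function normalizeVaultType(value?: string): VaultType {
  const candidate = (value ?? '').trim().toLowerCase();
  return isVaultType(candidate) ? candidate : 'folder';
}
```

With a guard like this, code that branches on the vault type can rely on the compiler to flag a type string that is not handled.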

Persistent Storage Mechanism

The vault registry is stored persistently in JSON format, ensuring configuration remains available after application restarts:

public class VaultRegistryStore : IVaultRegistryStore
{
    private readonly string _registryFilePath;

    public VaultRegistryStore(IConfiguration configuration, ILogger<VaultRegistryStore> logger)
    {
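        // Fall back to ./data when no DataDir is configured.
        var dataDir = configuration["DataDir"] ?? "./data";
        // Resolve a relative DataDir against the current working directory.
        var absoluteDataDir = Path.IsPathRooted(dataDir)
            ? dataDir
            : Path.GetFullPath(Path.Combine(Directory.GetCurrentDirectory(), dataDir));

        // Final location: <DataDir>/personal-data/vaults/registry.json
        _registryFilePath = Path.Combine(absoluteDataDir, "personal-data", "vaults", "registry.json");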
    }
}

The benefit of this design is simplicity and reliability. JSON format is human-readable, facilitating debugging and manual modification; file system storage avoids database complexity, reducing system dependencies. After all, sometimes simple is best.
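
Because the file is meant to be edited by hand, it helps to validate it when loading. The sketch below shows one way a JSON registry could be written and read back defensively. The article does not show the real registry.json schema, so the `RegistryFile` and `RegistryEntry` shapes and both function names are assumptions made for illustration.

```typescript
// Hypothetical registry shape; the real registry.json schema is not shown in this article.
interface RegistryEntry {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
}

interface RegistryFile {
  vaults: RegistryEntry[];
}

// Serialize with indentation so the file stays easy to read and edit by hand.
function serializeRegistry(registry: RegistryFile): string {
  return JSON.stringify(registry, null, 2);
}

// Parse defensively: a hand-edited file may be malformed, so validate before trusting it.
function parseRegistry(json: string): RegistryFile {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('registry must be a JSON object');
  }
  const vaults = (parsed as { vaults?: unknown }).vaults;
  if (!Array.isArray(vaults)) {
    throw new Error('registry.vaults must be an array');
  }
  for (const entry of vaults) {
    for (const key of ['id', 'name', 'type', 'physicalPath']) {
      if (typeof entry?.[key] !== 'string') {
        throw new Error(`vault entry is missing string field "${key}"`);
      }
    }
  }
  return { vaults: vaults as RegistryEntry[] };
}
```

Failing loudly on a malformed entry is a design choice: it surfaces a bad manual edit at startup instead of at the first vault lookup.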

AI Context Integration

Most importantly, the system can automatically inject vault information into AI proposal context:

export function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = DEFAULT_VAULT_PROMPT_TEMPLATE,
): string {
  const readOnlyVaults = vaults.filter((vault) => vault.accessType === 'read');
  const editableVaults = vaults.filter((vault) => vault.accessType === 'write');

  if (readOnlyVaults.length === 0 && editableVaults.length === 0) {
    return '';
  }

  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);

  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}

This achieves an important feature: AI assistants can automatically understand available learning resources without users manually providing context. It's a form of tacit understanding, I suppose.
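
To make the injected text concrete, here is a self-contained sketch of the function above together with the two pieces the article does not show. `buildVaultSection`, the `VaultPromptTemplate` shape, and the wording in `SKETCH_TEMPLATE` are hypothetical stand-ins; HagiCode's real helper and default template may look different.

```typescript
// Shape taken from the article's VaultForText interface.
interface VaultForText {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
  accessType: 'read' | 'write';
}

// Hypothetical template shape and wording; the real default template is not shown.
interface VaultPromptTemplate {
  heading: string;
  reference: string;
  editable: string;
}

const SKETCH_TEMPLATE: VaultPromptTemplate = {
  heading: 'Target Vaults',
  reference: 'Reference vaults (read-only):',
  editable: 'Editable vaults:',
};

// Hypothetical helper: a titled bullet list per access group, '' for an empty group.
function buildVaultSection(vaults: VaultForText[], title: string): string {
  if (vaults.length === 0) return '';
  const lines = vaults.map((v) => `- ${v.name} [${v.type}] at ${v.physicalPath}`);
  return [title, ...lines].join('\n');
}

// Same logic as the article's function, using the sketch template as default.
function buildTargetVaultsText(
  vaults: VaultForText[],
  template: VaultPromptTemplate = SKETCH_TEMPLATE,
): string {
  const readOnlyVaults = vaults.filter((vault) => vault.accessType === 'read');
  const editableVaults = vaults.filter((vault) => vault.accessType === 'write');
  if (readOnlyVaults.length === 0 && editableVaults.length === 0) {
    return '';
  }
  // filter(Boolean) drops the empty string returned for an empty group.
  const sections = [
    buildVaultSection(readOnlyVaults, template.reference),
    buildVaultSection(editableVaults, template.editable),
  ].filter(Boolean);
  return `\n\n### ${template.heading}\n\n${sections.join('\n')}`;
}

const sampleVaultText = buildTargetVaultsText([
  { id: 'react-learning', name: 'React Learning Vault', type: 'coderef', physicalPath: '/vaults/react-learning', accessType: 'read' },
  { id: 'notes', name: 'My Notes', type: 'folder', physicalPath: '/vaults/notes', accessType: 'write' },
]);
```

The leading blank lines in the result let the text be appended to an existing prompt without extra glue, and an empty vault list contributes nothing at all.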

CodeRef Vault Standardized Structure

For coderef-type vaults, HagiCode provides a standardized directory structure:

my-coderef-vault/
├── index.yaml          # Vault metadata description
├── AGENTS.md           # AI assistant operation guide
├── docs/               # Store learning notes and documentation
└── repos/              # Manage studied code repositories via Git submodules

When creating a vault, the system automatically initializes this structure:

private async Task EnsureCodeRefStructureAsync(
    string vaultName,
    string physicalPath,
    ICollection<VaultBootstrapDiagnosticDto> diagnostics,
    CancellationToken cancellationToken)
{
    Directory.CreateDirectory(physicalPath);

    var indexPath = Path.Combine(physicalPath, CodeRefIndexFileName);
    var docsPath = Path.Combine(physicalPath, CodeRefDocsDirectoryName);
    var reposPath = Path.Combine(physicalPath, CodeRefReposDirectoryName);

    // Create standard directory structure.
    // Directory.CreateDirectory is a no-op when the directory already exists,
    // so no Directory.Exists check is needed.
    Directory.CreateDirectory(docsPath);
    Directory.CreateDirectory(reposPath);

    // Create AGENTS.md guide
    await EnsureCodeRefAgentsDocumentAsync(physicalPath, cancellationToken);

    // Create index.yaml metadata.
    // Note: mergedDocument is not defined in this excerpt; the code that builds it is omitted here.
    await WriteCodeRefIndexDocumentAsync(indexPath, mergedDocument, cancellationToken);
}

This structure design is intentional:

  • docs/ directory stores your learning notes, which can record code understanding, architecture analysis, pitfalls, and experiences in Markdown format
  • repos/ directory manages studied repositories via Git submodules rather than directly copying code. This keeps code synchronized while saving space
  • index.yaml contains vault metadata, letting AI assistants quickly understand the vault's purpose and content
  • AGENTS.md is a guide written specifically for AI assistants, explaining how to handle content in this vault

Organized this way, perhaps AI can more easily understand your ideas.
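
The four entries described above can be checked mechanically. This is a small sketch of a validator that reports which parts of the layout are missing; the function and constant names are hypothetical, and the convention that directory names end with "/" is an assumption of this example only.

```typescript
// Required top-level entries, taken from the directory layout shown above.
const REQUIRED_CODEREF_ENTRIES = ['index.yaml', 'AGENTS.md', 'docs/', 'repos/'];

// Given a vault's top-level entries (directories end with "/"), list what is missing.
function missingCodeRefEntries(topLevelEntries: string[]): string[] {
  return REQUIRED_CODEREF_ENTRIES.filter(
    (required) => topLevelEntries.indexOf(required) === -1,
  );
}

// Example: a vault that has notes and metadata but no guide and no repos directory.
const missingInSample = missingCodeRefEntries(['index.yaml', 'docs/']);
```

A check like this could run before a vault is handed to an assistant, so that a partially initialized vault is repaired or reported first.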

System-Managed Auto-Initialization

In addition to manually created vaults, HagiCode also supports system-managed vaults that it creates and maintains automatically:

public async Task<IReadOnlyList<VaultRegistryEntry>> EnsureAllSystemManagedVaultsAsync(
    CancellationToken cancellationToken = default)
{
    var definitions = GetAllResolvedDefinitions();
    var entries = new List<VaultRegistryEntry>(definitions.Count);

    foreach (var definition in definitions)
    {
        entries.Add(await EnsureResolvedSystemManagedVaultAsync(definition, cancellationToken));
    }

    return entries;
}

The system automatically creates and manages the following vaults:

  • hagiprojectdata: Project data storage for saving project configuration and state
  • personaldata: Personal data storage for saving user preferences
  • hbsprompt: Prompt template library for managing commonly used AI prompts

These vaults are automatically initialized at system startup without manual user configuration. After all, some things are best left to the system—why should humans worry about them?
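
Running at every startup only works if the "ensure" step is idempotent. The following sketch illustrates that property with an in-memory registry: an entry is created the first time and returned unchanged afterwards. The three vault names come from the list above; the type shapes, function names, and paths are hypothetical, and the real implementation is asynchronous and persists to disk.

```typescript
// Hypothetical shapes; the article does not show the real definition or entry types.
interface SystemVaultDefinition {
  name: string;
  physicalPath: string;
}

interface SystemVaultEntry {
  id: string;
  name: string;
  type: 'system-managed';
  physicalPath: string;
}

// In-memory stand-in for the persisted registry, keyed by vault id.
type SystemVaultRegistry = Record<string, SystemVaultEntry>;

// Create the entry only when it is missing, so repeated startups change nothing.
function ensureSystemManagedVault(
  registry: SystemVaultRegistry,
  definition: SystemVaultDefinition,
): SystemVaultEntry {
  if (Object.prototype.hasOwnProperty.call(registry, definition.name)) {
    return registry[definition.name];
  }
  const entry: SystemVaultEntry = {
    id: definition.name,
    name: definition.name,
    type: 'system-managed',
    physicalPath: definition.physicalPath,
  };
  registry[definition.name] = entry;
  return entry;
}

// Mirrors the loop in the C# method: ensure each definition in order.
function ensureAllSystemManagedVaults(
  registry: SystemVaultRegistry,
  definitions: SystemVaultDefinition[],
): SystemVaultEntry[] {
  return definitions.map((definition) => ensureSystemManagedVault(registry, definition));
}

// Vault names from the article; the paths are placeholders.
const systemVaultDefinitions: SystemVaultDefinition[] = [
  { name: 'hagiprojectdata', physicalPath: '/data/hagiprojectdata' },
  { name: 'personaldata', physicalPath: '/data/personaldata' },
  { name: 'hbsprompt', physicalPath: '/data/hbsprompt' },
];
```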

Access Control Mechanism

Access control is an important part of the design. The system divides vaults into two access types:

export interface VaultForText {
  id: string;
  name: string;
  type: string;
  physicalPath: string;
  accessType: 'read' | 'write';  // Key: distinguishes read-only from editable
}

In the prompt text, the two accessType values correspond to two roles:

  • reference (accessType 'read'): the AI uses the content only for analysis and understanding and cannot modify it. Suitable for open-source projects, documentation, and other reference material.
  • editable (accessType 'write'): the AI can modify the content as a task requires. Suitable for your notes, drafts, etc.

This distinction matters. It tells the AI which content is read-only reference material and which can be modified, reducing the risk of accidental changes. After all, no one wants their hard work inadvertently erased.
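
Telling the AI about access types in the prompt is one layer; a check in code can back it up. The sketch below assumes that every write operation passes through a single guard before touching a vault. `assertWritable` and the `VaultAccess` shape are hypothetical, and the article does not say whether HagiCode enforces access this way.

```typescript
type VaultAccessType = 'read' | 'write';

// Minimal shape for the guard; only the fields it needs.
interface VaultAccess {
  name: string;
  accessType: VaultAccessType;
}

// Hypothetical guard: call before any operation that changes vault content.
function assertWritable(vault: VaultAccess): void {
  if (vault.accessType !== 'write') {
    throw new Error(`Vault "${vault.name}" is read-only (reference) and cannot be modified.`);
  }
}
```

A guard of this kind fails closed: a vault that is not explicitly marked `write` is treated as reference material.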

Practice: Creating and Using Vaults

Having covered the principles, let's look at practical usage.

Creating a CodeRef Vault

Here's a complete frontend call example:

const createCodeRefVault = async () => {
  const response = await VaultService.postApiVaults({
    requestBody: {
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/Users/developer/vaults/react-learning",
      gitUrl: "https://github.com/facebook/react.git"
    }
  });

  // The system will automatically:
  // 1. Clone React repository to vault/repos/react
  // 2. Create docs/ directory for notes
  // 3. Generate index.yaml metadata
  // 4. Create AGENTS.md guide file

  return response;
};

This one API call performs a series of operations: creating the directory structure, initializing Git submodules, generating the metadata files, and more. You provide the basic information and the system handles the rest, which is quite worry-free.

Using Vaults in AI Proposals

Once a vault is created, you can reference it in AI proposals:

const proposal = composeProposalChiefComplaint({
  chiefComplaint: "Help me analyze React's concurrent rendering mechanism",
  repositories: [
    { id: "react", gitUrl: "https://github.com/facebook/react.git" }
  ],
  vaults: [
    {
      id: "react-learning",
      name: "React Learning Vault",
      type: "coderef",
      physicalPath: "/Users/developer/vaults/react-learning",
      accessType: "read"  // AI can only read, not modify
    }
  ],
  quickRequestText: "Focus on fiber architecture and scheduler implementation"
});

The system automatically injects vault information into the AI's context, letting AI know what learning resources you have available. AI understanding your ideas is, I suppose, a form of rare tacit understanding.

Best Practices and Considerations

While using the Vault system, we've collected some lessons learned.

Path Safety

The system strictly validates paths to prevent path traversal attacks:

private static string ResolveFilePath(string vaultRoot, string relativePath)
{
    var rootPath = EnsureTrailingSeparator(Path.GetFullPath(vaultRoot));
    var combinedPath = Path.GetFullPath(Path.Combine(rootPath, relativePath));
    if (!combinedPath.StartsWith(rootPath, StringComparison.OrdinalIgnoreCase))
    {
        throw new BusinessException(VaultRelativePathTraversalCode,
            "Vault file paths must stay inside the registered vault root.");
    }
    return combinedPath;
}

This matters. If you customize vault paths, make sure they stay within the vault root; otherwise the system refuses the operation. Security can hardly be overemphasized.
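
The same check can be sketched in TypeScript to show why the trailing separator matters. This version handles only absolute POSIX-style paths and normalizes them by hand so that it stays self-contained; `normalizeAbsolute` and `resolveInsideVault` are hypothetical names, not HagiCode functions. Like the C# version, it is a purely lexical check and does not resolve symbolic links.

```typescript
// Collapse ".", "..", and duplicate separators in an absolute POSIX-style path.
function normalizeAbsolute(path: string): string {
  const parts: string[] = [];
  for (const segment of path.split('/')) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') {
      parts.pop(); // ".." at the root stays at the root
      continue;
    }
    parts.push(segment);
  }
  return '/' + parts.join('/');
}

// Resolve relativePath under vaultRoot and reject anything that escapes the root.
function resolveInsideVault(vaultRoot: string, relativePath: string): string {
  const root = normalizeAbsolute(vaultRoot);
  // The trailing separator matters: without it, "/vaults/react-evil" would pass
  // a prefix check against "/vaults/react".
  const rootWithSeparator = root === '/' ? root : root + '/';
  // As with Path.Combine, an absolute second argument replaces the root entirely.
  const combined = relativePath.charAt(0) === '/'
    ? normalizeAbsolute(relativePath)
    : normalizeAbsolute(rootWithSeparator + relativePath);
  if (combined !== root && combined.indexOf(rootWithSeparator) !== 0) {
    throw new Error('Vault file paths must stay inside the registered vault root.');
  }
  return combined;
}
```

For example, `docs/../index.yaml` resolves to a path inside the vault and is accepted, while `../react-evil/x` and `/etc/passwd` are both rejected.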

Git Submodule Management

For CodeRef vaults, we recommend Git submodules over copying code directly. The generated AGENTS.md says so explicitly:

private static string BuildCodeRefAgentsContent()
{
    return """
    # CodeRef Vault Guide

    Repositories under `repos/` should be maintained through Git submodules
    rather than copied directly into the vault root.

    Keep this structure stable so assistants and tools can understand the vault quickly.
    """ + Environment.NewLine;
}

This approach has several benefits: keeping code synchronized with upstream, saving disk space, and facilitating management of multiple code versions. After all, who wants to repeatedly download the same things?

File Preview Limitations

To prevent performance issues, the system caps how many files it enumerates and how many bytes of a file it previews:

private const int FileEnumerationLimit = 500;
private const int PreviewByteLimit = 256 * 1024;  // 256KB

If your vault contains a large number of files or very large files, previews may be affected by these limits. In such cases, consider processing files in batches or using a dedicated search tool. After all, some things are simply too big to handle comfortably.
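
As a sketch of how two such caps might be applied, the functions below truncate a file list and a file's bytes and report whether anything was cut off. The two limit values come from the constants above; truncating with a flag, rather than rejecting the request, is an assumption, as are all of the names.

```typescript
const FILE_ENUMERATION_LIMIT = 500;      // value from the article
const PREVIEW_BYTE_LIMIT = 256 * 1024;   // value from the article (262,144 bytes)

interface LimitedFileList {
  items: string[];
  truncated: boolean;
}

// Return at most `limit` paths and flag whether more were available.
function limitFileList(paths: string[], limit: number = FILE_ENUMERATION_LIMIT): LimitedFileList {
  return { items: paths.slice(0, limit), truncated: paths.length > limit };
}

interface FilePreview {
  bytes: Uint8Array;
  truncated: boolean;
}

// Return at most `limit` bytes of the file content without copying the buffer.
function previewBytes(content: Uint8Array, limit: number = PREVIEW_BYTE_LIMIT): FilePreview {
  return { bytes: content.subarray(0, limit), truncated: content.length > limit };
}
```

Returning a `truncated` flag lets a caller tell the user, or the AI, that the listing or preview is partial instead of silently presenting it as complete.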

Diagnostic Information

When creating a vault, diagnostic information is returned to help with debugging:

List<VaultBootstrapDiagnosticDto> bootstrapDiagnostics = [];

if (IsCodeRefVaultType(normalizedType))
{
    bootstrapDiagnostics = await EnsureCodeRefBootstrapAsync(
        normalizedName,
        normalizedPhysicalPath,
        normalizedGitUrl,
        cancellationToken);
}

If creation fails, you can check diagnostic information to understand the specific cause. When something goes wrong, check the diagnostics—that's also a way to solve problems.

Summary

The Vault system addresses the core pain points of learning from projects in the AI era through a unified storage abstraction layer:

  • Centralized Knowledge Management: All learning resources are centralized in one place, no longer scattered
  • Automatic AI Context Injection: AI assistants can automatically understand available learning resources without manually providing context
  • Cross-Project Knowledge Reuse: Multiple learning projects can share and reuse knowledge
  • Standardized Directory Structure: Provides consistent directory structure, reducing learning costs

This solution has been validated in practice in the HagiCode project. If you're also working on AI-assisted development tools or facing similar knowledge management challenges, I hope these experiences provide some reference.

Actually, the value of a technical solution lies not in how complex it is, but in whether it can solve real problems. The Vault system's core idea is simple—establishing a unified, AI-comprehensible knowledge storage layer. But it's precisely this simple abstraction that has significantly improved our development efficiency.

Sometimes, simple is best. After all, complex things often hide more pitfalls...


If this article helps you, feel free to give a Star on GitHub, or visit the official site to learn more about HagiCode. Public beta has begun—install now to experience complete AI code assistant functionality.

Perhaps, you can also give it a try...

Original Article & License

Thanks for reading. If this article helped, consider liking, bookmarking, or sharing it.
This article was created with AI assistance and reviewed by the author before publication.