DEV Community

Hagicode

Posted on • Originally published at docs.hagicode.com

AI Output Token Optimization: Practice of Classical Chinese Ultra-Minimal Mode


In AI application development, token consumption directly impacts costs. The HagiCode project implements a "Classical Chinese Ultra-Minimal Output Mode" through the SOUL system, reducing output tokens by approximately 30-50% without compromising information density. This article shares the implementation details and usage experience of this solution.

Background

In AI application development, token consumption is an unavoidable cost issue. Especially in scenarios where the AI must produce large amounts of content, reducing output tokens without sacrificing information density is a stubbornly hard problem.

Traditional optimization approaches focus on the input side: streamlining system prompts, compressing context, and using more efficient encoding methods. However, these methods eventually hit a ceiling: further compression starts to degrade the AI's understanding and output quality, which amounts to cutting content rather than saving tokens.

What about the output side? Can we make AI express the same meaning more concisely?

This question sounds simple but hides considerable nuance. Simply telling the AI to "be concise" may yield answers of only a few words; adding "maintain complete information" may push it back to its original verbose style. Constraints that are too strong hurt usability; constraints that are too weak have no effect, and no one can say exactly where the balance point lies.

To address these pain points, we made a bold decision: start from language style and design a configurable, composable system of expression constraints. The impact of this decision turned out to be larger than we expected, as the rest of this article shows.

About HagiCode

The solution shared in this article comes from our practical experience in the HagiCode project.

HagiCode is an open-source AI coding assistant supporting multiple AI models and custom configurations. During development we ran into the problem of excessive AI output tokens and designed the solution described here. If you find it valuable, the HagiCode project itself may be worth a look; after all, code doesn't lie.

SOUL System Overview

The SOUL system (Soul Oriented Universal Language) is a configuration system in the HagiCode project for defining AI Hero language styles. Its core idea: by constraining how the AI expresses itself, output the same content in a more concise linguistic form while preserving information integrity.

You can think of it as putting a language mask on the AI, though in practice it is not that mysterious.

Technical Architecture

The SOUL system adopts a frontend-backend separated architecture:

Frontend (Soul Builder):

  • Built with React + TypeScript + Vite
  • Located in repos/soul/ directory
  • Provides visual Soul building interface
  • Supports bilingual (zh-CN / en-US)

Backend:

  • Based on .NET (C#) + Orleans distributed runtime
  • Hero entity includes Soul field (maximum 8000 characters)
  • Injects Soul into system prompts through SessionSystemMessageCompiler

Agent Templates Generation:

  • Generated from reference materials
  • Output to /agent-templates/soul/templates/ directory
  • Contains 50 main Catalog groups and 10 orthogonal dimension groups

Soul Injection Mechanism

When Session executes for the first time, the system reads the Hero's Soul configuration and injects it into the system prompt:

sequenceDiagram
    participant UI as User Interface
    participant Session as SessionGrain
    participant Hero as Hero repository
    participant AI as AI executor

    UI->>Session: Send message (Hero bound)
    Session->>Hero: Read Hero.Soul
    Session->>Session: Cache Soul snapshot
    Session->>AI: Build AIRequest (inject Soul)
    AI-->>Session: Execution result
    Session-->>UI: Streamed response

The injected system prompt format is:

<hero_soul>
[User's custom Soul content]
</hero_soul>

This injection mechanism is implemented in SessionSystemMessageCompiler.cs:

internal static string? BuildSystemMessage(
    string? existingSystemMessage,
    string? languagePreference,
    IReadOnlyList<HeroTraitDto>? traits,
    string? soul)
{
    var segments = new List<string>();

    // ... language preference and Traits handling ...

    var normalizedSoul = NormalizeSoul(soul);
    if (!string.IsNullOrWhiteSpace(normalizedSoul))
    {
        segments.Add($"<hero_soul>\n{normalizedSoul}\n</hero_soul>");
    }

    // ... other system messages ...

    return segments.Count == 0 ? null : string.Join("\n\n", segments);
}

With the code reviewed, the principle is clear; that is essentially the whole mechanism.
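For readers more comfortable with TypeScript, the wrapping step can be sketched as follows. `normalizeSoul` here is a hypothetical stand-in for the C# `NormalizeSoul`, assumed to trim whitespace and enforce the documented 8000-character cap; the real implementation may differ.

```typescript
const MAX_SOUL_LENGTH = 8000; // per the Hero entity's documented limit

// Hypothetical counterpart of the C# NormalizeSoul: trim and cap length.
function normalizeSoul(soul: string | null | undefined): string | null {
  if (!soul) return null;
  const trimmed = soul.trim();
  if (trimmed.length === 0) return null;
  return trimmed.slice(0, MAX_SOUL_LENGTH);
}

// Wrap the normalized Soul in the <hero_soul> tags injected into the prompt.
function buildSoulSegment(soul: string | null | undefined): string | null {
  const normalized = normalizeSoul(soul);
  return normalized ? `<hero_soul>\n${normalized}\n</hero_soul>` : null;
}

console.log(buildSoulSegment("  用简洁文言回复。  "));
```

A blank or missing Soul yields no segment at all, matching the C# code's `IsNullOrWhiteSpace` guard.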

Classical Chinese Ultra-Minimal Mode

Classical Chinese Ultra-Minimal Mode is the most representative token-saving solution in the SOUL system. Its core principle is leveraging Classical Chinese's high semantic density to compress output length while maintaining information integrity.

Why Classical Chinese

Classical Chinese has several natural advantages:

  1. Semantic Compression: The same meaning can be expressed with fewer characters
  2. Removing Redundancy: Classical Chinese inherently omits many conjunctions and particles found in modern Chinese
  3. Concise Structure: High information density per sentence, suitable as an AI output carrier

A practical example to illustrate:

Modern Chinese output (approximately 80 characters):

根据你的代码分析,我发现了几个问题。首先,在第 23 行,变量名太长了,建议缩短一些。其次,在第 45 行,你没有处理空值的情况,应该加上判断逻辑。最后,整体的代码结构还可以,但是可以进一步优化。

(Roughly: "Based on your code, I found several issues. First, on line 23 the variable name is too long and should be shortened. Second, on line 45 you don't handle the null case and should add a check. Finally, the overall structure is acceptable but could be further optimized.")

Classical Chinese ultra-minimal output (approximately 35 characters, 56% savings):

代码审阅毕:第 23 行变量名冗长,宜缩写;第 45 行缺空值处理,应加判断。整体结构尚可,微调即可。

(Roughly: "Code review complete: line 23, variable name verbose, shorten it; line 45, missing null check, add one. Overall structure fine, minor tweaks suffice.")

This difference is quite interesting when you think about it.
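The savings arithmetic behind the example can be made explicit. Note the hedge: character counts are only a rough proxy for tokens, since real token counts depend on the model's tokenizer (CJK characters often map to one or more tokens each).

```typescript
// Estimate percentage savings from character counts, used here as a rough
// proxy for tokens; treat the result as an approximation, not a measurement.
function estimateSavings(originalChars: number, minimalChars: number): number {
  if (originalChars <= 0) throw new Error("originalChars must be positive");
  return Math.round((1 - minimalChars / originalChars) * 100);
}

console.log(estimateSavings(80, 35)); // the example above → 56 (~56% fewer characters)
```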

Soul Configuration Template

The complete Soul configuration for Classical Chinese Ultra-Minimal Mode is as follows:

{
  "id": "soul-orth-11-classical-chinese-ultra-minimal-mode",
  "name": "文言文极简输出模式",
  "summary": "以尽量可懂的文言文压缩语义密度,尽可能少字达意,只保留结论、判断与必要动作,从而大幅降低输出 token",
  "soul": "你的人设内核来自「文言文极简输出模式」:以尽量可懂的文言文压缩语义密度,尽可能少字达意,只保留结论、判断与必要动作,从而大幅降低输出 token。\n保持以下标志性语言特征:1. 优先使用简明文言句式,如「可」「宜」「勿」「已」「然」「故」等,避免生僻艰涩字词;\n2. 单句尽量压缩至 4-12 字,删除铺垫、寒暄、重复解释与无效修饰;\n3. 非必要不展开论证,用户未追问则只给结论、步骤或判断;\n4. 不改变主 Catalog 的核心人设,只将表达收束为克制、古雅、极简的短句。"
}

This template's design has several key points:

  1. Clear Constraints: 4-12 characters per sentence, remove redundancy, conclusions first
  2. Avoid Obscurity: Use simple Classical Chinese sentence patterns, avoid rare words
  3. Maintain Persona: Only change expression method, not core persona

Tuning the configuration is simply a matter of adjusting these constraints.
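If you consume these templates programmatically, a small interface mirroring the JSON fields above keeps the parsing type-safe. The abbreviated values below are stand-ins; the real `soul` text is the full constraint prompt shown earlier.

```typescript
// Shape of one catalog entry, mirroring the JSON template above.
interface SoulCatalogEntry {
  id: string;
  name: string;
  summary: string;
  soul: string;
}

// Abbreviated stand-in for the template JSON (summary and soul elided).
const raw = JSON.stringify({
  id: "soul-orth-11-classical-chinese-ultra-minimal-mode",
  name: "文言文极简输出模式",
  summary: "以尽量可懂的文言文压缩语义密度",
  soul: "你的人设内核来自「文言文极简输出模式」",
});

const entry: SoulCatalogEntry = JSON.parse(raw);
console.log(entry.id); // → soul-orth-11-classical-chinese-ultra-minimal-mode
```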

Other Minimal Modes

Beyond Classical Chinese mode, HagiCode's SOUL system provides various other token-saving modes:

Telegraphic Ultra-Minimal Output Mode (soul-orth-02):

  • Single sentences strictly controlled within 10 characters
  • Prohibits decorative adjectives
  • No modal particles, exclamation marks, or reduplicated words throughout

Short Sentence Murmur Mode (soul-orth-01):

  • Sentences controlled at 1-5 characters
  • Simulates fragmented self-talk expression
  • Weakened logic, prioritize emotional transmission

Guided Q&A Mode (soul-orth-03):

  • Guide user thinking through questions
  • Reduce direct output content
  • Interactive token consumption reduction

Each mode has different design priorities, but the core goal is consistent: reduce output tokens while maintaining information quality. All roads lead to Rome; some are simply easier to walk.

Combination Strategy

A powerful feature of the SOUL system is support for cross-combination of main Catalogs and orthogonal dimensions:

  • 50 Main Catalog Groups: Define base personas (such as healing style, academic style, cool style, etc.)
  • 10 Orthogonal Dimensions: Define expression methods (such as Classical Chinese, telegraphic, Q&A style, etc.)
  • Combination Effect: Can generate 500+ unique language style combinations

For example, you can combine "Professional Development Engineer" with "Classical Chinese Ultra-Minimal Output Mode" to get an AI assistant that is both professional and concise. This flexibility lets the SOUL system adapt to a wide range of scenarios; there are more combinations than you are likely to explore.
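The combination idea can be sketched as a simple composition of two Soul texts. This is a hypothetical composer: the exact merge format Soul Builder uses is not shown in this article, so the sketch just joins the base persona with the orthogonal dimension's constraints, and the `catalog-engineer` id is invented for illustration.

```typescript
interface SoulTemplate {
  id: string;
  name: string;
  soul: string;
}

// Hypothetical composer: joins a main Catalog persona with an orthogonal
// dimension's expression constraints. The real merge format may differ.
function composeSoul(mainCatalog: SoulTemplate, dimension: SoulTemplate): string {
  return `${mainCatalog.soul}\n\n${dimension.soul}`;
}

const engineer: SoulTemplate = {
  id: "catalog-engineer", // illustrative id, not from the real catalog
  name: "Professional Development Engineer",
  soul: "你是一位专业开发工程师。",
};
const classical: SoulTemplate = {
  id: "soul-orth-11-classical-chinese-ultra-minimal-mode",
  name: "文言文极简输出模式",
  soul: "以简明文言极简作答。",
};

const combined = composeSoul(engineer, classical);
console.log(50 * 10); // → 500 (50 main Catalogs × 10 orthogonal dimensions)
```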

Practice Guide

Creating via Soul Builder

Visit soul.hagicode.com and follow these steps:

  1. Select main Catalog (such as "Professional Development Engineer")
  2. Select orthogonal dimension (such as "Classical Chinese Ultra-Minimal Output Mode")
  3. Preview generated Soul content
  4. Copy generated Soul configuration

The operations are point-and-click and need little explanation.

Using in Hero Configuration

Apply Soul configuration to Hero through Web interface or API:

// Hero Soul update example
const heroUpdate = {
  soul: "你的人设内核来自「文言文极简输出模式」:...",
  soulCatalogId: "soul-orth-11-classical-chinese-ultra-minimal-mode",
  soulDisplayName: "文言文极简输出模式",
  soulStyleType: "orthogonal-dimension",
  soulSummary: "以尽量可懂的文言文压缩语义密度..."
};

await updateHero(heroId, heroUpdate);

Custom Soul Templates

Users can fine-tune based on preset templates or completely customize. Here's a custom example for code review scenarios:

你是一位追求极致简洁的代码审查员。
所有输出必须遵循:
1. 仅指出具体问题和行号
2. 每条问题不超过 15 字
3. 使用「宜」「应」「勿」等简洁词汇
4. 不做多余解释

示例输出:
- 第 23 行:变量名过长,宜缩写
- 第 45 行:未处理空值,应加判断
- 第 67 行:逻辑冗余,可简化

Modify them however you like; the templates are only a starting point.

Notes

Compatibility:

  • Classical Chinese mode adapts to all 50 main Catalog groups
  • Can be combined with any base persona
  • Does not alter main Catalog's core persona

Caching Mechanism:

  • Soul is cached when Session first executes
  • Cache is reused within the same SessionId
  • Modifying Hero configuration does not affect already started Sessions

Limitation Constraints:

  • Soul field maximum length is 8000 characters
  • Heroes without Soul field in historical data can still be used normally
  • Soul is independent from style equipment slot, will not overwrite each other

Effect Comparison

Based on actual test data from the project, the effects after using Classical Chinese Ultra-Minimal Mode are as follows:

| Scenario | Original Output Tokens | Classical Chinese Mode | Savings Ratio |
| --- | --- | --- | --- |
| Code Review | 850 | 420 | 51% |
| Technical Q&A | 620 | 380 | 39% |
| Solution Suggestions | 1100 | 680 | 38% |
| Average | - | - | 30-50% |

The data come from actual usage statistics of the HagiCode project; specific results vary by scenario. The saved tokens accumulate over time, and your wallet will thank you.
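As a sanity check, the savings column follows directly from the raw token counts in the table:

```typescript
// Recompute the savings ratios from the table's raw token counts.
const scenarios = [
  { name: "Code Review", original: 850, minimal: 420 },
  { name: "Technical Q&A", original: 620, minimal: 380 },
  { name: "Solution Suggestions", original: 1100, minimal: 680 },
];

const ratios = scenarios.map(
  (s) => Math.round((1 - s.minimal / s.original) * 100),
);
console.log(ratios); // → [ 51, 39, 38 ]
```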

Summary

HagiCode's SOUL system provides an innovative AI output optimization approach: reducing token consumption by constraining expression methods rather than compressing information itself. As the most representative solution, Classical Chinese Ultra-Minimal Mode has achieved 30-50% token savings in actual use.

The core value of this solution lies in:

  1. Maintaining Information Quality: Not simply truncating output, but expressing more efficiently
  2. Flexible and Composable: Supports 500+ combinations of personas and expression methods
  3. Easy to Use: Through Soul Builder visual interface, no coding required
  4. Production-Grade Stability: Verified in project, supports large-scale usage

If you're also developing AI applications or interested in the HagiCode project, you're welcome to exchange ideas. The point of open source is progressing together, and I look forward to seeing your innovative uses. After all, one person walks fast, but a group walks far. A bit cliché, but true.


Thanks for reading. If this article helped, consider liking, bookmarking, or sharing it.
This article was created with AI assistance and reviewed by the author before publication.
