AI Output Token Optimization: Practice of Classical Chinese Ultra-Minimal Mode
In AI application development, token consumption directly impacts costs. The HagiCode project implements a "Classical Chinese Ultra-Minimal Output Mode" through the SOUL system, reducing output tokens by approximately 30-50% without compromising information density. This article shares the implementation details and usage experience of this solution.
Background
In AI application development, token consumption is an unavoidable cost. This is especially true in scenarios where the AI must output large amounts of content: reducing output tokens without sacrificing information density is a genuinely hard problem.
Traditional optimization approaches focus on the input side: streamlining system prompts, compressing context, and using more efficient encodings. These methods eventually hit a ceiling, though: further compression starts to hurt the AI's comprehension and output quality. At that point you are effectively cutting content, which defeats the purpose.
What about the output side? Can we make AI express the same meaning more concisely?
This question sounds simple but hides considerable nuance. Simply telling the AI to "be concise" may yield only a few words; adding "maintain complete information" may send it right back to its original verbose style. Constraints that are too strong hurt usability, constraints that are too weak do nothing, and no one can say in advance where the balance point lies.
To address these pain points, we made a bold decision: start from language style itself and design a configurable, composable system of expression constraints. The impact of this decision turned out to be larger than we expected, as the rest of this article will show.
About HagiCode
The solution shared in this article comes from our practical experience in the HagiCode project.
HagiCode is an open-source AI coding assistant that supports multiple AI models and custom configurations. During development we ran into the problem of excessive AI output tokens and designed the solution described here. If you find it valuable, HagiCode itself may be worth a look; after all, code doesn't lie.
SOUL System Overview
The SOUL system's full name is Soul Oriented Universal Language, a configuration system in the HagiCode project for defining AI Hero language styles. Its core idea is: by constraining AI's expression method, use more concise language forms to output content while maintaining information integrity.
You can think of it as putting a language mask on the AI, though in practice it is not that mysterious.
Technical Architecture
The SOUL system adopts a frontend-backend separated architecture:
Frontend (Soul Builder):
- Built with React + TypeScript + Vite
- Located in the `repos/soul/` directory
- Provides a visual Soul building interface
- Supports a bilingual UI (zh-CN / en-US)
Backend:
- Based on .NET (C#) + the Orleans distributed runtime
- The Hero entity includes a `Soul` field (maximum 8000 characters)
- Injects the Soul into system prompts through `SessionSystemMessageCompiler`
Agent Templates Generation:
- Generated from reference materials
- Output to the `/agent-templates/soul/templates/` directory
- Contains 50 main Catalog groups and 10 orthogonal dimension groups
Soul Injection Mechanism
When Session executes for the first time, the system reads the Hero's Soul configuration and injects it into the system prompt:
```mermaid
sequenceDiagram
    participant UI as User Interface
    participant Session as SessionGrain
    participant Hero as Hero Repository
    participant AI as AI Executor
    UI->>Session: Send message (bound to Hero)
    Session->>Hero: Read Hero.Soul
    Session->>Session: Cache Soul snapshot
    Session->>AI: Build AIRequest (inject Soul)
    AI-->>Session: Execution result
    Session-->>UI: Streaming response
```
The injected system prompt format is:
```
<hero_soul>
[User's custom Soul content]
</hero_soul>
```
This injection mechanism is implemented in `SessionSystemMessageCompiler.cs`:

```csharp
internal static string? BuildSystemMessage(
    string? existingSystemMessage,
    string? languagePreference,
    IReadOnlyList<HeroTraitDto>? traits,
    string? soul)
{
    var segments = new List<string>();

    // ... language preference and Traits handling ...

    var normalizedSoul = NormalizeSoul(soul);
    if (!string.IsNullOrWhiteSpace(normalizedSoul))
    {
        segments.Add($"<hero_soul>\n{normalizedSoul}\n</hero_soul>");
    }

    // ... other system messages ...

    return segments.Count == 0 ? null : string.Join("\n\n", segments);
}
```
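`NormalizeSoul` itself isn't shown above. Based on the documented 8000-character Soul limit, here is a plausible sketch of what such a normalizer might do, written in TypeScript for illustration. This is an assumption about its behavior, not the actual C# implementation:

```typescript
// Hypothetical equivalent of NormalizeSoul: trim whitespace and enforce
// the documented 8000-character Soul limit. The real C# implementation
// may differ; this only sketches the likely behavior.
const MAX_SOUL_LENGTH = 8000;

function normalizeSoul(soul: string | null | undefined): string | null {
  if (!soul) return null;
  const trimmed = soul.trim();
  if (trimmed.length === 0) return null;
  return trimmed.length > MAX_SOUL_LENGTH
    ? trimmed.slice(0, MAX_SOUL_LENGTH)
    : trimmed;
}
```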
Once you have read the code, the principle is clear; that is essentially all there is to it.
Classical Chinese Ultra-Minimal Mode
Classical Chinese Ultra-Minimal Mode is the most representative token-saving solution in the SOUL system. Its core principle is leveraging Classical Chinese's high semantic density to compress output length while maintaining information integrity.
Why Classical Chinese
Classical Chinese has several natural advantages:
- Semantic Compression: The same meaning can be expressed with fewer characters
- Removing Redundancy: Classical Chinese inherently omits many conjunctions and particles found in modern Chinese
- Concise Structure: High information density per sentence, suitable as an AI output carrier
A practical example to illustrate:
Modern Chinese output (approximately 80 characters):

```
根据你的代码分析,我发现了几个问题。首先,在第 23 行,变量名太长了,建议缩短一些。其次,在第 45 行,你没有处理空值的情况,应该加上判断逻辑。最后,整体的代码结构还可以,但是可以进一步优化。
```

(Roughly: "Based on my analysis of your code, I found several issues. First, on line 23 the variable name is too long; consider shortening it. Second, on line 45 you don't handle the null case; add a check. Finally, the overall structure is okay but could be further optimized.")

Classical Chinese ultra-minimal output (approximately 35 characters, a 56% saving):

```
代码审阅毕:第 23 行变量名冗长,宜缩写;第 45 行缺空值处理,应加判断。整体结构尚可,微调即可。
```

(Roughly: "Code review complete: line 23, variable name too long, shorten it; line 45, missing null handling, add a check. Structure acceptable; minor tuning suffices.")
This difference is quite interesting when you think about it.
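The quoted savings ratio is simple arithmetic over the character counts; a quick sketch (counts are approximate, as in the article, and characters only stand in for tokens):

```typescript
// Approximate savings between two outputs, by character count.
// Real token counts depend on the model's tokenizer.
function savingsPercent(original: number, compressed: number): number {
  return Math.round(((original - compressed) / original) * 100);
}

// The article's example: ~80 characters vs ~35 characters.
const saved = savingsPercent(80, 35); // ≈ 56
console.log(`Saved about ${saved}%`);
```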
Soul Configuration Template
The complete Soul configuration for Classical Chinese Ultra-Minimal Mode is as follows:
```json
{
  "id": "soul-orth-11-classical-chinese-ultra-minimal-mode",
  "name": "文言文极简输出模式",
  "summary": "以尽量可懂的文言文压缩语义密度,尽可能少字达意,只保留结论、判断与必要动作,从而大幅降低输出 token",
  "soul": "你的人设内核来自「文言文极简输出模式」:以尽量可懂的文言文压缩语义密度,尽可能少字达意,只保留结论、判断与必要动作,从而大幅降低输出 token。\n保持以下标志性语言特征:1. 优先使用简明文言句式,如「可」「宜」「勿」「已」「然」「故」等,避免生僻艰涩字词;\n2. 单句尽量压缩至 4-12 字,删除铺垫、寒暄、重复解释与无效修饰;\n3. 非必要不展开论证,用户未追问则只给结论、步骤或判断;\n4. 不改变主 Catalog 的核心人设,只将表达收束为克制、古雅、极简的短句。"
}
```

(The `soul` text is deliberately kept in Chinese: it instructs the AI to answer in concise Classical Chinese sentence patterns of 4-12 characters, dropping preambles, pleasantries, and repeated explanations, while leaving the base persona untouched.)
This template's design has several key points:
- Clear Constraints: 4-12 characters per sentence, remove redundancy, conclusions first
- Avoid Obscurity: Use simple Classical Chinese sentence patterns, avoid rare words
- Maintain Persona: Only change expression method, not core persona
In practice, tuning the configuration is just a matter of adjusting a few of these parameters.
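The "4-12 characters per sentence" constraint is easy to spot-check programmatically. Here is a small sketch of such a validator; it is an illustration, not part of the HagiCode codebase, and the punctuation set it splits on is an assumption:

```typescript
// Hypothetical checker for the "4-12 characters per sentence" rule.
// Splits on common Chinese (and ASCII) sentence-ending punctuation and
// returns the sentences that fall outside the allowed length range.
function violations(text: string, min = 4, max = 12): string[] {
  return text
    .split(/[。;;!!??]/)
    .map(s => s.trim())
    .filter(s => s.length > 0)
    .filter(s => s.length < min || s.length > max);
}

const sample = "代码审阅毕;整体结构尚可。";
console.log(violations(sample)); // both sentences are within 4-12 chars
```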
Other Minimal Modes
Beyond Classical Chinese mode, HagiCode's SOUL system provides various other token-saving modes:
Telegraphic Ultra-Minimal Output Mode (soul-orth-02):
- Single sentences strictly controlled within 10 characters
- Prohibits decorative adjectives
- No modal particles, exclamation marks, or reduplicated words throughout
Short Sentence Murmur Mode (soul-orth-01):
- Sentences controlled at 1-5 characters
- Simulates fragmented self-talk expression
- Weakened logic, prioritize emotional transmission
Guided Q&A Mode (soul-orth-03):
- Guide user thinking through questions
- Reduce direct output content
- Interactive token consumption reduction
Each mode has different design priorities, but the core goal is the same: reduce output tokens while maintaining information quality. All roads lead to Rome; some paths are just easier to walk than others.
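For reference, the modes above can be collected as a simple lookup from mode ID to its headline constraint. The IDs come from the article; the map shape is just an illustrative convenience, not HagiCode's actual schema:

```typescript
// The token-saving modes described above, keyed by their Catalog IDs.
const minimalModes: Record<string, string> = {
  "soul-orth-01": "Short Sentence Murmur: 1-5 characters per sentence",
  "soul-orth-02": "Telegraphic Ultra-Minimal: sentences within 10 characters",
  "soul-orth-03": "Guided Q&A: lead the user with questions, less direct output",
  "soul-orth-11": "Classical Chinese Ultra-Minimal: 4-12 character sentences",
};
console.log(Object.keys(minimalModes).length); // 4
```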
Combination Strategy
A powerful feature of the SOUL system is support for cross-combination of main Catalogs and orthogonal dimensions:
- 50 Main Catalog Groups: Define base personas (such as healing style, academic style, cool style, etc.)
- 10 Orthogonal Dimensions: Define expression methods (such as Classical Chinese, telegraphic, Q&A style, etc.)
- Combination Effect: Can generate 500+ unique language style combinations
For example, you can combine "Professional Development Engineer" with "Classical Chinese Ultra-Minimal Output Mode" to get an AI assistant that is both professional and concise. This flexibility lets the SOUL system adapt to a wide range of usage scenarios. Combine however you like; there are more combinations than you could realistically explore.
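Conceptually, a combination is just a base persona plus appended expression constraints. A minimal sketch, where the field names mirror the JSON template shown earlier but the composition rule itself is an assumption rather than HagiCode's exact logic:

```typescript
// Sketch: combine a main Catalog (base persona) with an orthogonal
// dimension (expression constraints) into one Soul string.
interface SoulTemplate {
  id: string;
  name: string;
  soul: string;
}

function combine(catalog: SoulTemplate, dimension: SoulTemplate): string {
  // Base persona first, expression constraints appended after.
  return `${catalog.soul}\n\n${dimension.soul}`;
}

// 50 main Catalogs x 10 orthogonal dimensions = 500 combinations.
console.log(50 * 10); // 500
```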
Practice Guide
Creating via Soul Builder
Visit soul.hagicode.com and follow these steps:
1. Select a main Catalog (such as "Professional Development Engineer")
2. Select an orthogonal dimension (such as "Classical Chinese Ultra-Minimal Output Mode")
3. Preview the generated Soul content
4. Copy the generated Soul configuration
Point-and-click operations, shouldn't need much explanation.
Using in Hero Configuration
Apply Soul configuration to Hero through Web interface or API:
```typescript
// Hero Soul update example
const heroUpdate = {
  soul: "你的人设内核来自「文言文极简输出模式」:...",
  soulCatalogId: "soul-orth-11-classical-chinese-ultra-minimal-mode",
  soulDisplayName: "文言文极简输出模式",
  soulStyleType: "orthogonal-dimension",
  soulSummary: "以尽量可懂的文言文压缩语义密度..."
};

await updateHero(heroId, heroUpdate);
```
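The `updateHero` call above is shown without its definition. A minimal sketch of what it might look like, assuming a REST-style endpoint; the path, HTTP method, and payload shape here are assumptions for illustration, not HagiCode's actual API contract:

```typescript
// Hypothetical sketch of updateHero as a REST-style PATCH call.
interface HeroUpdate {
  soul: string;
  soulCatalogId: string;
  soulDisplayName: string;
  soulStyleType: string;
  soulSummary: string;
}

// Build the request separately so it can be inspected or tested.
function buildHeroUpdateRequest(heroId: string, update: HeroUpdate) {
  return {
    url: `/api/heroes/${encodeURIComponent(heroId)}`, // assumed path
    init: {
      method: "PATCH",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(update),
    },
  };
}

async function updateHero(heroId: string, update: HeroUpdate): Promise<void> {
  const { url, init } = buildHeroUpdateRequest(heroId, update);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Hero update failed: ${res.status}`);
}
```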
Custom Soul Templates
Users can fine-tune based on preset templates or completely customize. Here's a custom example for code review scenarios:
```
你是一位追求极致简洁的代码审查员。
所有输出必须遵循:
1. 仅指出具体问题和行号
2. 每条问题不超过 15 字
3. 使用「宜」「应」「勿」等简洁词汇
4. 不做多余解释
示例输出:
- 第 23 行:变量名过长,宜缩写
- 第 45 行:未处理空值,应加判断
- 第 67 行:逻辑冗余,可简化
```

(The template, kept in Chinese since it is prompt content: "You are a code reviewer who pursues extreme conciseness. All output must follow: 1. only point out concrete issues with line numbers; 2. at most 15 characters per item; 3. use terse words like 宜 / 应 / 勿; 4. no extra explanation." The example output lists line-numbered findings in that style.)
Modify however you want—templates are just a starting point anyway.
Notes
Compatibility:
- Classical Chinese mode adapts to all 50 main Catalog groups
- Can be combined with any base persona
- Does not alter main Catalog's core persona
Caching Mechanism:
- Soul is cached when Session first executes
- Cache is reused within the same SessionId
- Modifying Hero configuration does not affect already started Sessions
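The caching behavior above can be illustrated with a small sketch. This is a deliberate simplification of the real Orleans grain state, which is an assumption here:

```typescript
// Illustrative model of the Soul snapshot cache: the Soul is read once
// on a session's first execution and then reused, so later Hero edits
// do not affect an already running Session.
class SessionSoulCache {
  private snapshots = new Map<string, string>();

  getSoul(sessionId: string, readHeroSoul: () => string): string {
    const cached = this.snapshots.get(sessionId);
    if (cached !== undefined) return cached; // reuse the snapshot
    const soul = readHeroSoul();             // first execution: read Hero.Soul
    this.snapshots.set(sessionId, soul);
    return soul;
  }
}

const cache = new SessionSoulCache();
let heroSoul = "文言文极简输出模式";
const first = cache.getSoul("session-1", () => heroSoul);
heroSoul = "updated soul";                   // Hero edited mid-session
const second = cache.getSoul("session-1", () => heroSoul);
console.log(first === second); // true: the snapshot survives the edit
```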
Constraints:
- The Soul field has a maximum length of 8000 characters
- Heroes from historical data without a Soul field can still be used normally
- Soul is independent of the style equipment slot; the two never overwrite each other
Effect Comparison
Based on actual test data from the project, the effects after using Classical Chinese Ultra-Minimal Mode are as follows:
| Scenario | Original Output Tokens | Classical Chinese Mode | Savings Ratio |
|---|---|---|---|
| Code Review | 850 | 420 | 51% |
| Technical Q&A | 620 | 380 | 39% |
| Solution Suggestions | 1100 | 680 | 38% |
| Average | - | - | 30-50% |
The data come from actual usage statistics of the HagiCode project; specific results vary by scenario. The saved tokens add up over time, though, and your wallet will thank you.
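The per-scenario ratios in the table follow directly from the token counts and can be recomputed:

```typescript
// Recompute each row's savings ratio from its token counts.
function savingsRatio(original: number, minimal: number): number {
  return Math.round(((original - minimal) / original) * 100);
}

const rows = [
  { scenario: "Code Review", original: 850, minimal: 420 },           // 51%
  { scenario: "Technical Q&A", original: 620, minimal: 380 },         // 39%
  { scenario: "Solution Suggestions", original: 1100, minimal: 680 }, // 38%
];

for (const r of rows) {
  console.log(`${r.scenario}: ${savingsRatio(r.original, r.minimal)}%`);
}
```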
Summary
HagiCode's SOUL system provides an innovative AI output optimization approach: reducing token consumption by constraining expression methods rather than compressing information itself. As the most representative solution, Classical Chinese Ultra-Minimal Mode has achieved 30-50% token savings in actual use.
The core value of this solution lies in:
- Maintaining Information Quality: Not simply truncating output, but expressing more efficiently
- Flexible and Composable: Supports 500+ combinations of personas and expression methods
- Easy to Use: Through Soul Builder visual interface, no coding required
- Production-Grade Stability: Verified in a real project and supports large-scale usage
If you're also building AI applications or are interested in the HagiCode project, feel free to reach out. The point of open source is to progress together, and I look forward to seeing your creative uses. After all, "if you want to go fast, go alone; if you want to go far, go together"... a bit cliché, but true.
References
- HagiCode GitHub: github.com/HagiCode-org/site
- HagiCode Official Site: hagicode.com
- Soul Builder: soul.hagicode.com
- Docker Deployment Guide: docs.hagicode.com/installation/docker-compose
- Desktop Client: hagicode.com/desktop/
- 30-Minute Practical Demo: www.bilibili.com/video/BV1pirZBuEzq/
If this article helps you:
- Give a Star on GitHub: github.com/HagiCode-org/site
- Visit the official site to learn more: hagicode.com
- Beta testing has started, welcome to install and experience
Original Article & License
Thanks for reading. If this article helped, consider liking, bookmarking, or sharing it.
This article was created with AI assistance and reviewed by the author before publication.
- Author: newbe36524
- Original URL: https://docs.hagicode.com/go?platform=devto&target=%2Fblog%2F2026-04-04-soul-token-optimization-classical-chinese%2F
- License: Unless otherwise stated, this article is licensed under CC BY-NC-SA. Please retain attribution when sharing.