In the previous article, we discussed the importance of semantic modeling for specifying the identity of data and preventing hallucinations in AI agents (Cursor, Windsurf, etc.) and MCP-based tools. But the moment we try to apply this in practice, we hit the hardest, coldest wall of reality: fragmented legacy.
"The backend team uses Thrift, while the client uses old code in Protobuf. Plus, design data is scattered in Excel. When we ask an AI agent to write the entire system code, it mixes these three and makes a mess. Do we have to throw everything away and rewrite with a new standard?"
Semantic modeling design started from this desperate question. Instead of tearing everything down and rebuilding, we adopted a Unified AST (Abstract Syntax Tree) Hub architecture, in which all of these fragmented schemas converge into one large knowledge graph.
1. The Tragedy of Migration: AI Cannot Bridge Fragmented Contexts
Many development teams attempt migration with the thought, "Let's move to a better IDL standard all at once," and fail. The reason is not a lack of human technical skill, but the practical impossibility of transitioning every system simultaneously.
AI agents also get lost in this swamp of fragmentation.
- Disconnected Metadata: When modifying a server packet, the AI cannot simultaneously attend to the validation rules of the Excel data associated with that packet.
- Failure in Context Switching: When the AI reads Thrift and then generates a Protobuf structure, it misses subtle differences in field semantics (optional vs. required), producing fatal bugs.
What the AI needs isn't just access to the code. It needs a Single Context Hub that proves all these scattered pieces of data actually point to the same business logic.
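To make the "optional vs. required" failure mode concrete, here is a minimal sketch of how that gap can be detected mechanically. The field descriptors (`ThriftField`, `ProtoField`) are illustrative types invented for this example, not real library classes:

```python
from dataclasses import dataclass

# Hypothetical, minimal field descriptors for two dialects.
# These class names are illustrative, not part of any real library.

@dataclass
class ThriftField:
    name: str
    ttype: str
    required: bool  # Thrift marks fields required/optional explicitly

@dataclass
class ProtoField:
    name: str
    ptype: str
    optional: bool  # in proto3, presence is opt-in via 'optional'

def presence_mismatch(t: ThriftField, p: ProtoField) -> bool:
    """Flag the bug class from the article: the same logical field is
    required in one dialect but optional in the other."""
    # required Thrift + optional proto (or optional Thrift + required proto)
    return t.required == p.optional

# The same logical field, declared in both IDLs:
thrift_item_id = ThriftField("item_id", "i64", required=True)
proto_item_id = ProtoField("item_id", "int64", optional=True)

print(presence_mismatch(thrift_item_id, proto_item_id))  # True: fatal gap
```

Without a shared hub, nothing runs this comparison; the AI sees each file in isolation and the mismatch survives into generated code.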
2. The Solution: A Semantic Dialect Engine Based on 'Unified AST'
To solve this fragmentation, a compiler was designed that reads each IDL (Protobuf, Thrift, etc.) with its own independent parser and internally merges these disparate languages into a common Unified AST.
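The merge step can be sketched in a few lines. This is a toy model, not DeukPack's actual API: each per-dialect parser normalizes its output into one common node shape, and dialect-specific facts (Protobuf field numbers, Thrift required flags) are kept as metadata rather than discarded:

```python
from dataclasses import dataclass, field

# A minimal sketch of a Unified AST node. Class and attribute names
# (UField, UMessage, 'origin', 'meta') are illustrative assumptions.

@dataclass
class UField:
    name: str
    utype: str                                 # normalized type name
    origin: str                                # which dialect declared it
    meta: dict = field(default_factory=dict)   # dialect-only facts, kept losslessly

@dataclass
class UMessage:
    name: str
    fields: list

def from_proto(name, fields):
    """Stand-in for a Protobuf parser's output, already normalized."""
    return UMessage(name, [UField(n, t, "protobuf", {"field_number": num})
                           for n, t, num in fields])

def from_thrift(name, fields):
    """Stand-in for a Thrift parser's output, already normalized."""
    return UMessage(name, [UField(n, t, "thrift", {"required": req})
                           for n, t, req in fields])

# Two declarations of the same logical message merge under one hub key:
hub = {}
for msg in (from_proto("Item", [("item_id", "int64", 1)]),
            from_thrift("Item", [("item_id", "i64", True)])):
    hub.setdefault(msg.name, []).append(msg)

print(len(hub["Item"]))  # 2: both dialect views live under one entry
```

The point of the design is the `meta` dict: nothing a dialect knows is thrown away on the way into the hub, so any downstream generator can still recover it.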
① Non-Destructive Hybrid Co-existence
There's no need to abandon legacy code for the AI era. The semantic engine encapsulates existing .proto or .thrift files as safe External Dialects.
// ai_context.deuk
// 1. Import existing Protobuf files as-is (absolutely NO modifications)
include "legacy/item_base.proto"
// 2. Wrap legacy types in a 'meta-cognitive intent' shared by humans and AI
table<item_base::Item> = { key: "item_id" }
Thanks to this design, a team can keep its existing infrastructure virtually untouched while instantly giving the AI agent a firm piece of context: "the Item message in Protobuf is actually the baseline design data (Table) that comes from Excel."
② Context Resolver
This is more than simple text parsing. The context resolver lifts every characteristic of the base language—such as Protobuf's field numbering system or Thrift's optional attributes—into the Unified AST without losing any of them. Types promoted into the Unified AST this way are granted fully equal status.
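The "lifting" step described above can be sketched as a canonicalization table plus a promotion function. Both the mapping and the function name are illustrative assumptions, not the engine's real implementation:

```python
# Sketch of the resolver's lifting step: each dialect's type spelling maps
# onto one canonical Unified AST type, while dialect-only attributes ride
# along as metadata instead of being dropped. The table is illustrative.

CANONICAL = {
    ("protobuf", "int64"):  "i64",
    ("thrift",   "i64"):    "i64",
    ("protobuf", "string"): "str",
    ("thrift",   "string"): "str",
}

def lift(dialect: str, type_name: str, **attrs) -> dict:
    """Promote a dialect-specific field into the Unified AST with equal status."""
    return {
        "type": CANONICAL[(dialect, type_name)],  # one shared identity
        "origin": dialect,                        # provenance is preserved
        "meta": attrs,                            # e.g. field numbers, required flags
    }

a = lift("protobuf", "int64", field_number=1)
b = lift("thrift", "i64", required=True)
print(a["type"] == b["type"])  # True: both promoted to the same canonical type
```

Because both lifted forms carry the same canonical type plus their own provenance, the AI can reason about them as one concept while still emitting dialect-correct output.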
3. Architecture Diagram: A Single Source of Truth (SSOT) for AI
The Unified AST design passes fragmented formats through a single intelligent hub.
[Legacy & New Sources]          [Semantic Hub]          [AI & Output Code]

legacy.proto   (Protobuf) --+                        +-- C# (Unity Client)
                            |                        |
legacy.thrift  (Thrift)  ---+--> [ Unified AST ] ----+-- TypeScript (Web)
                            |        (Engine)        |
new_logic.deuk (Schema)  ---+                        +-- C++ (Server)
                                                     |
                                                     +-- Agent/MCP Rule Sets
Now a single hub runs through every area. When a server packet is modified, not only is the client model updated in sync, but the validation rules for the Excel data are also carried along. What AI coding assistants (Cursor, Copilot, etc.) receive as context is no longer a pile of fragmented .proto files and Excel spreadsheets, but a fully refined Single Source of Truth (SSOT).
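The fan-out on the right side of the diagram can be sketched as one hub entry driving every target emitter at once. The target names and type mappings below are illustrative assumptions, not DeukPack's real generators:

```python
# Sketch of the SSOT fan-out: a single Unified AST field drives every
# backend emitter, so one change to the hub propagates to all targets.

FIELD = {"name": "item_id", "type": "i64"}  # one canonical hub entry

# Hypothetical per-target emitters keyed by output language:
EMITTERS = {
    "csharp":     lambda f: f"public long {f['name']};",
    "typescript": lambda f: f"{f['name']}: bigint;",
    "cpp":        lambda f: f"int64_t {f['name']};",
}

def emit_all(field):
    """Generate every target's declaration from the same hub entry."""
    return {target: gen(field) for target, gen in EMITTERS.items()}

for target, line in emit_all(FIELD).items():
    print(f"{target:10s} {line}")
```

Renaming `item_id` in the hub entry changes all three outputs in one step; there is no second copy of the schema for the dialects to drift apart in.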
Conclusion: Architecture Reduces AI's Cognitive Load
The Unified AST architecture was not born simply to create a "pretty IDL."
It is an extreme engineering response to the question: "How can we prevent humans and AIs from falling into the swamp of fragmented legacy systems and allow them to control the entire system on a single knowledge graph?"
By unifying the fragmented system and clearing a path for the machine, human developers can finally focus solely on the essence of the business. That is the greatest value proven by this hybrid architecture.
Continues in the next article:
[Why AI Servers Die from OOM: Designing a Zero-Allocation Protocol] explores the technical reality of how data passed through this Unified AST is transmitted over the wire at hyper-speed—without wasting a single byte of memory allocation at runtime.
Project DeukPack
This article series is based on the design notes of DeukPack, an open-source infrastructure created to prevent data fragmentation.
- GitHub: DeukPack OSS