Source code & system:
golden_armada
Programming-Paradigm-for-AI-Written-Software
Previous article (context):
a vibe coding programming paradigm
What happens when software is no longer primarily written, but executed through AI-driven decisions?
In this article, I want to show a working system and what it actually produces at runtime.
No manifesto. No theory expansion.
Just execution.
Context: from "vibe coding" to runtime reality
In the previous article I introduced the idea of vibe coding: a programming paradigm where AI becomes the primary code generator and the human shifts toward intent specification rather than implementation.
That idea raises an immediate question:
What does such a system actually look like when it runs?
Golden Armada is my attempt to answer that question experimentally.
What is Golden Armada?
Golden Armada is an AI-native workflow engine where:
- a user triggers actions through a structured UI
- an LLM ("DeepAgent") plans execution steps
- a strict contract system applies mutations
- every action is recorded as an immutable trace
The key idea is simple:
The system is not understood through code, but through execution traces.
Architecture overview
The system follows a strict execution pipeline:
User Action
↓
Intake Layer
↓
Workflow Loading
↓
LLM Planning (DeepAgent)
↓
Operation Execution
↓
Event Store Append
↓
Trace Flush
Key constraints:
- all operations are strongly typed
- execution is deterministic after planning
- state changes are event-based
- everything is observable
This is intentionally closer to an execution machine than a traditional application.
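The pipeline above can be sketched in a few lines. This is a minimal illustration, not the Golden Armada code: the `Operation` and `Run` types, the `HANDLERS` table, the workflow shape (node id mapped to successor ids) and the meaning given to `split` are all assumptions made for the example, and the planner is passed in as a plain function so that a stub can stand in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable

Workflow = dict[str, list[str]]  # assumed shape: node id -> successor ids


@dataclass(frozen=True)
class Operation:
    """Typed mutation request returned by the planner (hypothetical contract)."""
    kind: str
    node: str


@dataclass
class Run:
    events: list[Operation] = field(default_factory=list)  # append-only event store
    trace: list[str] = field(default_factory=list)         # human-readable steps


def split(workflow: Workflow, node: str) -> None:
    """Invented semantics: add a sibling node that shares the successors."""
    workflow[f"{node}_b"] = list(workflow[node])


HANDLERS: dict[str, Callable[[Workflow, str], None]] = {"split": split}


def run_pipeline(button: str, workflow: Workflow,
                 plan: Callable[[str, Workflow], Operation]) -> Run:
    run = Run()
    run.trace.append(f"intake: button={button}")
    run.trace.append(f"workflow loaded: {len(workflow)} nodes")
    op = plan(button, workflow)            # the only non-deterministic stage
    run.trace.append(f"planned: operation={op.kind} node={op.node}")
    HANDLERS[op.kind](workflow, op.node)   # deterministic execution
    run.events.append(op)                  # event store append
    run.trace.append("operation executed, event appended")
    return run                             # the caller flushes run.trace


# Usage with a stub planner in place of the LLM.
wf = {"split": ["end"], "end": []}
run = run_pipeline("split_node", wf, lambda button, workflow: Operation("split", "split"))
```

Keeping the planner behind a function boundary is what makes everything after it reproducible: given the same `Operation`, the handler and the event append always do the same thing.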
Why traces matter more than code
In traditional systems, debugging means reading code and inferring behavior.
In Golden Armada, debugging means:
reading execution traces and reconstructing system behavior
This is closer to how distributed systems already work, but extended to AI-driven decision layers.
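One way to picture "reconstructing system behavior" is replaying an event log. The sketch below assumes a simplified event shape with two hypothetical event kinds, `add_node` and `remove_node`; the real event store may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str  # "add_node" or "remove_node" (hypothetical event kinds)
    node: str


def replay(events: list[Event]) -> set[str]:
    """Fold the event log into the set of node ids that exist after the last event."""
    nodes: set[str] = set()
    for event in events:
        if event.kind == "add_node":
            nodes.add(event.node)
        elif event.kind == "remove_node":
            nodes.discard(event.node)
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return nodes


log = [
    Event("add_node", "start"),
    Event("add_node", "split"),
    Event("remove_node", "split"),
]
state_after_two = replay(log[:2])  # state at an earlier point in history
state_now = replay(log)            # state after the full log
```

Replaying a prefix of the log gives the state at any earlier moment, which is the property that makes a trace useful for debugging without reading the code that produced it.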
Real execution trace (example)
Below is a real trace produced by the system:
trace_id: ea7e24fc11dc43bdbfeacedbc628e9fd
[OK] Request received
button=split_node
[OK] Workflow loaded
workflow=wf_001 v1
[OK] Agent planning stage
operation=split
node=split
[OK] Event store append
patch_applied=true
[OK] Handler executed
duration: 0ms
Total duration: 29.7s
Full logs available here:
https://github.com/evgeniykormin86-stack/golden_armada/tree/main/logs
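A trace in this plain-text form is easy to turn into structured data. The parser below is a sketch that assumes only the layout visible in the example above (a `trace_id:` line, `[OK]` or `[ERR]` step lines, `key=value` or `key: value` detail lines, and a `Total duration:` line); `parse_trace` is a hypothetical helper, not part of the repository.

```python
import re

STATUS_LINE = re.compile(r"^\[(OK|ERR)\]\s+(.*)$")


def parse_trace(text: str) -> dict:
    """Turn a plain-text trace into a dict with trace_id, steps and total_seconds."""
    trace = {"trace_id": None, "steps": [], "total_seconds": None}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        status = STATUS_LINE.match(line)
        if status:
            trace["steps"].append(
                {"status": status.group(1), "message": status.group(2), "fields": {}}
            )
        elif line.startswith("trace_id:"):
            trace["trace_id"] = line.split(":", 1)[1].strip()
        elif line.startswith("Total duration:"):
            trace["total_seconds"] = float(line.split(":", 1)[1].strip().rstrip("s"))
        elif trace["steps"]:
            # Detail lines such as "button=split_node" or "duration: 0ms"
            # are attached to the most recent step.
            parts = re.split(r"\s*[=:]\s*", line, maxsplit=1)
            if len(parts) == 2:
                trace["steps"][-1]["fields"][parts[0]] = parts[1]
    return trace


SAMPLE = """\
trace_id: ea7e24fc11dc43bdbfeacedbc628e9fd
[OK] Request received
button=split_node
[OK] Workflow loaded
workflow=wf_001 v1
[OK] Agent planning stage
operation=split
node=split
[OK] Event store append
patch_applied=true
[OK] Handler executed
duration: 0ms
Total duration: 29.7s
"""

parsed = parse_trace(SAMPLE)
```

Once traces are structured like this, questions such as "which step failed" or "which operation was planned" become simple lookups on `parsed["steps"]`.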
What this trace shows
Even in this small example, we can observe:
- where time is spent (the handler itself took 0 ms of the 29.7 s total; LLM planning dominates)
- how deterministic execution follows planning
- how workflow mutation happens via operations
- how the system maintains auditability
The important shift is:
The runtime becomes the primary interface of understanding.
Failure case (equally important)
Not all executions succeed.
Example:
[ERR] Node not found: split
This trace shows:
- the LLM correctly planned an operation
- but the workflow state did not match the plan's expectation
- the system failed deterministically during execution
This is important:
AI planning is not the same as system validity.
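The gap between a well-formed plan and a valid one can be made explicit with a state check before execution. This is a sketch under assumptions: `PlannedOp`, `NodeNotFound`, `check_against_state` and `execute` are hypothetical names, and the workflow state is reduced to a set of node ids.

```python
from dataclasses import dataclass


class NodeNotFound(Exception):
    """Raised when a planned operation targets a node that is not in the workflow."""


@dataclass(frozen=True)
class PlannedOp:
    operation: str
    node: str


def check_against_state(op: PlannedOp, node_ids: set[str]) -> None:
    # The plan can be perfectly well-formed and still refer to missing state.
    if op.node not in node_ids:
        raise NodeNotFound(f"Node not found: {op.node}")


def execute(op: PlannedOp, node_ids: set[str]) -> str:
    """Return the trace line for this step; the same inputs always give the same line."""
    try:
        check_against_state(op, node_ids)
    except NodeNotFound as exc:
        return f"[ERR] {exc}"
    return "[OK] Handler executed"


line = execute(PlannedOp("split", "split"), {"start", "end"})
```

Because the check depends only on the operation and the current state, the failure is repeatable: rerunning the same plan against the same workflow produces the same `[ERR]` line.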
Design philosophy behind the system
Golden Armada is built around four constraints:
- AI changes duplication cost. New behaviors can be introduced by generating new "skills", not rewriting core logic.
- Graph complexity is intentionally constrained. Workflow depth is limited to avoid exponential reasoning complexity.
- Contracts replace implicit structure. All communication between components is strictly typed and validated.
- Observability is a first-class feature. If it is not traceable, it does not exist in the system.
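As an illustration of the depth constraint, a workflow can be checked against a limit before it is accepted. The sketch assumes an acyclic workflow stored as a mapping from node id to successor ids; `MAX_DEPTH` and its value are made up for the example and are not the limit the system actually uses.

```python
MAX_DEPTH = 3  # hypothetical limit, chosen only for this example


def workflow_depth(graph: dict[str, list[str]], root: str) -> int:
    """Number of nodes on the longest path starting at root (graph must be acyclic)."""
    memo: dict[str, int] = {}

    def depth(node: str) -> int:
        if node not in memo:
            children = graph.get(node, [])
            memo[node] = 1 + max((depth(child) for child in children), default=0)
        return memo[node]

    return depth(root)


def within_limit(graph: dict[str, list[str]], root: str, limit: int = MAX_DEPTH) -> bool:
    return workflow_depth(graph, root) <= limit


example = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
depth_of_example = workflow_depth(example, "a")
```

Rejecting a workflow at load time keeps the planner's input bounded, which is the point of constraining graph complexity in the first place.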
What this system is NOT
To avoid misunderstandings:
- it is not an autonomous agent system
- it is not production-ready infrastructure
- it is not a replacement for software engineering
- it is not a fully self-evolving system
It is:
an experimental execution environment for AI-driven workflows with full observability.
Why this matters (core insight)
Most AI systems today fail not because they cannot generate output, but because:
their internal decision process is not observable or reproducible.
Golden Armada explores a different direction:
make execution observable first, intelligent second.
Limitations (important for honesty)
Current limitations include:
- limited scale of workflows
- non-deterministic LLM planning
- trace volume grows quickly
- debugging still requires human interpretation
- system complexity increases with feature expansion
This is expected at this stage.
Future direction
Next steps in this experiment include:
- automatic test generation from traces
- failure clustering and pattern detection
- trace-based debugging UI
- regression testing from execution history
- contract evolution based on observed failures
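Failure clustering, for example, could start very simply. The sketch below is one possible starting point rather than a planned design: it groups `[ERR]` lines by the message text before the first colon. The helper names and the "Contract violation" message are invented for the example.

```python
from collections import Counter


def failure_signature(message: str) -> str:
    """Replace the variable part after the first colon with a placeholder."""
    head, sep, _ = message.partition(":")
    return f"{head.strip()}: <value>" if sep else message.strip()


def cluster_failures(lines: list[str]) -> Counter:
    """Count error lines per signature, ignoring successful steps."""
    errors = [line[len("[ERR]"):].strip() for line in lines if line.startswith("[ERR]")]
    return Counter(failure_signature(error) for error in errors)


clusters = cluster_failures([
    "[ERR] Node not found: split",
    "[OK] Handler executed",
    "[ERR] Node not found: merge",
    "[ERR] Contract violation",  # invented message, for illustration only
])
```

Even this crude grouping separates "the plan referenced a missing node" from other failure classes, which is the kind of pattern the later steps (regression tests, contract evolution) would build on.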
Closing thought
Golden Armada is not an answer.
It is a question:
What does software become when execution, not code, is the primary artifact?
We are still early, but we can already observe meaningful structure emerging from runtime behavior.