Evgeniy Kormin

Posted on Jun 18

Golden Armada: What AI-Native Software Looks Like in Execution

#ai #programming #machinelearning

👉 Source code & system:

golden_armada
Programming-Paradigm-for-AI-Written-Software

👉 Previous article (context):

a vibe coding programming paradigm

What happens when software is no longer primarily written, but executed through AI-driven decisions?

In this article, I want to show a working system and what it actually produces at runtime.

No manifesto. No theory expansion.

Just execution.

Context: from “vibe coding” to runtime reality

In the previous article, I introduced vibe coding: a programming paradigm in which AI becomes the primary code generator and the human shifts from implementation toward intent specification.

That idea raises an immediate question:

What does such a system actually look like when it runs?

Golden Armada is my attempt to answer that question experimentally.

What is Golden Armada?

Golden Armada is an AI-native workflow engine where:

a user triggers actions through a structured UI
an LLM (“DeepAgent”) plans execution steps
a strict contract system applies mutations
every action is recorded as an immutable trace

The key idea is simple:

The system is understood not through its code, but through its execution traces.

Architecture overview

The system follows a strict execution pipeline:

User Action
   ↓
Intake Layer
   ↓
Workflow Loading
   ↓
LLM Planning (DeepAgent)
   ↓
Operation Execution
   ↓
Event Store Append
   ↓
Trace Flush
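
As a rough illustration, the pipeline above can be read as an ordered list of stages, each of which leaves one trace entry. The sketch below is not taken from the Golden Armada source: `TraceEntry`, `run_pipeline`, and the stub stages are hypothetical names, the planner is a hard-coded stand-in for the LLM call, and the event store and trace flush stages are left out for brevity.

```python
import time
from dataclasses import dataclass
from typing import Callable

Stage = Callable[[dict], dict]


@dataclass(frozen=True)
class TraceEntry:
    stage: str
    status: str  # "OK" or "ERR"
    duration_ms: float
    detail: str = ""


def run_pipeline(stages: list[tuple[str, Stage]], context: dict) -> tuple[dict, list[TraceEntry]]:
    """Run stages in order, record one entry per stage, stop at the first error."""
    trace: list[TraceEntry] = []
    for name, stage in stages:
        start = time.perf_counter()
        try:
            context = stage(context)
            status, detail = "OK", ""
        except Exception as exc:
            status, detail = "ERR", str(exc)
        elapsed_ms = (time.perf_counter() - start) * 1000
        trace.append(TraceEntry(name, status, elapsed_ms, detail))
        if status == "ERR":
            break
    return context, trace


# Stub stages (assumptions for illustration); each returns a new context dict.
def intake(ctx: dict) -> dict:
    return {**ctx, "request": ctx["button"]}


def load_workflow(ctx: dict) -> dict:
    return {**ctx, "workflow": {"id": "wf_001", "nodes": ["split"]}}


def plan(ctx: dict) -> dict:
    # Stand-in for the LLM planning call.
    return {**ctx, "operation": {"kind": "split", "node": "split"}}


def execute(ctx: dict) -> dict:
    node = ctx["operation"]["node"]
    if node not in ctx["workflow"]["nodes"]:
        raise LookupError(f"Node not found: {node}")
    return {**ctx, "patch_applied": True}


STAGES = [
    ("intake", intake),
    ("workflow_loading", load_workflow),
    ("planning", plan),
    ("execution", execute),
]
```

Calling `run_pipeline(STAGES, {"button": "split_node"})` returns the final context together with a list of trace entries, so a failed run still yields a trace up to and including the failing stage.
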

Key constraints:

all operations are strongly typed
execution is deterministic after planning
state changes are event-based
everything is observable

This is intentionally closer to an execution machine than a traditional application.
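
To make the constraints concrete, here is a minimal sketch of a typed operation and an append-only event store. All names (`Operation`, `Event`, `EventStore`, `ALLOWED_KINDS`) are assumptions made for illustration, and only the `split` kind appears in this article; the real contracts in the repository may look quite different.

```python
from dataclasses import dataclass, field
from typing import Literal

ALLOWED_KINDS = ("split", "merge")  # "merge" is a hypothetical second kind


@dataclass(frozen=True)
class Operation:
    kind: Literal["split", "merge"]
    node: str

    def __post_init__(self) -> None:
        # Literal is only a static hint, so the contract is checked at runtime too.
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown operation kind: {self.kind!r}")
        if not isinstance(self.node, str) or not self.node:
            raise ValueError("node must be a non-empty string")


@dataclass(frozen=True)
class Event:
    trace_id: str
    seq: int
    operation: Operation


@dataclass
class EventStore:
    _events: list[Event] = field(default_factory=list)

    def append(self, trace_id: str, operation: Operation) -> Event:
        """Record an operation as a new immutable event; nothing is updated in place."""
        event = Event(trace_id, len(self._events), operation)
        self._events.append(event)
        return event

    def history(self) -> tuple[Event, ...]:
        # Return a tuple so callers cannot modify the stored list.
        return tuple(self._events)
```

Because events are frozen and the store only appends, the current state can always be rebuilt by replaying `history()` in order.
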

Why traces matter more than code

In traditional systems, debugging means reading code and inferring behavior.

In Golden Armada, debugging means:

reading execution traces and reconstructing system behavior

This is close to how distributed systems are already debugged, extended to AI-driven decision layers.

Real execution trace (example)

Below is a real trace produced by the system:

trace_id: ea7e24fc11dc43bdbfeacedbc628e9fd

[OK] Request received
button=split_node

[OK] Workflow loaded
workflow=wf_001 v1

[OK] Agent planning stage
operation=split
node=split

[OK] Event store append
patch_applied=true

[OK] Handler executed
duration: 0ms

Total duration: 29.7s

👉 Full logs available here:
https://github.com/evgeniykormin86-stack/golden_armada/tree/main/logs

What this trace shows

Even in this small example, we can observe:

where time is spent: the handler reports 0ms of the 29.7s total, so the time goes to the earlier stages, and LLM planning dominates
how deterministic execution follows planning
how workflow mutation happens via operations
how the system maintains auditability

The important shift is:

The runtime becomes the primary interface of understanding.
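
If traces are the primary interface, it helps to be able to read them mechanically. The sketch below parses the plain-text layout of the example trace shown earlier into a dictionary. It assumes that layout (a `[OK]` or `[ERR]` step line, followed by `key=value` or `key: value` lines, with a blank line ending each step); `parse_trace` is a hypothetical helper, not part of the project.

```python
import re

STEP_RE = re.compile(r"^\[(OK|ERR)\]\s+(.*)$")
FIELD_RE = re.compile(r"^([^=:]+?)\s*[=:]\s*(.*)$")


def parse_trace(text: str) -> dict:
    """Split a text trace into top-level metadata and a list of steps."""
    meta: dict[str, str] = {}
    steps: list[dict] = []
    current = None  # the step that following key/value lines belong to
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # assumption: a blank line ends the current step
            continue
        step = STEP_RE.match(line)
        if step:
            current = {"status": step.group(1), "title": step.group(2), "fields": {}}
            steps.append(current)
            continue
        pair = FIELD_RE.match(line)
        if pair:
            # Key/value lines outside any step (trace_id, total duration) go to meta.
            target = current["fields"] if current is not None else meta
            target[pair.group(1)] = pair.group(2)
    return {"meta": meta, "steps": steps}
```

Once parsed, questions such as "which step failed" or "which node did the agent plan to touch" become simple lookups rather than manual log reading.
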

Failure case (equally important)

Not all executions succeed.

Example:

[ERR] Node not found: split

This trace shows:

the LLM planned a valid operation
but the workflow state did not match what the plan expected
the system failed deterministically during execution

This is important:

AI planning is not the same as system validity.
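
The gap between a valid plan and a valid state can be shown with a small sketch. The semantics of `split` are not described in this article, so the version below is an assumption: it replaces a node with two chained nodes and rewires its predecessors. `apply_split` and `NodeNotFoundError` are hypothetical names. The point is only that the handler checks the plan against the current workflow and fails the same way every time the node is missing.

```python
class NodeNotFoundError(LookupError):
    """Raised when a planned operation targets a node the workflow does not have."""


def apply_split(workflow: dict[str, list[str]], node: str) -> dict[str, list[str]]:
    """Return a new workflow in which `node` is replaced by `node_a -> node_b`.

    The workflow maps each node id to the ids of its successors.
    The input is never mutated.
    """
    if node not in workflow:
        raise NodeNotFoundError(f"Node not found: {node}")
    first, second = f"{node}_a", f"{node}_b"
    result: dict[str, list[str]] = {}
    for name, successors in workflow.items():
        if name == node:
            result[first] = [second]
            result[second] = list(successors)
        else:
            # Edges that pointed at the old node now point at its first half.
            result[name] = [first if s == node else s for s in successors]
    return result
```

Under these assumed semantics, one way such a mismatch could arise is a plan that targets `split` after an earlier run has already replaced that node: the plan is well formed, but the state it refers to no longer exists.
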

Design philosophy behind the system

Golden Armada is built around four constraints:

AI changes the cost of duplication. New behaviors can be introduced by generating new “skills” instead of rewriting core logic.
Graph complexity is intentionally constrained. Workflow depth is limited to avoid exponential reasoning complexity.
Contracts replace implicit structure. All communication between components is strictly typed and validated.
Observability is a first-class feature. If it is not traceable, it does not exist in the system.
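
The depth constraint in particular is easy to express as a check. The article does not say how Golden Armada enforces it, so the sketch below is only one possible approach: it measures the longest path from a root node and rejects workflows that exceed a limit. `workflow_depth`, `enforce_depth_limit`, and the limit value are all hypothetical.

```python
def workflow_depth(workflow: dict[str, list[str]], root: str) -> int:
    """Number of nodes on the longest path that starts at `root`."""

    def visit(node: str, path: frozenset[str]) -> int:
        if node in path:
            raise ValueError(f"cycle detected at node: {node}")
        successors = workflow.get(node, [])
        deepest = max((visit(s, path | {node}) for s in successors), default=0)
        return 1 + deepest

    return visit(root, frozenset())


def enforce_depth_limit(workflow: dict[str, list[str]], root: str, max_depth: int) -> int:
    """Return the depth if it is within the limit, otherwise raise ValueError."""
    depth = workflow_depth(workflow, root)
    if depth > max_depth:
        raise ValueError(f"workflow depth {depth} exceeds limit {max_depth}")
    return depth
```

This plain recursion revisits shared subgraphs, which is acceptable for the small, shallow workflows the constraint is meant to guarantee; a larger graph would want memoization.
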

What this system is NOT

To avoid misunderstandings:

it is not an autonomous agent system
it is not production-ready infrastructure
it is not a replacement for software engineering
it is not a fully self-evolving system

It is:

an experimental execution environment for AI-driven workflows with full observability.

Why this matters (core insight)

Most AI systems today fail not because they cannot generate output, but because:

their internal decision process is not observable or reproducible.

Golden Armada explores a different direction:

make execution observable first, intelligent second.

Limitations (important for honesty)

Current limitations include:

limited scale of workflows
non-deterministic LLM planning
trace volume grows quickly
debugging still requires human interpretation
system complexity increases with feature expansion

This is expected at this stage.

Future direction

Next steps in this experiment include:

automatic test generation from traces
failure clustering and pattern detection
trace-based debugging UI
regression testing from execution history
contract evolution based on observed failures
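
As a hint of what failure clustering could look like in its simplest form, the sketch below groups error messages by the text before the first colon, so that `Node not found: split` and a similar message for another node land in the same bucket. None of this exists in the project yet; `failure_signature` and `cluster_failures` are hypothetical names, and every sample message except `Node not found: split` is invented for illustration.

```python
from collections import Counter
from typing import Iterable


def failure_signature(message: str) -> str:
    """Reduce an error message to its stable prefix, dropping the specific id."""
    return message.split(":", 1)[0].strip()


def cluster_failures(messages: Iterable[str]) -> Counter:
    """Count failures per signature."""
    return Counter(failure_signature(m) for m in messages)


if __name__ == "__main__":
    errors = [
        "Node not found: split",
        "Node not found: merge_1",           # invented example
        "Contract violation: missing field", # invented example
    ]
    for signature, count in cluster_failures(errors).most_common():
        print(f"{count} x {signature}")
```
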

Closing thought

Golden Armada is not an answer.

It is a question:

What does software become when execution, not code, is the primary artifact?

We are still early, but we can already observe meaningful structure emerging from runtime behavior.
