DEV Community

Evgeniy Kormin

Golden Armada: What AI-Native Software Looks Like in Execution

👉 Source code & system:

golden_armada
Programming-Paradigm-for-AI-Written-Software

👉 Previous article (context):

a vibe coding programming paradigm

What happens when software is no longer primarily written, but executed through AI-driven decisions?

In this article, I want to show a working system and what it actually produces at runtime.

No manifesto. No theory expansion.

Just execution.

Context: from “vibe coding” to runtime reality

In the previous article I introduced the idea of vibe coding: a programming paradigm where AI becomes the primary code generator and the human shifts toward intent specification rather than implementation.

That idea raises an immediate question:

What does such a system actually look like when it runs?

Golden Armada is my attempt to answer that question experimentally.

What is Golden Armada?

Golden Armada is an AI-native workflow engine where:

  • a user triggers actions through a structured UI
  • an LLM (“DeepAgent”) plans execution steps
  • a strict contract system applies mutations
  • every action is recorded as an immutable trace

The key idea is simple:

The system is not understood through code, but through execution traces.
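The last bullet, an immutable trace, is the piece the rest of the design leans on. Below is a minimal sketch in Python, assuming a trace is an append-only sequence of frozen events keyed by a 32-character hex `trace_id`. The names `TraceEvent` and `Trace` are hypothetical and not taken from the repository.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class TraceEvent:
    """One recorded step; frozen, so it cannot be edited after creation."""
    stage: str                  # e.g. "Request received"
    ok: bool                    # True renders as [OK], False as [ERR]
    details: Mapping[str, str]  # e.g. {"button": "split_node"}


class Trace:
    """Append-only sequence of events sharing one trace_id (hypothetical API)."""

    def __init__(self, trace_id: Optional[str] = None) -> None:
        # uuid4().hex yields 32 lowercase hex characters, like the trace_id shown later.
        self.trace_id = trace_id or uuid.uuid4().hex
        self._events: list[TraceEvent] = []

    def record(self, stage: str, ok: bool = True, **details: str) -> TraceEvent:
        # Copy details into a read-only view so later mutation is impossible.
        event = TraceEvent(stage, ok, MappingProxyType(dict(details)))
        self._events.append(event)
        return event

    @property
    def events(self) -> Tuple[TraceEvent, ...]:
        # Return a tuple, never the internal list, so history cannot be rewritten.
        return tuple(self._events)
```

Because events are frozen and `events` returns a copy, the only way to change a trace is to append to it. That property is what makes a trace usable as an audit log.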

Architecture overview

The system follows a strict execution pipeline:

User Action
   ↓
Intake Layer
   ↓
Workflow Loading
   ↓
LLM Planning (DeepAgent)
   ↓
Operation Execution
   ↓
Event Store Append
   ↓
Trace Flush

Key constraints:

  • all operations are strongly typed
  • execution is deterministic after planning
  • state changes are event-based
  • everything is observable

This is intentionally closer to an execution machine than a traditional application.
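The pipeline can be sketched as an ordered list of stages that run one after another and stop at the first failure. This is a minimal sketch, not the repository's code. The stage functions are hypothetical stand-ins, and `plan` returns a fixed operation where the real system would call the LLM.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
Stage = Tuple[str, Callable[[Context], None]]


@dataclass
class StageResult:
    name: str
    ok: bool
    duration_s: float
    error: Optional[str] = None


def run_pipeline(stages: List[Stage], ctx: Context) -> List[StageResult]:
    """Run stages in a fixed order; stop at the first failure but keep its record."""
    results: List[StageResult] = []
    for name, fn in stages:
        start = time.perf_counter()
        try:
            fn(ctx)
        except Exception as exc:  # a failing stage ends the run; nothing after it executes
            results.append(StageResult(name, False, time.perf_counter() - start, str(exc)))
            break
        results.append(StageResult(name, True, time.perf_counter() - start))
    return results


# Hypothetical stand-ins for the six stages in the diagram.
def intake(ctx: Context) -> None:
    if "button" not in ctx:
        raise ValueError("request has no button")

def load_workflow(ctx: Context) -> None:
    ctx["workflow"] = {"id": "wf_001", "version": 1, "nodes": {"split"}}

def plan(ctx: Context) -> None:
    # In the real system this is the LLM call, the non-deterministic step; fixed here.
    ctx["operation"] = {"op": "split", "node": "split"}

def execute(ctx: Context) -> None:
    node = ctx["operation"]["node"]
    if node not in ctx["workflow"]["nodes"]:
        raise LookupError(f"Node not found: {node}")

def append_event(ctx: Context) -> None:
    ctx.setdefault("events", []).append(ctx["operation"])

def flush_trace(ctx: Context) -> None:
    ctx["flushed"] = True


PIPELINE: List[Stage] = [
    ("Intake", intake), ("Workflow Loading", load_workflow),
    ("LLM Planning", plan), ("Operation Execution", execute),
    ("Event Store Append", append_event), ("Trace Flush", flush_trace),
]
```

Running `run_pipeline(PIPELINE, {"button": "split_node"})` yields six successful results, while an empty context stops after the first stage. Every stage after `plan` depends only on the context it receives, which is the sense in which execution is deterministic after planning.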

Why traces matter more than code

In traditional systems, debugging means reading code and inferring behavior.

In Golden Armada, debugging means:

reading execution traces and reconstructing system behavior

This is closer to how distributed systems are already debugged, but extended to AI-driven decision layers.
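Reconstructing behavior from recorded events is the core move of event sourcing: state is not stored directly but rebuilt by folding events in order, so the state at any point in a run can be recovered. The sketch below assumes hypothetical event types (`node_added`, `node_removed`, `edge_added`). Golden Armada's actual event schema may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import reduce
from typing import Any, Dict, FrozenSet, List, Optional, Tuple

Event = Dict[str, Any]


@dataclass(frozen=True)
class WorkflowState:
    nodes: FrozenSet[str] = frozenset()
    edges: FrozenSet[Tuple[str, str]] = frozenset()


def apply(state: WorkflowState, event: Event) -> WorkflowState:
    """Pure transition: returns a new state and never mutates the old one."""
    kind = event["type"]
    if kind == "node_added":
        return WorkflowState(state.nodes | {event["node"]}, state.edges)
    if kind == "node_removed":
        n = event["node"]
        # Dropping a node also drops every edge that touches it.
        kept = frozenset(e for e in state.edges if n not in e)
        return WorkflowState(state.nodes - {n}, kept)
    if kind == "edge_added":
        return WorkflowState(state.nodes, state.edges | {(event["src"], event["dst"])})
    raise ValueError(f"unknown event type: {kind}")


def replay(events: List[Event], upto: Optional[int] = None) -> WorkflowState:
    """Rebuild the state after the first `upto` events (all events by default)."""
    return reduce(apply, events[:upto], WorkflowState())
```

Replaying a prefix of the log answers questions like “what did the workflow look like right before this operation ran?”, which is the question a failed trace raises.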

Real execution trace (example)

Below is a real trace produced by the system:

trace_id: ea7e24fc11dc43bdbfeacedbc628e9fd

[OK] Request received
button=split_node

[OK] Workflow loaded
workflow=wf_001 v1

[OK] Agent planning stage
operation=split
node=split

[OK] Event store append
patch_applied=true

[OK] Handler executed
duration: 0ms

Total duration: 29.7s
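To treat traces as data rather than text, the format above can be parsed into structured steps. This sketch assumes the layout exactly as shown: an `[OK]` or `[ERR]` line, then `key=value` or `key: value` lines, with blank lines between steps. The files in the `logs` directory may use a different format, and `parse_trace` is a hypothetical helper.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

STEP_RE = re.compile(r"^\[(OK|ERR)\]\s+(.*)$")           # "[OK] Workflow loaded"
FIELD_RE = re.compile(r"^([A-Za-z_ ]+?)\s*[=:]\s*(.*)$")  # "node=split", "duration: 0ms"


@dataclass
class Step:
    status: str
    title: str
    fields: Dict[str, str] = field(default_factory=dict)


@dataclass
class ParsedTrace:
    meta: Dict[str, str] = field(default_factory=dict)  # trace_id, Total duration, ...
    steps: List[Step] = field(default_factory=list)


def parse_trace(text: str) -> ParsedTrace:
    trace = ParsedTrace()
    current: Optional[Step] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line closes the current step
            continue
        step = STEP_RE.match(line)  # checked first, so "[ERR] Node not found: x" is a step
        if step:
            current = Step(step.group(1), step.group(2))
            trace.steps.append(current)
            continue
        kv = FIELD_RE.match(line)
        if kv:
            # Fields outside any step (trace_id, Total duration) go to meta.
            target = current.fields if current is not None else trace.meta
            target[kv.group(1).strip()] = kv.group(2).strip()
    return trace
```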

👉 Full logs available here:
https://github.com/evgeniykormin86-stack/golden_armada/tree/main/logs

What this trace shows

Even in this small example, we can observe:

  • where time is spent (LLM planning dominates)
  • how deterministic execution follows planning
  • how workflow mutation happens via operations
  • how the system maintains auditability

The important shift is:

The runtime becomes the primary interface of understanding.
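The first observation, where time is spent, is simple arithmetic over stage durations. The sketch below assumes durations are written like `0ms` and `29.7s`, and the helper names are hypothetical.

```python
import re
from typing import Dict, Tuple

_UNITS = {"ms": 1e-3, "s": 1.0}


def parse_duration(text: str) -> float:
    """Convert strings such as '0ms' or '29.7s' to seconds."""
    m = re.fullmatch(r"\s*(\d*\.?\d+)\s*(ms|s)\s*", text)
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]


def time_breakdown(stages: Dict[str, float], total: float) -> Dict[str, float]:
    """Each stage's share of the total, plus whatever no stage accounts for."""
    if total <= 0:
        raise ValueError("total duration must be positive")
    shares = {name: d / total for name, d in stages.items()}
    shares["(unattributed)"] = max(total - sum(stages.values()), 0.0) / total
    return shares


def dominant_stage(stages: Dict[str, float], total: float) -> Tuple[str, float]:
    """Name and share of the largest contributor to total time."""
    return max(time_breakdown(stages, total).items(), key=lambda kv: kv[1])
```

Applied to the trace above, only the handler reports a duration (0ms), so all 29.7s falls outside it. That fits the observation that LLM planning dominates, but per-stage timings would be needed to attribute the time precisely.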

Failure case (equally important)

Not all executions succeed.

Example:

[ERR] Node not found: split

This trace shows:

  • the LLM produced a well-formed operation plan
  • but the workflow state did not match what the plan expected
  • so the system failed deterministically during execution

This is important:

AI planning is not the same as system validity.
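The distinction can be made concrete: the executor validates a planned operation against the current workflow state and refuses it if a referenced node is missing. This is a sketch with hypothetical types. The `split` semantics here (replace a node with two successors) are an assumption for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet


class NodeNotFound(LookupError):
    """Raised when a plan references a node the workflow does not contain."""

    def __init__(self, node: str) -> None:
        super().__init__(f"Node not found: {node}")
        self.node = node


@dataclass(frozen=True)
class PlannedOperation:
    op: str    # e.g. "split", as produced by the planner
    node: str  # node the operation targets


@dataclass(frozen=True)
class Workflow:
    workflow_id: str
    version: int
    nodes: FrozenSet[str]


def execute(plan: PlannedOperation, wf: Workflow) -> Workflow:
    """Apply a planned operation only if it is valid against the current state."""
    if plan.node not in wf.nodes:
        # Same plan + same state -> same error, every time.
        raise NodeNotFound(plan.node)
    if plan.op == "split":
        # Hypothetical semantics: replace the node with two new ones.
        nodes = (wf.nodes - {plan.node}) | {f"{plan.node}_a", f"{plan.node}_b"}
        return Workflow(wf.workflow_id, wf.version + 1, frozenset(nodes))
    raise ValueError(f"Unsupported operation: {plan.op}")
```

In this sketch, applying the same `split` plan a second time to the already-split workflow raises `Node not found: split`. That is one hypothetical way a well-formed plan can become invalid: it was computed against a state that no longer exists.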

Design philosophy behind the system

Golden Armada is built around four design principles:

  1. AI changes duplication cost. New behaviors can be introduced by generating new “skills”, not rewriting core logic.
  2. Graph complexity is intentionally constrained. Workflow depth is limited to avoid exponential reasoning complexity.
  3. Contracts replace implicit structure. All communication between components is strictly typed and validated.
  4. Observability is a first-class feature. If it is not traceable, it does not exist in the system.
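Items 2 and 3 translate directly into checks that run before any mutation. Below is a minimal sketch, assuming planner output arrives as a dict and workflows are acyclic adjacency lists. `ALLOWED_OPS`, `MAX_DEPTH` and their values are hypothetical, since the real limits are not stated here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict, List

ALLOWED_OPS = {"split", "merge", "rename"}  # hypothetical operation set
MAX_DEPTH = 3                               # hypothetical depth limit


@dataclass(frozen=True)
class Operation:
    op: str
    node: str

    @classmethod
    def from_llm(cls, payload: Dict[str, Any]) -> "Operation":
        """Validate raw planner output against the contract before anything runs."""
        extra = set(payload) - {"op", "node"}
        if extra:
            raise ValueError(f"unexpected fields: {sorted(extra)}")
        op, node = payload.get("op"), payload.get("node")
        if not isinstance(op, str) or op not in ALLOWED_OPS:
            raise ValueError(f"invalid op: {op!r}")
        if not isinstance(node, str) or not node:
            raise ValueError(f"invalid node: {node!r}")
        return cls(op, node)


def depth(graph: Dict[str, List[str]], root: str) -> int:
    """Longest root-to-leaf path, counted in nodes; assumes an acyclic graph."""
    return 1 + max((depth(graph, c) for c in graph.get(root, [])), default=0)


def check_depth(graph: Dict[str, List[str]], root: str, limit: int = MAX_DEPTH) -> None:
    d = depth(graph, root)
    if d > limit:
        raise ValueError(f"workflow depth {d} exceeds limit {limit}")
```

Rejecting unknown fields, rather than ignoring them, keeps the contract strict: a plan either matches the schema exactly or never reaches execution.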

What this system is NOT

To avoid misunderstandings:

  • it is not an autonomous agent system
  • it is not production-ready infrastructure
  • it is not a replacement for software engineering
  • it is not a fully self-evolving system

It is:

an experimental execution environment for AI-driven workflows with full observability.

Why this matters (core insight)

Most AI systems today fail not because they cannot generate output, but because:

their internal decision process is not observable or reproducible.

Golden Armada explores a different direction:

make execution observable first, intelligent second.
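One way to make observability the default rather than an afterthought is to instrument each step at its definition, so a call cannot run without leaving a record. This sketch uses a decorator. `traced` and the in-memory `TRACE` list are hypothetical, and a real system would persist entries to its event store.

```python
import functools
import time
from typing import Any, Callable, Dict, List, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical in-memory sink; a real system would append to its event store.
TRACE: List[Dict[str, Any]] = []


def traced(stage: str) -> Callable[[F], F]:
    """Record every call to the wrapped function, whether it succeeds or raises."""
    def decorator(fn: F) -> F:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            entry: Dict[str, Any] = {"stage": stage, "args": args, "kwargs": kwargs}
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                entry.update(status="ERR", error=str(exc))
                raise  # observe the failure, never swallow it
            else:
                entry["status"] = "OK"
                return result
            finally:
                # Runs on both paths, so no call escapes without a trace entry.
                entry["duration_ms"] = (time.perf_counter() - start) * 1000
                TRACE.append(entry)
        return cast(F, wrapper)
    return decorator


@traced("Handler executed")
def split_node(node: str) -> str:
    # Hypothetical handler used only to demonstrate the decorator.
    if node != "split":
        raise LookupError(f"Node not found: {node}")
    return f"{node} done"
```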

Limitations (important for honesty)

Current limitations include:

  • limited scale of workflows
  • non-deterministic LLM planning
  • trace volume grows quickly
  • debugging still requires human interpretation
  • system complexity increases with feature expansion

This is expected at this stage.

Future direction

Next steps in this experiment include:

  • automatic test generation from traces
  • failure clustering and pattern detection
  • trace-based debugging UI
  • regression testing from execution history
  • contract evolution based on observed failures
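The first and fourth items share a mechanism. If a trace stores the workflow state and the planned operation, the deterministic part of a run can be replayed without calling the LLM, and any change in outcome is a regression. Below is a sketch under that assumption. `RecordedRun` and the executor are hypothetical stand-ins, not the repository's trace schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class RecordedRun:
    """Hypothetical shape of what a stored trace would need to hold for replay."""
    trace_id: str
    nodes: FrozenSet[str]  # workflow state before execution
    operation: str         # the plan the LLM produced at the time
    node: str
    outcome: str           # "OK" or the recorded error message


def execute(nodes: FrozenSet[str], operation: str, node: str) -> FrozenSet[str]:
    # Stand-in for the deterministic executor; the LLM is NOT called during replay.
    if node not in nodes:
        raise LookupError(f"Node not found: {node}")
    if operation == "split":
        return (nodes - {node}) | {f"{node}_a", f"{node}_b"}
    raise ValueError(f"Unsupported operation: {operation}")


def replay(run: RecordedRun) -> str:
    try:
        execute(run.nodes, run.operation, run.node)
        return "OK"
    except (LookupError, ValueError) as exc:
        return str(exc)


def regressions(history: List[RecordedRun]) -> List[Tuple[str, str, str]]:
    """Return (trace_id, expected, actual) for every run whose outcome changed."""
    found = []
    for run in history:
        actual = replay(run)
        if actual != run.outcome:
            found.append((run.trace_id, run.outcome, actual))
    return found
```

Recorded failures are kept as test cases too: a run that used to fail with `Node not found` and now succeeds is also a behavior change worth reviewing.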

Closing thought

Golden Armada is not an answer.

It is a question:

What does software become when execution, not code, is the primary artifact?

We are still early, but we can already observe meaningful structure emerging from runtime behavior.
