Guillermo Fernandez

Why Prompt-Based Agents Don’t Scale (and What We’re Trying Instead)

Most agent systems today are, at their core, prompt pipelines.

We chain prompts, add tools, inject memory, and hope that the system behaves consistently. This works surprisingly well for simple cases — but starts to break down as complexity increases.

After experimenting with different approaches, I’ve been exploring an alternative: introducing a cognitive runtime layer between the agent and the tools.

I call this approach ORCA.


The Problem with Prompt Pipelines

In most current designs, a single layer (the prompt) is responsible for:

  • deciding what to do
  • selecting tools
  • executing actions
  • interpreting results

This creates a few issues:

  • low observability — hard to understand what the agent is doing
  • poor composability — workflows don’t reuse well
  • fragility — small prompt changes can break behavior
  • implicit execution — logic is buried in text
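To make the fragility concrete, here is a toy prompt pipeline (all names and prompts are illustrative, not from any real system). Deciding, tool selection, and interpretation all live inside prompt text, and the control flow hinges on the model echoing an exact keyword:

```python
# Hypothetical prompt pipeline: planning, tool choice, and result
# interpretation all live inside blocks of text.

def run_pipeline(question: str, call_llm) -> str:
    plan = call_llm(
        f"Decide which tool to use for: {question}. "
        f"Answer with exactly one word: search or calc."
    )
    # Fragile: behavior depends on the model emitting the exact keyword.
    if "search" in plan.lower():
        result = call_llm(f"Search the web and summarize: {question}")
    else:
        result = call_llm(f"Compute the answer to: {question}")
    # Interpretation is yet another prompt; intermediate state is invisible.
    return call_llm(f"Rewrite this as a final answer: {result}")
```

A small rewording of the planning prompt (or a model that answers "I'd use calc") silently changes which branch runs, and nothing in the pipeline records why.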

A Different Approach: A Cognitive Runtime Layer

Instead of encoding everything in prompts, ORCA separates concerns explicitly:

1. Capabilities

Atomic cognitive operations such as:

  • retrieve
  • transform
  • evaluate

These are the building blocks of reasoning.
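One way to picture capabilities is as small functions with a shared signature over a state dict. This is a sketch under my own naming assumptions, not the API of the ORCA repository:

```python
# Atomic capabilities as plain functions: state in, state out.
# All names and the state-dict shape are illustrative assumptions.
from typing import Any, Callable

Capability = Callable[[dict[str, Any]], dict[str, Any]]

def retrieve(state: dict[str, Any]) -> dict[str, Any]:
    """Fetch items relevant to the query (stubbed with substring match)."""
    state["docs"] = [d for d in state["corpus"] if state["query"] in d]
    return state

def transform(state: dict[str, Any]) -> dict[str, Any]:
    """Normalize the retrieved items."""
    state["docs"] = [d.lower() for d in state["docs"]]
    return state

def evaluate(state: dict[str, Any]) -> dict[str, Any]:
    """Score the result; here, simply count the matches."""
    state["score"] = len(state["docs"])
    return state
```

Because every capability shares one signature, they compose mechanically rather than through prompt glue.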


2. Skills

Composable workflows built from capabilities.

Think of them as structured procedures rather than prompt chains.
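A skill, in that framing, is just an explicit ordered composition of capabilities. Again a hypothetical sketch (the `make_skill` helper and the stub capabilities are mine, not the library's):

```python
# A skill as an ordered composition of capabilities, not a prompt chain.
from functools import reduce

def make_skill(*steps):
    """Compose capabilities left-to-right into a reusable workflow."""
    def skill(state: dict) -> dict:
        return reduce(lambda s, step: step(s), steps, state)
    return skill

# Stand-in capabilities for illustration:
def retrieve(s):  s["docs"] = ["Doc A", "Doc B"]; return s
def transform(s): s["docs"] = [d.upper() for d in s["docs"]]; return s
def evaluate(s):  s["best"] = s["docs"][0]; return s

answer_skill = make_skill(retrieve, transform, evaluate)
```

The workflow is now a value you can reuse, test, and inspect, rather than behavior implied by prompt wording.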


3. Execution Model

Execution is explicit and structured, not hidden inside prompts.

This allows:

  • tracing
  • validation
  • control over intermediate steps
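A minimal explicit executor shows what those three properties buy you. This is my own sketch of the idea, not ORCA's execution model:

```python
# Explicit execution loop: every step is traced and can be validated
# before the next one runs. Names are illustrative assumptions.

def execute(steps, state, validate=None):
    trace = []
    for step in steps:
        state = step(dict(state))  # copy so inputs stay inspectable
        trace.append((step.__name__, dict(state)))
        if validate and not validate(state):
            raise RuntimeError(f"validation failed after {step.__name__}")
    return state, trace
```

The trace gives observability for free, and the validation hook turns "hope the prompt behaves" into a checkable invariant between steps.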

4. Agent Orchestration

The agent is still responsible for decision-making, but it delegates execution to the runtime layer.
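The split can be sketched as two small classes: the agent owns the decision, the runtime owns the execution. The class names and the rule-based decision are assumptions for illustration; in practice the decision would come from an LLM:

```python
# Hypothetical agent/runtime split: deciding vs. executing.

class Runtime:
    """Owns execution: maps skill names to callable workflows."""
    def __init__(self, skills: dict):
        self.skills = skills

    def run(self, name: str, state: dict) -> dict:
        return self.skills[name](state)

class Agent:
    """Owns decision-making; delegates execution to the runtime."""
    def __init__(self, runtime: Runtime):
        self.runtime = runtime

    def act(self, task: str) -> dict:
        # A trivial rule stands in for the LLM's choice of skill.
        skill = "summarize" if "summary" in task else "lookup"
        return self.runtime.run(skill, {"task": task})
```

The point of the split: the agent can be swapped or improved without touching how skills execute, and vice versa.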


Why This Might Matter

The hypothesis behind ORCA is that separating:

  • cognition
  • execution
  • orchestration

can improve:

  • composability
  • observability
  • control over execution

In other words: moving from prompt-driven behavior to structured cognitive execution.


Open Source + Paper

I’ve implemented a first version of this idea:

👉 GitHub:
https://github.com/gfernandf/agent-skills

And documented the architecture and design principles in a paper:

👉 Paper (DOI):
https://doi.org/10.5281/zenodo.19438943

The paper is also being submitted to arXiv.


Open Questions

This is still exploratory, and I’d be very interested in feedback on:

  • how granular capabilities should be before overhead dominates
  • whether declarative execution models can realistically replace prompt pipelines
  • where this approach would break in real-world systems

Closing Thought

We’ve made huge progress treating LLMs as reasoning engines.

But most current agent systems still rely on unstructured execution layers.

If agents are going to scale, we may need to start treating execution as a first-class concern — not something embedded in prompts.


Happy to discuss ideas, trade-offs, or real-world use cases.

Top comments (2)

Guillermo Fernandez

Small side ask:

I’m submitting the paper to arXiv (cs.AI) and need an endorsement to complete the submission.

If anyone here is active on arXiv and open to helping, it takes less than a minute:

arxiv.org/auth/endorse?x=GAU4NP

Totally understand if not — just thought I’d ask.

Guillermo Fernandez

Happy to go deeper into specific parts (e.g. execution model or design principles) if useful.