Why Prompt-Based Agents Don’t Scale (and What We’re Trying Instead)
Most agent systems today are, at their core, prompt pipelines.
We chain prompts, add tools, inject memory, and hope that the system behaves consistently. This works surprisingly well for simple cases — but starts to break down as complexity increases.
After experimenting with different approaches, I’ve been exploring an alternative: introducing a cognitive runtime layer between the agent and the tools.
I call this approach ORCA.
The Problem with Prompt Pipelines
In most current designs, a single layer (the prompt) is responsible for:
- deciding what to do
- selecting tools
- executing actions
- interpreting results
This creates a few issues:
- low observability — hard to understand what the agent is doing
- poor composability — workflows don’t reuse well
- fragility — small prompt changes can break behavior
- implicit execution — logic is buried in text
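To make "implicit execution" concrete, here is a minimal, hypothetical sketch of a typical prompt pipeline. Every name here is illustrative; the point is that deciding, selecting, executing, and interpreting are all collapsed into one opaque string.

```python
# Hypothetical prompt pipeline: all decision logic lives inside one
# string, so nothing can be traced, validated, or reused as a component.
def build_prompt(question: str, tool_list: list[str]) -> str:
    return (
        f"You have these tools: {', '.join(tool_list)}.\n"
        "Decide which tool to use, call it, and interpret the result.\n"
        f"Question: {question}"
    )

prompt = build_prompt("What is the release date?", ["search", "calculator"])
# The model is asked to decide, select, execute, and interpret in a
# single step that exists only as text.
```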
A Different Approach: A Cognitive Runtime Layer
Instead of encoding everything in prompts, ORCA separates concerns explicitly:
1. Capabilities
Atomic cognitive operations such as:
- retrieve
- transform
- evaluate
These are the building blocks of reasoning.
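As a rough sketch of what "atomic cognitive operations" could look like in code: the capability names follow the post, but the `Capability` class and its signatures are my own assumption, not ORCA's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical: a capability as a named, atomic operation with an
# explicit call signature. Illustrative only, not ORCA's real types.
@dataclass(frozen=True)
class Capability:
    name: str
    fn: Callable[[Any], Any]

    def __call__(self, payload: Any) -> Any:
        return self.fn(payload)

# Toy stand-ins for the three capabilities named above.
retrieve = Capability("retrieve", lambda query: f"docs for {query!r}")
transform = Capability("transform", lambda text: text.upper())
evaluate = Capability("evaluate", lambda text: len(text) > 0)
```

Because each capability is a first-class value with a name, it can be logged, tested, and composed independently of any prompt.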
2. Skills
Composable workflows built from capabilities.
Think of them as structured procedures rather than prompt chains.
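A minimal sketch of a skill as an ordered composition of capability-like callables, assuming the `Skill` shape below (which is hypothetical, not taken from the repo):

```python
# Hypothetical: a skill is a structured procedure, i.e. an ordered list
# of steps the runtime executes, rather than a chain of prompt strings.
class Skill:
    def __init__(self, name, steps):
        self.name = name
        self.steps = steps  # list of callables (capabilities)

    def run(self, payload):
        # Feed each step's output into the next step.
        for step in self.steps:
            payload = step(payload)
        return payload

# Toy skill built from two atomic operations.
summarize = Skill("normalize-text", [str.strip, str.lower])
result = summarize.run("  Hello World  ")  # → "hello world"
```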
3. Execution Model
Execution is explicit and structured, not hidden inside prompts.
This allows:
- tracing
- validation
- control over intermediate steps
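The three properties above can be sketched in a few lines. This is an assumed execution loop of my own design, not ORCA's implementation: each step's output is recorded (tracing), checked (validation), and available before the next step runs (control over intermediate steps).

```python
# Hypothetical explicit executor: records a trace and validates every
# intermediate result instead of hiding the flow inside a prompt.
def execute(steps, payload, validate=lambda x: x is not None):
    trace = []
    for step in steps:
        payload = step(payload)
        trace.append((step.__name__, payload))  # tracing
        if not validate(payload):               # validation
            raise ValueError(f"step {step.__name__!r} produced invalid output")
    return payload, trace

def normalize(s):
    return s.strip().lower()

def tokenize(s):
    return s.split()

out, trace = execute([normalize, tokenize], "  Hello World ")
# out   → ["hello", "world"]
# trace → [("normalize", "hello world"), ("tokenize", ["hello", "world"])]
```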
4. Agent Orchestration
The agent is still responsible for decision-making, but it delegates execution to the runtime layer.
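A toy sketch of that split, with an if/else standing in for the LLM's decision; the skill registry and routing rule are hypothetical illustrations, not the project's API.

```python
# Hypothetical: the agent decides *which* skill to run; the runtime
# layer owns *how* it runs. In practice an LLM would make the choice.
SKILLS = {
    "lookup": lambda q: f"result for {q!r}",
    "add": lambda q: str(sum(int(x) for x in q.split("+"))),
}

def agent(query: str) -> str:
    # Decision-making (stand-in for the model's choice).
    skill_name = "add" if query[0].isdigit() else "lookup"
    # Delegation: execution happens in the runtime, not in a prompt.
    return SKILLS[skill_name](query)

agent("2+2")   # → "4"
agent("orca")  # → "result for 'orca'"
```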
Why This Might Matter
The hypothesis behind ORCA is that separating:
- cognition
- execution
- orchestration
can improve:
- composability
- observability
- control over execution
In other words: moving from prompt-driven behavior to structured cognitive execution.
Open Source + Paper
I’ve implemented a first version of this idea:
👉 GitHub:
https://github.com/gfernandf/agent-skills
And documented the architecture and design principles in a paper:
👉 Paper (DOI):
https://doi.org/10.5281/zenodo.19438943
The paper is also being submitted to arXiv.
Open Questions
This is still exploratory, and I’d be very interested in feedback on:
- how granular capabilities should be before overhead dominates
- whether declarative execution models can realistically replace prompt pipelines
- where this approach would break in real-world systems
Closing Thought
We’ve made huge progress treating LLMs as reasoning engines.
But most current agent systems still rely on unstructured execution layers.
If agents are going to scale, we may need to start treating execution as a first-class concern — not something embedded in prompts.
Happy to discuss ideas, trade-offs, or real-world use cases.
Small side ask:
I’m submitting the paper to arXiv (cs.AI) and need an endorsement to complete the submission.
If anyone here is active on arXiv and open to helping, it takes less than a minute:
arxiv.org/auth/endorse?x=GAU4NP
Totally understand if not — just thought I’d ask.
Happy to go deeper into specific parts (e.g. execution model or design principles) if useful.