Why Prompt-Based Agents Don’t Scale (and What We’re Trying Instead)
Most agent systems today are, at their core, prompt pipelines.
We chain prompts, add tools, inject memory, and hope that the system behaves consistently. This works surprisingly well for simple cases — but starts to break down as complexity increases.
After experimenting with different approaches, I’ve been exploring an alternative: introducing a cognitive runtime layer between the agent and the tools.
I call this approach ORCA.
The Problem with Prompt Pipelines
In most current designs, a single layer (the prompt) is responsible for:
- deciding what to do
- selecting tools
- executing actions
- interpreting results
This creates a few issues:
- low observability — hard to understand what the agent is doing
- poor composability — workflows don’t reuse well
- fragility — small prompt changes can break behavior
- implicit execution — logic is buried in text
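To make "implicit execution" concrete, here is a minimal, hypothetical sketch of a typical prompt pipeline. Every name here is illustrative; the point is that deciding, selecting, executing, and interpreting are all collapsed into one opaque string.

```python
# Hypothetical prompt pipeline: all decision logic lives inside one
# string, so nothing can be traced, validated, or reused as a component.
def build_prompt(question: str, tool_list: list[str]) -> str:
    return (
        f"You have these tools: {', '.join(tool_list)}.\n"
        "Decide which tool to use, call it, and interpret the result.\n"
        f"Question: {question}"
    )

prompt = build_prompt("What is the release date?", ["search", "calculator"])
# The model is asked to decide, select, execute, and interpret in a
# single step that exists only as text.
```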
A Different Approach: A Cognitive Runtime Layer
Instead of encoding everything in prompts, ORCA separates concerns explicitly:
1. Capabilities
Atomic cognitive operations such as:
- retrieve
- transform
- evaluate
These are the building blocks of reasoning.
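As a rough sketch of what "atomic cognitive operations" could look like in code: the capability names follow the post, but the `Capability` class and its signatures are my own assumption, not ORCA's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical: a capability as a named, atomic operation with an
# explicit call signature. Illustrative only, not ORCA's real types.
@dataclass(frozen=True)
class Capability:
    name: str
    fn: Callable[[Any], Any]

    def __call__(self, payload: Any) -> Any:
        return self.fn(payload)

# Toy stand-ins for the three capabilities named above.
retrieve = Capability("retrieve", lambda query: f"docs for {query!r}")
transform = Capability("transform", lambda text: text.upper())
evaluate = Capability("evaluate", lambda text: len(text) > 0)
```

Because each capability is a first-class value with a name, it can be logged, tested, and composed independently of any prompt.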
2. Skills
Composable workflows built from capabilities.
Think of them as structured procedures rather than prompt chains.
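A minimal sketch of a skill as an ordered composition of capability-like callables, assuming the `Skill` shape below (which is hypothetical, not taken from the repo):

```python
# Hypothetical: a skill is a structured procedure, i.e. an ordered list
# of steps the runtime executes, rather than a chain of prompt strings.
class Skill:
    def __init__(self, name, steps):
        self.name = name
        self.steps = steps  # list of callables (capabilities)

    def run(self, payload):
        # Feed each step's output into the next step.
        for step in self.steps:
            payload = step(payload)
        return payload

# Toy skill built from two atomic operations.
summarize = Skill("normalize-text", [str.strip, str.lower])
result = summarize.run("  Hello World  ")  # → "hello world"
```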
3. Execution Model
Execution is explicit and structured, not hidden inside prompts.
This allows:
- tracing
- validation
- control over intermediate steps
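The three properties above can be sketched in a few lines. This is an assumed execution loop of my own design, not ORCA's implementation: each step's output is recorded (tracing), checked (validation), and available before the next step runs (control over intermediate steps).

```python
# Hypothetical explicit executor: records a trace and validates every
# intermediate result instead of hiding the flow inside a prompt.
def execute(steps, payload, validate=lambda x: x is not None):
    trace = []
    for step in steps:
        payload = step(payload)
        trace.append((step.__name__, payload))  # tracing
        if not validate(payload):               # validation
            raise ValueError(f"step {step.__name__!r} produced invalid output")
    return payload, trace

def normalize(s):
    return s.strip().lower()

def tokenize(s):
    return s.split()

out, trace = execute([normalize, tokenize], "  Hello World ")
# out   → ["hello", "world"]
# trace → [("normalize", "hello world"), ("tokenize", ["hello", "world"])]
```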
4. Agent Orchestration
The agent is still responsible for decision-making, but it delegates execution to the runtime layer.
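A toy sketch of that split, with an if/else standing in for the LLM's decision; the skill registry and routing rule are hypothetical illustrations, not the project's API.

```python
# Hypothetical: the agent decides *which* skill to run; the runtime
# layer owns *how* it runs. In practice an LLM would make the choice.
SKILLS = {
    "lookup": lambda q: f"result for {q!r}",
    "add": lambda q: str(sum(int(x) for x in q.split("+"))),
}

def agent(query: str) -> str:
    # Decision-making (stand-in for the model's choice).
    skill_name = "add" if query[0].isdigit() else "lookup"
    # Delegation: execution happens in the runtime, not in a prompt.
    return SKILLS[skill_name](query)

agent("2+2")   # → "4"
agent("orca")  # → "result for 'orca'"
```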
Why This Might Matter
The hypothesis behind ORCA is that separating:
- cognition
- execution
- orchestration
can improve:
- composability
- observability
- control over execution
In other words: moving from prompt-driven behavior to structured cognitive execution.
Open Source + Paper
I’ve implemented a first version of this idea:
👉 GitHub:
https://github.com/gfernandf/agent-skills
And documented the architecture and design principles in a paper:
👉 Paper (DOI):
https://doi.org/10.5281/zenodo.19438943
The paper is also being submitted to arXiv.
Open Questions
This is still exploratory, and I’d be very interested in feedback on:
- how granular capabilities should be before overhead dominates
- whether declarative execution models can realistically replace prompt pipelines
- where this approach would break in real-world systems
Closing Thought
We’ve made huge progress treating LLMs as reasoning engines.
But most current agent systems still rely on unstructured execution layers.
If agents are going to scale, we may need to start treating execution as a first-class concern — not something embedded in prompts.
Happy to discuss ideas, trade-offs, or real-world use cases.
Small side ask:
I’m submitting the paper to arXiv (cs.AI) and need an endorsement to complete the submission.
If anyone here is active on arXiv and open to helping, it takes less than a minute:
arxiv.org/auth/endorse?x=GAU4NP
Totally understand if not — just thought I’d ask.
Happy to go deeper into specific parts (e.g. execution model or design principles) if useful.