<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Guillermo Fernandez</title>
    <description>The latest articles on DEV Community by Guillermo Fernandez (@gfernandf).</description>
    <link>https://dev.to/gfernandf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3852309%2Fe5a9cc1b-ef75-4815-8799-7d702b087ed6.png</url>
      <title>DEV Community: Guillermo Fernandez</title>
      <link>https://dev.to/gfernandf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gfernandf"/>
    <language>en</language>
    <item>
      <title>Stop Blaming Your Prompts. It’s the Architecture, Stup1d!</title>
      <dc:creator>Guillermo Fernandez</dc:creator>
      <pubDate>Tue, 14 Apr 2026 18:24:28 +0000</pubDate>
      <link>https://dev.to/gfernandf/stop-blaming-your-prompts-its-the-architecture-stupid-1c5g</link>
      <guid>https://dev.to/gfernandf/stop-blaming-your-prompts-its-the-architecture-stupid-1c5g</guid>
      <description>&lt;h1&gt;
  
  
  Stop Blaming Your Prompt. It’s the Architecture, Stup1d.
&lt;/h1&gt;

&lt;p&gt;📄 Paper: &lt;a href="https://zenodo.org/records/19438943" rel="noopener noreferrer"&gt;https://zenodo.org/records/19438943&lt;/a&gt;&lt;br&gt;&lt;br&gt;
💻 Code: &lt;a href="https://github.com/gfernandf/agent-skills" rel="noopener noreferrer"&gt;https://github.com/gfernandf/agent-skills&lt;/a&gt;  &lt;/p&gt;




&lt;p&gt;We keep pretending that better prompts will fix LLM agents.&lt;/p&gt;

&lt;p&gt;They won’t.&lt;/p&gt;

&lt;p&gt;We’ve built an entire layer of tooling, courses, and “best practices” around prompt engineering — as if the problem were linguistic.&lt;/p&gt;

&lt;p&gt;It’s not.&lt;/p&gt;

&lt;p&gt;It’s architectural.&lt;/p&gt;




&lt;h2&gt;
  
  
  The uncomfortable truth
&lt;/h2&gt;

&lt;p&gt;Let’s be honest about what most agent systems are doing today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Take a task
&lt;/li&gt;
&lt;li&gt;Generate a prompt
&lt;/li&gt;
&lt;li&gt;Call the model
&lt;/li&gt;
&lt;li&gt;Hope it “reasons” correctly
&lt;/li&gt;
&lt;li&gt;Repeat
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a system.&lt;/p&gt;

&lt;p&gt;This is recomputation disguised as intelligence.&lt;/p&gt;




&lt;h2&gt;
  
  
  We are replaying cognition, not building it
&lt;/h2&gt;

&lt;p&gt;Every time your agent runs, it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reconstructs context
&lt;/li&gt;
&lt;li&gt;Rebuilds reasoning
&lt;/li&gt;
&lt;li&gt;Re-derives intermediate steps
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is no reuse of cognition.&lt;/p&gt;

&lt;p&gt;No structure.&lt;br&gt;&lt;br&gt;
No persistence.&lt;br&gt;&lt;br&gt;
No abstraction layer.&lt;/p&gt;

&lt;p&gt;Just prompts.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We are not building systems. We are replaying thoughts.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why prompt engineering feels like it works (until it doesn’t)
&lt;/h2&gt;

&lt;p&gt;Prompt engineering gives the illusion of control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add more instructions
&lt;/li&gt;
&lt;li&gt;Add more examples
&lt;/li&gt;
&lt;li&gt;Add more constraints
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yes — performance improves.&lt;/p&gt;

&lt;p&gt;Until it plateaus.&lt;/p&gt;

&lt;p&gt;Because all of that lives inside a single forward pass.&lt;/p&gt;

&lt;p&gt;No memory of reasoning.&lt;br&gt;&lt;br&gt;
No composability.&lt;br&gt;&lt;br&gt;
No reuse.&lt;/p&gt;

&lt;p&gt;It’s like trying to fix software architecture by writing better comments.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real problem is architectural
&lt;/h2&gt;

&lt;p&gt;The core issue is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We are using LLMs as stateless reasoning engines.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then compensating for that with increasingly complex prompts.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;modeling cognition
&lt;/li&gt;
&lt;li&gt;structuring reasoning
&lt;/li&gt;
&lt;li&gt;reusing intermediate steps
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We regenerate everything every time.&lt;/p&gt;

&lt;p&gt;That doesn’t scale.&lt;/p&gt;

&lt;p&gt;Not in cost.&lt;br&gt;&lt;br&gt;
Not in latency.&lt;br&gt;&lt;br&gt;
Not in reliability.&lt;/p&gt;




&lt;h2&gt;
  
  
  What’s actually missing
&lt;/h2&gt;

&lt;p&gt;What’s missing is not a better prompt.&lt;/p&gt;

&lt;p&gt;It’s a runtime layer that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Encodes reusable cognitive steps
&lt;/li&gt;
&lt;li&gt;Separates reasoning into structured components
&lt;/li&gt;
&lt;li&gt;Allows composition instead of regeneration
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A system that reuses cognition instead of recomputing it.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  From prompts to skills
&lt;/h2&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;p&gt;→ Prompt → Model → Output  &lt;/p&gt;

&lt;p&gt;You need:&lt;/p&gt;

&lt;p&gt;→ Skill → Execution → Structured Output  &lt;/p&gt;

&lt;p&gt;Not conceptually. Operationally.&lt;/p&gt;

&lt;p&gt;This is exactly what ORCA implements: a runtime layer where “skills” are reusable cognitive units — not prompts.&lt;/p&gt;

&lt;p&gt;Defined inputs.&lt;br&gt;&lt;br&gt;
Structured outputs.&lt;br&gt;&lt;br&gt;
Explicit execution.&lt;/p&gt;

&lt;p&gt;No recomputation. No guesswork.&lt;/p&gt;
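&lt;p&gt;As a minimal sketch of the idea (illustrative only; these names are not ORCA’s actual API), a skill can be modeled as a declared contract plus an explicit execution step:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative sketch, not ORCA's real API: a "skill" is a declared
# contract (inputs/outputs) plus an explicit execution step, so the
# same cognitive unit can be reused instead of re-prompted each run.
@dataclass
class Skill:
    name: str
    inputs: list[str]
    outputs: list[str]
    run: Callable[[dict], dict]

    def execute(self, payload: dict) -> dict:
        missing = [k for k in self.inputs if k not in payload]
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        result = self.run(payload)
        # Structured output: only the declared fields leave the skill.
        return {k: result[k] for k in self.outputs}

# A deterministic chunking step, reusable across runs.
chunk = Skill(
    name="doc.content.chunk",
    inputs=["content", "max_len"],
    outputs=["chunks"],
    run=lambda p: {"chunks": [p["content"][i:i + p["max_len"]]
                              for i in range(0, len(p["content"]), p["max_len"])]},
)

print(chunk.execute({"content": "abcdefgh", "max_len": 3}))
# {'chunks': ['abc', 'def', 'gh']}
```

&lt;p&gt;The point is not the Python itself: inputs, outputs, and execution are explicit, so the step can be traced, tested, and composed.&lt;/p&gt;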




&lt;p&gt;Most “agent frameworks” today?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt orchestration
&lt;/li&gt;
&lt;li&gt;tool wrappers
&lt;/li&gt;
&lt;li&gt;retry loops with nicer formatting
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They don’t model cognition.&lt;/p&gt;

&lt;p&gt;They orchestrate prompts.&lt;/p&gt;

&lt;p&gt;That’s not a runtime.&lt;/p&gt;




&lt;p&gt;The shift is not better prompting.&lt;/p&gt;

&lt;p&gt;It’s architectural.&lt;/p&gt;

&lt;p&gt;From:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stateless generation
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;structured, reusable cognition
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the gap ORCA is designed to close.&lt;/p&gt;




&lt;p&gt;Prompt engineering isn’t useless.&lt;/p&gt;

&lt;p&gt;It’s just solving the wrong problem.&lt;/p&gt;

&lt;p&gt;We’ve been optimizing the interface instead of the system.&lt;/p&gt;

&lt;p&gt;And it shows.&lt;/p&gt;




&lt;p&gt;If you’ve pushed prompt engineering far enough, you’ve seen the limit.&lt;/p&gt;

&lt;p&gt;The question is:&lt;/p&gt;

&lt;p&gt;are you ready to try what will replace it?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>agents</category>
      <category>llm</category>
    </item>
    <item>
      <title>Why Prompt-Based Agents Don’t Scale (and What We’re Trying Instead)</title>
      <dc:creator>Guillermo Fernandez</dc:creator>
      <pubDate>Wed, 08 Apr 2026 12:08:36 +0000</pubDate>
      <link>https://dev.to/gfernandf/why-prompt-based-agents-dont-scale-and-what-were-trying-instead-3cc4</link>
      <guid>https://dev.to/gfernandf/why-prompt-based-agents-dont-scale-and-what-were-trying-instead-3cc4</guid>
      <description>&lt;h1&gt;
  
  
  Why Prompt-Based Agents Don’t Scale (and What We’re Trying Instead)
&lt;/h1&gt;

&lt;p&gt;Most agent systems today are, at their core, prompt pipelines.&lt;/p&gt;

&lt;p&gt;We chain prompts, add tools, inject memory, and hope that the system behaves consistently. This works surprisingly well for simple cases — but starts to break down as complexity increases.&lt;/p&gt;

&lt;p&gt;After experimenting with different approaches, I’ve been exploring an alternative: introducing a &lt;strong&gt;cognitive runtime layer&lt;/strong&gt; between the agent and the tools.&lt;/p&gt;

&lt;p&gt;I call this approach &lt;strong&gt;ORCA&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with Prompt Pipelines
&lt;/h2&gt;

&lt;p&gt;In most current designs, a single layer (the prompt) is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deciding what to do
&lt;/li&gt;
&lt;li&gt;selecting tools
&lt;/li&gt;
&lt;li&gt;executing actions
&lt;/li&gt;
&lt;li&gt;interpreting results
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a few issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;low observability&lt;/strong&gt; — hard to understand what the agent is doing
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;poor composability&lt;/strong&gt; — workflows don’t reuse well
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fragility&lt;/strong&gt; — small prompt changes can break behavior
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;implicit execution&lt;/strong&gt; — logic is buried in text
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Different Approach: A Cognitive Runtime Layer
&lt;/h2&gt;

&lt;p&gt;Instead of encoding everything in prompts, ORCA separates concerns explicitly:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Capabilities
&lt;/h3&gt;

&lt;p&gt;Atomic cognitive operations such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve&lt;/li&gt;
&lt;li&gt;transform&lt;/li&gt;
&lt;li&gt;evaluate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the building blocks of reasoning.&lt;/p&gt;
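&lt;p&gt;To make this concrete, a capability in ORCA is declared as a small YAML contract. This sketch mirrors the doc.content.chunk example from the agent-skills repository:&lt;/p&gt;

```yaml
# Declarative capability contract (adapted from the agent-skills examples):
# typed inputs and outputs, no prompt text anywhere.
id: doc.content.chunk
description: Split a document into semantic chunks
input:
  content: { type: string }
  max_tokens: { type: integer, default: 512 }
output:
  chunks: { type: array, items: { type: string } }
```

&lt;p&gt;The same contract can then be bound to a deterministic Python baseline or to an LLM backend without changing the interface.&lt;/p&gt;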




&lt;h3&gt;
  
  
  2. Skills
&lt;/h3&gt;

&lt;p&gt;Composable workflows built from capabilities.&lt;/p&gt;

&lt;p&gt;Think of them as structured procedures rather than prompt chains.&lt;/p&gt;
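&lt;p&gt;A small example, borrowed from the document.summarize skill in the agent-skills examples: a skill lists its steps and their dependencies, and the runtime derives the execution DAG from them:&lt;/p&gt;

```yaml
# A skill = capabilities wired into a DAG; depends_on encodes the edges.
id: document.summarize
steps:
  - id: chunk
    capability: doc.content.chunk
  - id: summarize
    capability: text.body.summarize
    depends_on: [chunk]
  - id: compile
    capability: text.report.compile
    depends_on: [summarize]
```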




&lt;h3&gt;
  
  
  3. Execution Model
&lt;/h3&gt;

&lt;p&gt;Execution is explicit and structured, not hidden inside prompts.&lt;/p&gt;

&lt;p&gt;This allows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tracing&lt;/li&gt;
&lt;li&gt;validation&lt;/li&gt;
&lt;li&gt;control over intermediate steps&lt;/li&gt;
&lt;/ul&gt;
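&lt;p&gt;A minimal sketch of why explicit execution buys observability (illustrative code, not ORCA’s API): when a runner drives each step, it can record a trace entry and expose every intermediate result:&lt;/p&gt;

```python
# Illustrative sketch, not ORCA's actual API: an explicit runner records
# a trace entry per step, so intermediate reasoning is observable and
# checkable instead of being hidden inside a single prompt.
def run_steps(steps, payload):
    state = dict(payload)
    trace = []
    for name, fn in steps:
        out = fn(state)
        trace.append({"step": name, "output": out})  # observability
        state.update(out)                            # control over intermediates
    return state, trace

steps = [
    ("chunk", lambda s: {"chunks": s["text"].split(". ")}),
    ("count", lambda s: {"n_chunks": len(s["chunks"])}),
]

state, trace = run_steps(steps, {"text": "First part. Second part"})
print([t["step"] for t in trace])  # ['chunk', 'count']
print(state["n_chunks"])          # 2
```

&lt;p&gt;Each entry in the trace can be validated or gated before the next step runs, which is exactly what a prompt pipeline cannot offer.&lt;/p&gt;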




&lt;h3&gt;
  
  
  4. Agent Orchestration
&lt;/h3&gt;

&lt;p&gt;The agent is still responsible for decision-making, but it delegates execution to the runtime layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Might Matter
&lt;/h2&gt;

&lt;p&gt;The hypothesis behind ORCA is that separating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cognition
&lt;/li&gt;
&lt;li&gt;execution
&lt;/li&gt;
&lt;li&gt;orchestration
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;can improve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;composability&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;observability&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;control over execution&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, moving from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;prompt-driven behavior  &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;structured cognitive execution  &lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Open Source + Paper
&lt;/h2&gt;

&lt;p&gt;I’ve implemented a first version of this idea:&lt;/p&gt;

&lt;p&gt;👉 GitHub:&lt;br&gt;
&lt;a href="https://github.com/gfernandf/agent-skills" rel="noopener noreferrer"&gt;https://github.com/gfernandf/agent-skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And documented the architecture and design principles in a paper:&lt;/p&gt;

&lt;p&gt;👉 Paper (DOI):&lt;br&gt;
&lt;a href="https://doi.org/10.5281/zenodo.19438943" rel="noopener noreferrer"&gt;https://doi.org/10.5281/zenodo.19438943&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The paper is also being submitted to arXiv.&lt;/p&gt;




&lt;h2&gt;
  
  
  Open Questions
&lt;/h2&gt;

&lt;p&gt;This is still exploratory, and I’d be very interested in feedback on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how granular capabilities should be before overhead dominates
&lt;/li&gt;
&lt;li&gt;whether declarative execution models can realistically replace prompt pipelines
&lt;/li&gt;
&lt;li&gt;where this approach would break in real-world systems
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;We’ve made huge progress treating LLMs as reasoning engines.&lt;/p&gt;

&lt;p&gt;But most current agent systems still rely on &lt;strong&gt;unstructured execution layers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If agents are going to scale, we may need to start treating execution as a first-class concern — not something embedded in prompts.&lt;/p&gt;




&lt;p&gt;Happy to discuss ideas, trade-offs, or real-world use cases.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>Introducing ORCA: executable skills and capabilities for AI agent workflows</title>
      <dc:creator>Guillermo Fernandez</dc:creator>
      <pubDate>Mon, 30 Mar 2026 20:55:01 +0000</pubDate>
      <link>https://dev.to/gfernandf/introducing-orca-executable-skills-and-capabilities-for-ai-agent-workflows-120o</link>
      <guid>https://dev.to/gfernandf/introducing-orca-executable-skills-and-capabilities-for-ai-agent-workflows-120o</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks are great at orchestrating LLM calls. But when it comes to deterministic operations — parsing documents, validating schemas, deduplicating records — you end up writing custom glue code every time.&lt;/p&gt;

&lt;p&gt;I wanted a framework where agents could actually &lt;strong&gt;execute&lt;/strong&gt;, not just plan.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is ORCA?
&lt;/h2&gt;

&lt;p&gt;ORCA (Open Runtime for Capable Agents) is an open-source Python framework that treats agent actions as &lt;strong&gt;capabilities&lt;/strong&gt; — declarative YAML contracts that can be wired to any backend: pure Python, OpenAI, or external APIs.&lt;/p&gt;

&lt;p&gt;You compose capabilities into &lt;strong&gt;skills&lt;/strong&gt; (multi-step workflows defined as DAGs), and the runtime handles scheduling, policy enforcement, and state tracking.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's included
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;122 capabilities&lt;/strong&gt; with deterministic Python baselines — no API keys needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;36 ready-to-use skills&lt;/strong&gt; composed from those capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DAG scheduler&lt;/strong&gt; with policy gates and cognitive state tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaffold wizard&lt;/strong&gt; to build new skills in minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-bindings&lt;/strong&gt; for OpenAI and PythonCall out of the box&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adapters&lt;/strong&gt; for LangChain, CrewAI, and Semantic Kernel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP server&lt;/strong&gt; support&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick example
&lt;/h2&gt;

&lt;p&gt;A capability is a YAML contract:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;id: doc.content.chunk
description: Split a document into semantic chunks
input:
  content: { type: string }
  max_tokens: { type: integer, default: 512 }
output:
  chunks: { type: array, items: { type: string } }
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;A skill composes multiple capabilities as a DAG:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;id: document.summarize
steps:
  - id: chunk
    capability: doc.content.chunk
  - id: summarize
    capability: text.body.summarize
    depends_on: [chunk]
  - id: compile
    capability: text.report.compile
    depends_on: [summarize]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The scheduler resolves dependencies, runs steps in parallel when possible, and enforces policies at each gate.&lt;/p&gt;

&lt;h2&gt;Why not just use LangChain / CrewAI directly?&lt;/h2&gt;

&lt;p&gt;You can — ORCA has adapters for both. The difference is in the abstraction layer:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Traditional frameworks&lt;/th&gt;
&lt;th&gt;ORCA&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Actions&lt;/td&gt;
&lt;td&gt;Code functions&lt;/td&gt;
&lt;td&gt;Declarative YAML capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composition&lt;/td&gt;
&lt;td&gt;Imperative chains&lt;/td&gt;
&lt;td&gt;DAG-based skills&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution&lt;/td&gt;
&lt;td&gt;LLM-dependent&lt;/td&gt;
&lt;td&gt;Deterministic baselines + LLM fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Portability&lt;/td&gt;
&lt;td&gt;Framework-locked&lt;/td&gt;
&lt;td&gt;Bind to any backend&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;ORCA doesn't replace your orchestration framework — it gives your agents a portable, testable skill layer underneath.&lt;/p&gt;

&lt;h2&gt;Getting started&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install agent-skills
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from agent_skills import Registry

registry = Registry()
result = registry.execute("doc.content.chunk", {
    "content": "Your document text here...",
    "max_tokens": 256
})
print(result["chunks"])
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/gfernandf/agent-skills" rel="noopener noreferrer"&gt;github.com/gfernandf/agent-skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Documentation: &lt;a href="https://gfernandf.github.io/agent-skills" rel="noopener noreferrer"&gt;gfernandf.github.io/agent-skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Registry: &lt;a href="https://github.com/gfernandf/agent-skill-registry" rel="noopener noreferrer"&gt;github.com/gfernandf/agent-skill-registry&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Feedback welcome&lt;/h2&gt;

&lt;p&gt;The project is early — just made the repos public this week. I'd appreciate feedback on the design, missing capabilities, or anything that feels rough.&lt;/p&gt;

&lt;p&gt;Drop a comment here or open an issue on GitHub — happy to discuss.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>ai</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
