# Stop Prompting. Start Engineering: Building More Reliable LLM Systems with State Machines

Jesus Huerta Martinez

Most teams building with LLMs eventually hit the same wall:

✅ A prompt works in testing.

❌ It fails in production.

🔄 Small wording changes cause major behavioral drift.

🔍 Debugging turns into trial and error.

We usually blame the model.

But what if the real issue is architectural?

What if the problem isn't that LLMs are probabilistic…

but that we design prompts as paragraphs instead of systems?


## The Reliability Problem

Natural-language prompting is inherently ambiguous.

Example:

"You are acting as a Customer Service Agent.

When the interaction begins, you are in a neutral, 'waiting' state.

- Scenario A: If the user greets you by saying 'hello,' your next step is to transition into a state where you are waiting for their order number.

- Scenario B: If the user says anything else, you skip the greeting phase and immediately move into processing a refund.

Your response should simply reflect which 'state' or phase of the conversation you are currently in."

✅ Readable? Yes.

❌ Traceable? Not really.

❌ Deterministic? Definitely not.

In production environments — agents, SOC automation, structured workflows — ambiguity compounds quickly.

A prompt that "mostly works" is not a system.

It's a fragile experiment.

*Prompt: Normal vs. Symbolic*


## A Different Approach: Treat the Context Window as a State Machine

Instead of writing prompts conversationally, we can structure them like control-flow systems.

Example:

```
[ROLE] ::= Customer_Service_Agent
[STATE] ::= INITIAL

IF (input == "hello") THEN
   [STATE] ::= AWAITING_ORDER_NUMBER
ELSE
   [STATE] ::= PROCESSING_REFUND
ENDIF

[OUTPUT] ::= CURRENT_STATE
```
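As a sanity check, the transition table this prompt encodes can be mirrored in ordinary code. A minimal Python sketch (the function name is illustrative, not part of the framework):

```python
def next_state(state: str, user_input: str) -> str:
    """Pure-Python mirror of the symbolic transition table above."""
    if state == "INITIAL":
        if user_input == "hello":
            return "AWAITING_ORDER_NUMBER"
        return "PROCESSING_REFUND"
    return state  # no transition defined for other states: hold position

print(next_state("INITIAL", "hello"))   # AWAITING_ORDER_NUMBER
print(next_state("INITIAL", "refund"))  # PROCESSING_REFUND
```

Having a deterministic reference like this makes it trivial to check whether the model's reported state matches the state the logic actually prescribes.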

What changed?

• 🎯 Role is explicit. No guessing.

• 📍 State is explicit. Declared, not implied.

• 🔀 Transitions are explicit. Clear paths.

• 🔒 Output is constrained. Predictable format.

This doesn't remove probabilistic behavior.
It reduces entropy by limiting interpretation space.

💡 Full framework available: I've documented everything in the Symbolic Prompting GitHub repository with code examples and a free video course.


## Core Principles of Structured Prompt Engineering

Here's the methodology I've been developing (Symbolic Prompting Framework):


## 1. 🎯 Explicit State Declaration

Never assume the model "remembers." You, not the model, own the state.

```
[STATE] ::= AUTH_PENDING
```

Update state explicitly. Make transitions visible. Every change should be traceable.
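On the application side, a small tracker can keep every transition visible and auditable. A sketch, assuming nothing beyond the standard library (`StateTracker` is a hypothetical helper, not part of any existing package):

```python
class StateTracker:
    """Holds the current symbolic state plus a full transition history."""

    def __init__(self, initial: str = "AUTH_PENDING") -> None:
        self.state = initial
        self.history = [initial]

    def transition(self, new_state: str) -> None:
        # Record the change so every transition stays traceable.
        self.history.append(new_state)
        self.state = new_state

    def render(self) -> str:
        # Emit the declaration in the same syntax the prompt uses.
        return f"[STATE] ::= {self.state}"

tracker = StateTracker()
tracker.transition("AUTH_VERIFIED")
print(tracker.render())  # [STATE] ::= AUTH_VERIFIED
print(tracker.history)   # ['AUTH_PENDING', 'AUTH_VERIFIED']
```

Re-rendering the `[STATE]` line into each new prompt turn is what makes the state explicit rather than something the model is trusted to remember.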


## 2. 🧩 Separate Role, Logic, State, and Output

Avoid mixing instructions and execution.

```
[ROLE] ::= Support_Agent
[STATE] ::= WAITING

[LOGIC]
    IF (input == "help") THEN
       [STATE] ::= HELP_MODE
    ENDIF

[OUTPUT] ::= CURRENT_STATE
```

When these are separated, debugging becomes possible.
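The same separation can be enforced in code by building the prompt from four distinct fields instead of one string. A minimal sketch (the class name and structure are illustrative):

```python
from dataclasses import dataclass

@dataclass
class SymbolicPrompt:
    """Each concern lives in its own field; render() assembles the prompt."""
    role: str
    state: str
    logic: str
    output: str = "CURRENT_STATE"

    def render(self) -> str:
        return (
            f"[ROLE] ::= {self.role}\n"
            f"[STATE] ::= {self.state}\n\n"
            f"[LOGIC]\n{self.logic}\n\n"
            f"[OUTPUT] ::= {self.output}"
        )

prompt = SymbolicPrompt(
    role="Support_Agent",
    state="WAITING",
    logic='    IF (input == "help") THEN\n       [STATE] ::= HELP_MODE\n    ENDIF',
)
print(prompt.render())
```

When each section is its own field, you can diff, log, and unit-test the logic independently of the role or the output contract.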


3. βš™οΈ Use Symbolic Control Structures

Instead of describing behavior in prose, define execution paths:

• `IF` / `THEN` / `ELSE`<br>
• `WHILE` (with manual counters)<br>
• `TRY` / `CATCH`<br>
• `GOTO` (with caution)

Example:

```
TRY:
   Validate_Input
CATCH (INVALID_FORMAT):
   [STATE] ::= SAFE_MODE
```

The model doesn't "compile" this.



But it **strongly biases execution** toward structured reasoning.
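The same fallback is worth duplicating on the application side, so a malformed model reply degrades into a safe state instead of crashing the pipeline. A sketch, assuming an illustrative set of allowed states:

```python
ALLOWED_STATES = {"WAITING", "HELP_MODE", "SAFE_MODE"}

def validate_state(raw: str) -> str:
    """Raise if the model's reply is not one of the declared states."""
    candidate = raw.strip().upper()
    if candidate not in ALLOWED_STATES:
        raise ValueError(f"INVALID_FORMAT: {raw!r}")
    return candidate

def parse_or_safe_mode(raw: str) -> str:
    # Mirror of TRY/CATCH: any validation failure lands in SAFE_MODE.
    try:
        return validate_state(raw)
    except ValueError:
        return "SAFE_MODE"

print(parse_or_safe_mode("help_mode"))            # HELP_MODE
print(parse_or_safe_mode("I think we should..."))  # SAFE_MODE
```

Defense in depth: the prompt biases the model toward the fallback, and the wrapper guarantees it.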

---

## 4. 🔽 Reduce Ambiguity to Reduce Variance
LLMs are probabilistic pattern machines.
You cannot make them deterministic.

But you **can:**

• Reduce ambiguity<br>
• Constrain interpretation<br>
• Limit branching<br>
• Force explicit transitions

Less ambiguity → fewer behavioral surprises.

---

## Why This Matters in Production
This approach becomes useful when:

• 🧠 You're building **multi-step agents**<br>
• 🔄 You need **predictable state transitions**<br>
• 🔍 You require **traceability**<br>
• 📋 You operate in **governance-heavy environments**<br>
• ❌ "Mostly works" is **not acceptable**

**Especially in:**

• SOC automation<br>
• Structured customer workflows<br>
• Deterministic LLM wrappers<br>
• Multi-step reasoning pipelines

---

## ⚠️ What This Does NOT Do

Let's be absolutely clear:

• ❌ **It does not eliminate hallucinations.** No framework can.<br>
• ❌ **It does not override model randomness.** The model is still probabilistic.<br>
• ❌ **It does not replace evaluation pipelines.** Test, measure, verify — always.

**What it does do:**

• ✅ Lower output variance<br>
• ✅ Improve traceability<br>
• ✅ Make debugging less painful<br>
• ✅ Encourage architectural thinking over prompt tweaking

---

## The Bigger Shift

We've been treating LLM interaction like **conversation**.

But production systems aren't conversations.<br>
They're **stateful processes**.

If we start treating prompts as **execution graphs** instead of paragraphs, reliability improves — not because the model changed, but because **the design did**.

---

## I Published the Full Framework

I've documented this approach in:

• 🎓 **A free 12-class video course** — From fundamentals to debugging<br>
• 📚 **Complete GitHub documentation** — Code, examples, and patterns<br>
• 📜 **A manifesto** — The engineering philosophy behind the framework

🔗 **Repository:** https://github.com/mindhack03d/SymbolicPrompting<br>
🎥 **Course:** https://youtube.com/playlist?list=PLNFL-2KY9QZVqoRwRzVLPN6qmDftpsjg6

---

## 💬 Open Questions for Builders
I'm especially interested in feedback from engineers working with production LLM systems:

### 🔴 Where would symbolic constraint break?
Long contexts? Creative tasks? Open-ended generation?

### 📊 How far can entropy reduction realistically go?
What's the theoretical limit of variance reduction?

### 🧠 Does structured prompting scale across models?
GPT-4 vs. Gemini vs. local modelsβ€”are some more receptive?

### βš™οΈ How does this compare to programmatic wrappers + tool calling?
What are the real tradeoffs in production?

If you're building serious LLM systems, **I'd genuinely like to hear your experience**.

👇 **Drop a comment below** — especially if you've tried something similar and it failed. I want to hear about that too.

---


Let's move from "**prompt engineering as clever wording**" to "**prompt engineering as system architecture.**"
