DEV Community

Jesus Huerta Martinez

From Prompting to Programming: Making LLM Outputs More Predictable with Structure

Based on the open-source Symbolic Prompting framework. All benchmarks, datasets, and workflows are publicly available for verification.


The Problem

Most interactions with LLMs today look like this:

I have a user who is 17 years old. Can they vote?
Please analyze their age and tell me if they meet the requirement.

And the output is often something like:

“It depends on the country…”

This isn’t wrong — but it’s not predictable.

The model is interpreting intent, filling gaps, and defaulting to conversational behavior.


A Different Approach: Treat Prompts as Logic

Instead of asking, we can structure the prompt more like a program:

[ROLE] ::= Age_Validator
$age := 17
IF $age >= 18 THEN
  _result := "APPROVED"
ELSE
  _result := "REFUSED"
ENDIF
[CONSTRAINTS] { NO_ADD_COMMENTS_OR_PROSE, ONLY_PRINT_VALUE }
[OUTPUT] ::= _result
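As a sketch, a template like the one above can also be rendered programmatically, so the values come from your application instead of hand-edited text. The function name and rendering approach below are illustrative, not part of any library:

```python
def build_age_prompt(age: int) -> str:
    """Render the symbolic age-validation prompt for a given age.
    The function name and rendering approach are illustrative,
    not part of any library."""
    return "\n".join([
        "[ROLE] ::= Age_Validator",
        f"$age := {age}",
        "IF $age >= 18 THEN",
        '  _result := "APPROVED"',
        "ELSE",
        '  _result := "REFUSED"',
        "ENDIF",
        "[CONSTRAINTS] { NO_ADD_COMMENTS_OR_PROSE, ONLY_PRINT_VALUE }",
        "[OUTPUT] ::= _result",
    ])

print(build_age_prompt(17))
```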

Observed result (multiple runs):

REFUSED

Same input → same output pattern.


Quick Repro (Copy/Paste Test)

You can test the difference yourself:
1. Natural language prompt
   • Run it 5–10 times
   • Slight variations in wording or reasoning may appear
2. Structured prompt (above)
   • Run it 5–10 times
   • Output remains stable in most cases

This isn’t true determinism — but it reduces variance significantly.
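The repro steps above can be sketched as a small harness. `call_llm` is a placeholder you would replace with your actual model call; it is stubbed with a constant here so the script runs standalone:

```python
from collections import Counter

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (API client, local model, etc.).
    Stubbed with a constant so this harness runs standalone."""
    return "REFUSED"

def measure_variance(prompt: str, runs: int = 10) -> Counter:
    """Send the same prompt `runs` times and tally distinct outputs.
    Fewer distinct keys means more stable behavior."""
    return Counter(call_llm(prompt).strip() for _ in range(runs))

outcomes = measure_variance("structured prompt goes here", runs=10)
print(outcomes)            # Counter({'REFUSED': 10}) with the stub above
print(len(outcomes) == 1)  # True: every run agreed
```

With a real model behind `call_llm`, the natural-language prompt typically produces several distinct keys, while the structured prompt tends toward one.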


What’s Happening Under the Hood?

LLMs are still probabilistic systems. This approach doesn’t change that.

What structured prompting does:
• Reduces ambiguity
• Narrows the model’s response space
• Encourages consistent token paths

In practice, this often leads to more stable outputs, especially in simple decision logic.
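One practical way to exploit that narrowed response space is to validate the model's output against the exact vocabulary the [CONSTRAINTS] block asked for. A minimal sketch, with an illustrative function name and allow-list:

```python
# The [CONSTRAINTS] block narrows what the model *should* emit;
# this check enforces it downstream. Names are illustrative.
ALLOWED = {"APPROVED", "REFUSED"}

def validate_output(raw: str) -> str:
    """Accept only the tokens the prompt asked for; reject anything else."""
    value = raw.strip().strip('"')
    if value not in ALLOWED:
        raise ValueError(f"unexpected model output: {raw!r}")
    return value

print(validate_output("REFUSED"))        # REFUSED
print(validate_output(' "APPROVED" '))   # APPROVED
```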


Benchmarks (Summary)

I ran ~300 tests across multiple models and prompt formats:
• Natural language prompts
• JSON/DSL structured inputs
• Symbolic prompting (logic-like syntax)

Observation:
• Output consistency and latency varied significantly depending on format
• In some cases, differences reached ~30–40% between formats on the same model

Important:
• Some models appear optimized for JSON-style inputs
• Token count alone does not explain performance differences

Full data + methodology: 👉 https://github.com/mindhack03d/SymbolicPrompting


When This Approach Works Well

Structured prompting is particularly useful for:
• Validation logic (age, permissions, thresholds)
• Routing decisions
• Pre-processing steps in pipelines
• Deterministic-like workflows
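For example, a routing decision can reuse the same symbolic style as the age prompt. The builder below is a hypothetical sketch; the keywords, fields, and route names are illustrative, not a defined spec:

```python
def build_router_prompt(intent: str, routes: list[str]) -> str:
    """Render a routing prompt in the same symbolic style as the
    age example. Keywords and fields here are illustrative."""
    options = " | ".join(routes)
    return "\n".join([
        "[ROLE] ::= Request_Router",
        f'$intent := "{intent}"',
        f"[ROUTES] ::= {{ {options} }}",
        "[CONSTRAINTS] { NO_ADD_COMMENTS_OR_PROSE, VALUE_MUST_BE_ONE_OF_ROUTES }",
        "[OUTPUT] ::= _route",
    ])

print(build_router_prompt("reset my password", ["billing", "auth", "general"]))
```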


When It Doesn’t

This approach is not ideal for:
• Creative writing
• Open-ended reasoning
• Brainstorming tasks
• Ambiguity-driven exploration

In those cases, conversational prompting is still more effective.


Common Pitfalls

Mixing natural language inside logic

❌ IF age is greater than 18 THEN
✅ IF age >= 18 THEN
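One way to catch this pitfall automatically is a small lint pass over prompt lines. The regex below covers a few illustrative natural-language operators and is by no means exhaustive:

```python
import re

# A few natural-language comparison phrases that should not appear
# inside the logic layer; extend the pattern for your own prompts.
NL_OPERATORS = re.compile(r"\b(is (greater|less) than|is equal to|at least|at most)\b")

def lint_logic_line(line: str) -> bool:
    """True if the line looks like clean symbolic logic,
    False if natural-language comparisons leaked in."""
    return not NL_OPERATORS.search(line)

print(lint_logic_line("IF age is greater than 18 THEN"))  # False
print(lint_logic_line("IF age >= 18 THEN"))               # True
```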

Silent error handling

❌ [CATCH] => { }

Always surface or log errors when possible.
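A sketch of what "surfacing" can look like in practice, assuming a Python pipeline around the model call (the expected values and logger name are illustrative):

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("symbolic-prompting")

EXPECTED = ("APPROVED", "REFUSED")

def handle_output(raw: str) -> str:
    """Log and re-raise on unexpected values instead of swallowing
    them, i.e. the opposite of an empty [CATCH] block."""
    value = raw.strip()
    if value not in EXPECTED:
        log.warning("unexpected model output: %r", raw)
        raise ValueError(f"unexpected model output: {raw!r}")
    return value

print(handle_output("APPROVED"))  # APPROVED
```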


“Magic” prompts you don’t understand

If a structure works but you can’t explain why, it’s fragile.


Key Takeaways

• LLMs don’t become deterministic — but they can become more predictable
• Structure reduces ambiguity
• Prompt design benefits from software engineering principles


Final Thought

Most people interact with LLMs conversationally by default.
But if you're building systems — not just asking questions —
it may be useful to think less in terms of prompts, and more in terms of interfaces and logic.


Resources

• Repo (benchmarks, workflows, datasets):
https://github.com/mindhack03d/SymbolicPrompting


If you experiment with this approach, I’d be interested to hear what works (and what doesn’t) in your use case.
