<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: GoRealAi</title>
    <description>The latest articles on DEV Community by GoRealAi (@gorealai).</description>
    <link>https://dev.to/gorealai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3817492%2Fcd6e5841-12de-4009-958e-4a9001c54b46.png</url>
      <title>DEV Community: GoRealAi</title>
      <link>https://dev.to/gorealai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gorealai"/>
    <language>en</language>
    <item>
      <title>Why we built a programming language for AI prompts instead of another GUI</title>
      <dc:creator>GoRealAi</dc:creator>
      <pubDate>Wed, 01 Apr 2026 13:57:38 +0000</pubDate>
      <link>https://dev.to/gorealai/why-we-built-a-programming-language-for-ai-prompts-instead-of-another-gui-j13</link>
      <guid>https://dev.to/gorealai/why-we-built-a-programming-language-for-ai-prompts-instead-of-another-gui-j13</guid>
      <description>&lt;p&gt;The first version of our AI product had this in the codebase:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a customer support agent for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;premium_instructions&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tier&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;premium&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;free_instructions&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;billing_policy&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;issue_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;billing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;refund&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
...12 more conditional blocks...
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works until it doesn't. By month three we had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4,000-token prompts being sent unconditionally&lt;/li&gt;
&lt;li&gt;Conditional logic scattered across Python files, config files, and Notion docs&lt;/li&gt;
&lt;li&gt;No way to test a prompt change without deploying the app&lt;/li&gt;
&lt;li&gt;No version history - just vibes and git blame&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We looked at every existing tool. They all had one thing in common: they stored prompts as strings with variable substitution. That doesn't solve the problem. It just moves the string somewhere else.&lt;/p&gt;

&lt;h2&gt;The actual problem: prompts need real conditional logic&lt;/h2&gt;

&lt;p&gt;The LLM doesn't need to see instructions for premium users when the current user is free. It doesn't need the billing policy when the question is technical. Every irrelevant section adds tokens the model still has to attend to - and we're paying for every one of them.&lt;/p&gt;

&lt;p&gt;We built Echo PDK - a DSL that evaluates if/else logic server-side before the prompt ever reaches the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[#ROLE system]
You are a support agent for {{company}}.

[#IF {{tier}} #equals(Premium)]
Priority customer. Offer callback, skip escalation.
[END IF]
[END ROLE]

[#IF {{issue}} #one_of(billing, refund)]
[#INCLUDE billing_policies]
[END IF]

[#ROLE user]
{{question}}
[END ROLE]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The render engine evaluates the conditionals, substitutes variables, and returns the resolved messages array. The LLM sees only the output - never the template logic.&lt;/p&gt;
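
&lt;p&gt;For a sense of how this fits into application code, here's a minimal sketch of the render step - the &lt;code&gt;render&lt;/code&gt; entry point and its signature are illustrative assumptions, not the exact Echo PDK API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical usage sketch - names and signature simplified for illustration.
from echo_pdk import render  # assumed import path

messages = render(
    template="support_agent",  # the template shown above
    variables={
        "company": "Acme",
        "tier": "Premium",
        "issue": "billing",
        "question": "Why was I charged twice?",
    },
)

# The result is a resolved messages array with no template syntax left:
# [{"role": "system", "content": "You are a support agent for Acme. ..."},
#  {"role": "user", "content": "Why was I charged twice?"}]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;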

&lt;h2&gt;What we learned building a DSL from scratch&lt;/h2&gt;

&lt;p&gt;The parser is built on Chevrotain. We spent a month on the operator design - the tension between readable names (#contains, #greater_than) for non-engineers and short aliases (#gt, #in) for developers. We ship both.&lt;/p&gt;
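
&lt;p&gt;Both spellings compile to the same condition, so teams can mix them - an illustrative pair, with the operator-argument syntax extrapolated from the examples above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[#IF {{open_tickets}} #greater_than(5)]   readable form, for non-engineers
[#IF {{open_tickets}} #gt(5)]             short alias, for developers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;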

&lt;p&gt;The hardest design decision was the #ai_gate operator - an LLM-evaluated condition. Useful for "if the user's message is angry, include de-escalation instructions" but adds latency and cost. We ship it but recommend using it sparingly.&lt;/p&gt;

&lt;p&gt;The meta template feature - where model selection and temperature can be conditional - was an accident. We were refactoring and realized there was no reason model config had to be hardcoded in application code. Now the prompt decides: creative tasks get gpt-4o at 0.9, extraction gets gpt-4o-mini at 0.2.&lt;/p&gt;
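
&lt;p&gt;In application terms, that means the render call can hand back model config alongside the messages - a hedged sketch assuming a dict-shaped result, not the exact Echo PDK return type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative shape only - the import path and result fields are assumptions.
from echo_pdk import render
from openai import OpenAI

client = OpenAI()
result = render(template="writer", variables={"task": "creative"})

completion = client.chat.completions.create(
    model=result["model"],              # e.g. "gpt-4o" for creative tasks
    temperature=result["temperature"],  # e.g. 0.9
    messages=result["messages"],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;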

&lt;h2&gt;Token reduction in practice&lt;/h2&gt;

&lt;p&gt;In production, conditional rendering dropped our average input tokens from ~4,100 to ~1,200 across all request types. The quality improvement was unexpected but real - smaller models especially seem to benefit from receiving only relevant context.&lt;/p&gt;
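
&lt;p&gt;Back-of-the-envelope math on what that reduction is worth - the request volume and per-token price below are placeholders, not our real numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative cost math; substitute your own volume and model pricing.
tokens_before = 4_100
tokens_after = 1_200
requests_per_month = 500_000          # placeholder volume
price_per_million = 2.50              # placeholder USD per 1M input tokens

saved_tokens = (tokens_before - tokens_after) * requests_per_month
saved_usd = saved_tokens / 1_000_000 * price_per_million
print(f"~${saved_usd:,.0f}/month saved")  # ~$3,625/month at these numbers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;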

&lt;h2&gt;Open source&lt;/h2&gt;

&lt;p&gt;Echo PDK is MIT licensed: github.com/GoReal-AI/echo-pdk. The plugin system lets you add custom operators - #isValidEmail, #isEmpty, domain-specific validators. We built a hosted layer (EchoStash) on top for version control and evals, but the DSL works fully standalone.&lt;/p&gt;
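
&lt;p&gt;A sketch of what a custom operator plugin might look like - the &lt;code&gt;register_operator&lt;/code&gt; hook is an assumption for illustration; check the repo for the real interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical plugin sketch - registration API assumed, not documented here.
import re

def is_valid_email(value):
    # Custom operator: true if the variable looks like an email address.
    return bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value))

# Would make the operator usable in templates as:
#   [#IF {{contact}} #isValidEmail]
register_operator("isValidEmail", is_valid_email)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;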

&lt;p&gt;What's your approach to prompt management in production? Curious whether others have hit the same walls.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Prompt Version Control: 4 Approaches Compared</title>
      <dc:creator>GoRealAi</dc:creator>
      <pubDate>Thu, 26 Mar 2026 18:00:04 +0000</pubDate>
      <link>https://dev.to/gorealai/prompt-version-control-4-approaches-compared-1j7h</link>
      <guid>https://dev.to/gorealai/prompt-version-control-4-approaches-compared-1j7h</guid>
      <description>&lt;p&gt;How do you version your prompts? We compared the four most common approaches used by production AI teams.&lt;/p&gt;

&lt;h2&gt;1. Git-Native&lt;/h2&gt;

&lt;p&gt;Store prompts as files in your repo. Free, familiar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Works with existing CI/CD, no new tools.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; No prompt-specific testing, diffs are useless for long prompts.&lt;/p&gt;
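
&lt;p&gt;The minimal version of this approach, with an illustrative file layout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Git-native: prompts are plain files in the repo, versioned with the code.
from pathlib import Path

PROMPT_DIR = Path("prompts")  # e.g. prompts/support_agent.txt

def load_prompt(name):
    # The deployed commit implicitly pins the prompt version.
    return (PROMPT_DIR / f"{name}.txt").read_text(encoding="utf-8")

system_prompt = load_prompt("support_agent")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;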

&lt;h2&gt;2. Dedicated Platforms&lt;/h2&gt;

&lt;p&gt;Purpose-built tools with versioning, evals, and deployment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Prompt-specific features, eval frameworks, rollback.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Another tool in the stack.&lt;/p&gt;

&lt;h2&gt;3. Hybrid&lt;/h2&gt;

&lt;p&gt;Code in Git, prompts in a dedicated system, connected via SDK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Decoupled deploys - prompt changes don't require code deploys.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Integration work upfront.&lt;/p&gt;
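
&lt;p&gt;What the hybrid wiring tends to look like - the endpoint and response shape below are a generic sketch, not any specific vendor's SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hybrid: code ships via Git; prompts resolve at runtime from a prompt service.
import os
import requests

PROMPT_API = "https://prompts.example.internal/v1/prompts"  # illustrative URL

def get_prompt(name, label="production"):
    # Fetch the version currently tagged for this environment, so prompt
    # changes ship without a code deploy. (Real SDKs add caching/fallback.)
    resp = requests.get(
        f"{PROMPT_API}/{name}",
        params={"label": label},
        headers={"Authorization": f"Bearer {os.environ['PROMPT_SERVICE_KEY']}"},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["content"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;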

&lt;h2&gt;4. Feature Flags&lt;/h2&gt;

&lt;p&gt;Treat prompt versions like feature flags with A/B testing and gradual rollouts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; Most control, production testing.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Complexity overhead.&lt;/p&gt;
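
&lt;p&gt;A minimal sketch of the flag-style rollout - deterministic bucketing by user ID, with the rollout percentage as an illustrative value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Feature-flag rollout: route a stable slice of users to the new prompt.
import hashlib

ROLLOUT_PERCENT = 10  # illustrative: 10% of users get v2

def pick_prompt_version(user_id):
    # Stable hash so a given user always lands in the same bucket (0-99).
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket in range(ROLLOUT_PERCENT):
        return "support_agent_v2"
    return "support_agent_v1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;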

&lt;p&gt;In our conversations with production teams, those using dedicated prompt versioning report roughly 60% fewer prompt-related incidents.&lt;/p&gt;


&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://dub.sh/1114Z7j" rel="noopener noreferrer"&gt;echostash.app/blog/prompt-version-control-comparing-approaches&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Prompt Sprawl: What the Real Costs Look Like in Production</title>
      <dc:creator>GoRealAi</dc:creator>
      <pubDate>Wed, 25 Mar 2026 16:00:10 +0000</pubDate>
      <link>https://dev.to/gorealai/prompt-sprawl-what-the-real-costs-look-like-in-production-3mo9</link>
      <guid>https://dev.to/gorealai/prompt-sprawl-what-the-real-costs-look-like-in-production-3mo9</guid>
      <description>&lt;p&gt;Prompt sprawl is the hidden tax on every AI team. Prompts scattered across Notion, GitHub issues, Slack threads, and hardcoded strings means nobody knows which version is running in production.&lt;/p&gt;

&lt;h2&gt;The Real Numbers&lt;/h2&gt;

&lt;p&gt;Teams we've talked to report:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3-5 hours/week&lt;/strong&gt; per engineer just finding and reconciling prompt versions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$50K+/year&lt;/strong&gt; in wasted compute from running outdated or duplicate prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2-3 day&lt;/strong&gt; average debugging time when a prompt regression hits production&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why It Happens&lt;/h2&gt;

&lt;p&gt;Prompts start small. A string in your code. A note in Notion. Then the team grows, models change, and suddenly you have 200+ prompts with no single source of truth.&lt;/p&gt;

&lt;h2&gt;What Actually Fixes It&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Centralize&lt;/strong&gt; - one place for all prompts, searchable and versioned&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version&lt;/strong&gt; - every edit tracked, diffable, rollback-ready&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test&lt;/strong&gt; - automated evals that run before prompt changes hit production (a sketch follows this list)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy&lt;/strong&gt; - environments (dev/staging/prod) for prompts, not just code&lt;/li&gt;
&lt;/ol&gt;
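
&lt;p&gt;The eval gate from step 3, sketched as a pre-deploy check - the threshold, test cases, and scoring function are placeholders you'd supply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of a pre-deploy eval gate: block the change if quality drops.
THRESHOLD = 0.85  # placeholder pass bar

def run_evals(prompt, test_cases, score_fn):
    # score_fn calls the model with the candidate prompt and grades the output.
    scores = [score_fn(prompt, case) for case in test_cases]
    return sum(scores) / len(scores)

def gate(candidate_prompt, baseline_prompt, test_cases, score_fn):
    candidate = run_evals(candidate_prompt, test_cases, score_fn)
    baseline = run_evals(baseline_prompt, test_cases, score_fn)
    # Ship only if the candidate clears the bar and doesn't regress.
    return candidate &gt;= THRESHOLD and candidate &gt;= baseline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;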


&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://dub.sh/NTsuYbU" rel="noopener noreferrer"&gt;echostash.app/blog/prompt-sprawl-cost-production-llm-teams&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Multi-Agent AI: What It Means for Prompt Management</title>
      <dc:creator>GoRealAi</dc:creator>
      <pubDate>Tue, 24 Mar 2026 19:53:05 +0000</pubDate>
      <link>https://dev.to/gorealai/multi-agent-ai-what-it-means-for-prompt-management-197p</link>
      <guid>https://dev.to/gorealai/multi-agent-ai-what-it-means-for-prompt-management-197p</guid>
      <description>&lt;p&gt;Multi-agent AI systems are changing how we think about prompts. When you have an orchestrator delegating to specialist agents, each with its own optimized prompt, prompt management becomes a dependency graph, not a flat file.&lt;/p&gt;

&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;When Agent A's prompt changes, it can break Agent B's expectations downstream. Teams are losing days debugging issues that trace back to a single prompt edit in a sub-agent.&lt;/p&gt;

&lt;h2&gt;What Production Teams Are Doing&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Versioning each agent's prompt independently&lt;/strong&gt; - not in the same commit as code changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testing prompt combinations&lt;/strong&gt; - individual prompt tests aren't enough when agents interact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring prompt drift&lt;/strong&gt; across the entire agent graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rolling back individual agent prompts&lt;/strong&gt; without redeploying the whole system (sketched below)&lt;/li&gt;
&lt;/ol&gt;
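
&lt;p&gt;Points 1 and 4 in practice: pin each agent's prompt version so any one can be rolled back on its own. A minimal sketch - the registry shape and &lt;code&gt;fetch_prompt&lt;/code&gt; helper are illustrative assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Each agent pins its own prompt version; rolling one back is a one-line
# registry change, not a redeploy of the whole system.
AGENT_PROMPTS = {
    "orchestrator": {"name": "orchestrator", "version": "v7"},
    "researcher":   {"name": "researcher",   "version": "v3"},
    "summarizer":   {"name": "summarizer",   "version": "v5"},  # rolled back from v6
}

def prompt_for(agent):
    pin = AGENT_PROMPTS[agent]
    # fetch_prompt is a placeholder for your prompt store's lookup.
    return fetch_prompt(pin["name"], version=pin["version"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;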

&lt;p&gt;The shift from 'prompt engineering' to 'prompt infrastructure' is accelerating.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;-&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.echostash.app/blog/multi-agent-systems-prompt-management" rel="noopener noreferrer"&gt;echostash.app/blog/multi-agent-systems-prompt-management&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
