If you’ve been building with LLMs lately, you probably know the pattern.
You start with a simple system prompt.
Then the product grows.
Then the prompt becomes longer.
Then you add rules.
Then you add exceptions.
Then you add examples.
Then you add “never do this” instructions.
Soon, your entire production logic is sitting inside a 2,000-word system prompt and you’re hoping the model follows it correctly every time.
That works well enough for demos.
But production is different.
Production has messy users, pricing rules, tool calls, memory, business policies, edge cases, latency issues, and cost pressure.
This is where I think system prompting becomes a single point of failure.
The industry often calls this “guardrails.”
But in many cases, we are still just asking the model to please behave.
I’m building NEES Core Engine because I believe AI products need to move from soft prompts to hard runtimes.
Not because prompts are useless.
Prompts are important.
But prompts alone should not be responsible for enforcing business logic, memory boundaries, escalation rules, cost control, and traceability in production AI systems.
What is Agent Drift?
In development, your AI agent feels predictable.
In production, it can start drifting.
I call this Agent Drift.
Agent Drift is when an AI system slowly moves away from the product’s intended behavior, business rules, safety boundaries, or workflow logic during real-world usage.
It is not always a dramatic hallucination.
Sometimes the output sounds reasonable.
But underneath, the agent may have skipped a rule, used the wrong context, interpreted intent incorrectly, or made a decision your product never approved.
Common symptoms:
1. Intent leakage
A user asks a hypothetical question, but the agent treats it like an instruction.
Example:
“What if you gave me a 50% discount?”
A weak agent may start negotiating or offering pricing that was never allowed.
2. Policy bypass
The system prompt says:
“Never offer more than 15% discount.”
But the user applies pressure, adds context, or phrases the request creatively, and the model still produces an unauthorized offer.
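A prompt rule like this can be backed by a check on the model’s draft before it is sent. Below is a minimal sketch. It assumes replies are plain text and that discounts are written as “N% off” or “N% discount”. The names `findDiscountViolations` and `gateDraft`, and the replacement wording, are hypothetical and not part of NEES.

```typescript
// Hypothetical post-generation check (not a NEES API): find discounts in a
// draft reply and compare them with the configured ceiling.

interface DiscountViolation {
  offered: number; // discount found in the draft, as a fraction (0.5 = 50%)
  limit: number;   // configured maximum, as a fraction
}

function findDiscountViolations(draft: string, maxDiscount: number): DiscountViolation[] {
  // Assumption: offers look like "50% off" or "a 20% discount".
  const pattern = /(\d+(?:\.\d+)?)\s*%\s*(?:off|discount)/gi;
  const violations: DiscountViolation[] = [];
  let match: RegExpExecArray | null;
  // With the global flag, exec() advances lastIndex and finds every offer.
  while ((match = pattern.exec(draft)) !== null) {
    const offered = parseFloat(match[1]) / 100;
    if (offered > maxDiscount) {
      violations.push({ offered, limit: maxDiscount });
    }
  }
  return violations;
}

// Block the draft and substitute a safe reply if any offer exceeds the limit.
function gateDraft(draft: string, maxDiscount: number): { allowed: boolean; reply: string } {
  if (findDiscountViolations(draft, maxDiscount).length === 0) {
    return { allowed: true, reply: draft };
  }
  return {
    allowed: false,
    reply: "I can't offer that discount, but I can connect you with our sales team.",
  };
}
```

Pattern matching on free text is brittle: “half off” would slip through. A structured output field, such as a numeric offered discount that the runtime validates, tends to be easier to check.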
3. Memory bloat
The context window fills with old, messy, or irrelevant user history.
The agent starts making decisions based on stale memory instead of current business logic.
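One mitigation is to prune memory before it enters the context window. The sketch first drops entries older than a cutoff, then keeps the newest entries that fit a token budget. It assumes each memory item carries a timestamp and an approximate token count. `MemoryItem`, `pruneMemory`, and the limits are hypothetical.

```typescript
// Hypothetical memory item shape; field names are illustrative.
interface MemoryItem {
  text: string;
  createdAt: number; // Unix epoch milliseconds
  tokens: number;    // approximate token count
}

function pruneMemory(
  items: MemoryItem[],
  now: number,
  maxAgeMs: number,
  tokenBudget: number,
): MemoryItem[] {
  // 1. Drop stale entries instead of letting them compete with current context.
  const fresh = items.filter((item) => now - item.createdAt <= maxAgeMs);

  // 2. Spend the budget on the most recent entries first.
  fresh.sort((a, b) => b.createdAt - a.createdAt);
  const kept: MemoryItem[] = [];
  let used = 0;
  for (const item of fresh) {
    if (used + item.tokens > tokenBudget) continue; // skip entries that would overflow
    kept.push(item);
    used += item.tokens;
  }

  // 3. Return in chronological order so the prompt reads naturally.
  return kept.sort((a, b) => a.createdAt - b.createdAt);
}
```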
4. Traceability gaps
An agent makes a mistake.
The team checks the logs.
The logs show the input and output, but not the decision path behind them:
- Which policy applied?
- Which boundary was checked?
- Why was this response allowed?
- Should this have been escalated?
- Was memory used safely?
Without traceability, debugging AI behavior becomes guesswork.
5. The LLM tax
Your product keeps paying for repeated model calls whose answers are already known, safe, and reusable.
Not every user request needs a fresh expensive model call.
Some answers should come from governed knowledge, deterministic logic, or a safe cache.
The architecture problem
Most AI apps follow this pattern:
App → Model → Output
The issue is simple:
If the model drifts, the product drifts.
If the model ignores a business rule, the product exposes that failure.
If the model produces an unsupported answer, the user sees it.
If the model makes a decision, the team often has limited visibility into why it happened.
That is why I’m exploring a different pattern:
App → Governance Runtime → Model Provider → Governed Response
This is the architecture behind NEES Core Engine.
The goal is not to replace OpenAI, Anthropic, Google, LangChain, CrewAI, Ollama, or any framework.
The goal is to add a runtime governance layer between the application and the model provider.
Think of it like a traffic-control layer for AI behavior.
The model still generates intelligence.
But the runtime governs how that intelligence is requested, checked, constrained, traced, and delivered.
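The pattern can be sketched as a wrapper around any provider call. The runtime checks the request, passes it to the provider, checks the result, and records each decision. This is a minimal sketch, not the NEES API. `ModelProvider`, `governedCall`, and the two check hooks are hypothetical names, and the provider is a stub so the example runs offline.

```typescript
// A provider is anything that turns a prompt into text (cloud API, local model, stub).
type ModelProvider = (prompt: string) => Promise<string>;

interface Verdict {
  allowed: boolean;
  reason: string;
}

interface GovernedResult {
  status: "allowed" | "blocked";
  reply: string;
  trace: string[]; // human-readable record of each decision
}

async function governedCall(
  input: string,
  provider: ModelProvider,
  preCheck: (input: string) => Verdict,
  postCheck: (draft: string) => Verdict,
  safeReply: string,
): Promise<GovernedResult> {
  const trace: string[] = [];

  // 1. Check the request before spending tokens.
  const pre = preCheck(input);
  trace.push(`pre-check: ${pre.allowed ? "allowed" : "blocked"} (${pre.reason})`);
  if (!pre.allowed) return { status: "blocked", reply: safeReply, trace };

  // 2. Let the model generate.
  const draft = await provider(input);
  trace.push("provider: draft generated");

  // 3. Check the draft before it reaches the user.
  const post = postCheck(draft);
  trace.push(`post-check: ${post.allowed ? "allowed" : "blocked"} (${post.reason})`);
  if (!post.allowed) return { status: "blocked", reply: safeReply, trace };

  return { status: "allowed", reply: draft, trace };
}
```

Because the provider is just a function, the same wrapper works whether the call goes to a hosted API or a local model.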
Conceptual flow with NEES
Here is a simplified example of what a governed AI call could look like:
```javascript
// Conceptual flow with NEES Core Engine
const response = await nees.execute({
  input: userInput,
  policy: "strict_pricing_v2",
  boundaries: {
    max_discount: 0.15,
    allow_refunds: false,
    require_escalation_for_enterprise_contracts: true
  },
  memory: {
    scope: "current_customer_session",
    allow_sensitive_profile_recall: false
  },
  fallback: {
    strategy: "local_or_deterministic",
    provider: "ollama"
  },
  trace: true
});
```
This is not about making the prompt longer.
It is about moving critical product logic out of the soft prompt and into a runtime layer that can validate, route, block, fall back, cache, and trace behavior.
Why runtime governance instead of only prompt engineering?
Prompt engineering is still useful.
But prompt-following is probabilistic.
Production rules often need something stronger.
A governance runtime can help with:
1. Pre-execution intent checks
Before spending tokens or allowing a workflow path, the runtime can classify what the user is trying to do.
Is this a normal question?
A pricing request?
A refund request?
A tool/action request?
A sensitive memory request?
A policy violation attempt?
If the intent violates policy, the request can be blocked, modified, clarified, or escalated before the model response reaches the user.
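The classification step can start as cheap rules that run before any tokens are spent. This is a minimal sketch that assumes a pattern-based first pass; a production system might use a small classifier model instead. The labels mirror the list above, and `classifyIntent` and its patterns are hypothetical. The sketch also flags hypotheticals such as “What if you gave me a 50% discount?” so they can be answered without being treated as instructions.

```typescript
// Hypothetical intent labels, mirroring the list above.
type Intent =
  | "policy_violation"
  | "refund"
  | "pricing"
  | "tool_action"
  | "sensitive_memory"
  | "general";

interface IntentResult {
  intent: Intent;
  hypothetical: boolean; // "what if" style: answer it, but never act on it
}

// Ordered rules: the first matching pattern wins. Patterns are illustrative.
const RULES: Array<[RegExp, Intent]> = [
  [/ignore (all |your )?(previous |prior )?instructions/i, "policy_violation"],
  [/\b(refund|money back)/i, "refund"],
  [/\b(discount|price|pricing|coupon)/i, "pricing"],
  [/\b(cancel|delete|upgrade|downgrade)\b/i, "tool_action"],
  [/\b(what do you (know|remember) about me|my (home address|date of birth))/i, "sensitive_memory"],
];

const HYPOTHETICAL = /^\s*(what if|suppose|hypothetically|imagine)\b/i;

function classifyIntent(input: string): IntentResult {
  const hypothetical = HYPOTHETICAL.test(input);
  for (const [pattern, intent] of RULES) {
    if (pattern.test(input)) return { intent, hypothetical };
  }
  return { intent: "general", hypothetical };
}
```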
2. Policy enforcement
Instead of relying only on:
“Please don’t offer more than 15% discount.”
The runtime can enforce:
```json
{
  "policy": "strict_pricing_v2",
  "max_discount": 0.15,
  "requires_manager_approval_above": 0.10
}
```
The model can still help communicate.
But the runtime owns the business boundary.
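Here is a minimal sketch of how that policy object could be enforced in code. It assumes the requested discount has already been extracted as a number. The field names follow the JSON above; `enforcePricing` and its three outcomes are illustrative, not a NEES API.

```typescript
// Mirrors the JSON policy above.
interface PricingPolicy {
  policy: string;
  max_discount: number;                    // hard ceiling, as a fraction
  requires_manager_approval_above: number; // discounts above this need sign-off
}

type PricingDecision =
  | { outcome: "allow"; discount: number }
  | { outcome: "needs_approval"; discount: number }
  | { outcome: "deny"; discount: number; cappedAt: number };

function enforcePricing(requested: number, p: PricingPolicy): PricingDecision {
  // Above the ceiling: refuse, and report the maximum the model may mention.
  if (requested > p.max_discount) {
    return { outcome: "deny", discount: requested, cappedAt: p.max_discount };
  }
  // Between the approval threshold and the ceiling: route to a manager.
  if (requested > p.requires_manager_approval_above) {
    return { outcome: "needs_approval", discount: requested };
  }
  return { outcome: "allow", discount: requested };
}
```

The model can still phrase the reply, but the number it is allowed to mention comes from this decision rather than from the conversation.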
3. Deterministic routing
Not every request should go to the same model.
Some intents may need:
- a deterministic response
- a local knowledge base
- a smaller model
- a local model
- a human escalation
- a full reasoning model
- a blocked response
Runtime governance makes routing part of the system design, not just a prompt instruction.
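Routing can be expressed as an explicit lookup from intent to route, decided before any provider is called. The intent names and mapping below are hypothetical and simply reuse the route options listed above. Unknown intents default to the full reasoning model.

```typescript
type Route =
  | "deterministic"
  | "knowledge_base"
  | "small_model"
  | "local_model"
  | "human"
  | "reasoning_model"
  | "blocked";

// Hypothetical default route per intent.
const ROUTES = new Map<string, Route>([
  ["greeting", "deterministic"],
  ["faq", "knowledge_base"],
  ["classify_ticket", "small_model"],
  ["summarize_private_notes", "local_model"], // keep private data on local hardware
  ["refund", "human"],
  ["pricing", "reasoning_model"],
  ["policy_violation", "blocked"],
]);

function chooseRoute(intent: string, hypothetical: boolean): Route {
  // A "what if" pricing question is answered from published pricing,
  // so no model gets the chance to negotiate.
  if (intent === "pricing" && hypothetical) return "knowledge_base";
  // Unknown intents fall through to the full reasoning model.
  return ROUTES.get(intent) ?? "reasoning_model";
}
```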
4. Memory boundaries
AI memory is powerful, but risky.
A production AI system should know:
- what memory can be used
- what memory must be ignored
- what memory is user-specific
- what memory is product-level
- what memory requires consent
- what memory should never be stored
Without governance, memory can become an invisible source of drift.
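These rules can be written as data and applied before any memory reaches the prompt. This minimal sketch assumes each memory record is tagged with a scope, an owner, a consent flag, and a sensitivity flag. The `allowSensitive` switch echoes `allow_sensitive_profile_recall` in the conceptual example earlier. The types and the `selectMemory` function are hypothetical.

```typescript
type MemoryScope = "session" | "user" | "product";

interface MemoryRecord {
  text: string;
  scope: MemoryScope;
  ownerId?: string;      // set for user and session memory
  sessionId?: string;    // set for session memory
  sensitive: boolean;    // e.g. payment, health, or identity details
  consentGiven: boolean; // user agreed this may be recalled
}

interface MemoryRequest {
  userId: string;
  sessionId: string;
  allowSensitive: boolean;
}

function selectMemory(records: MemoryRecord[], req: MemoryRequest): MemoryRecord[] {
  return records.filter((r) => {
    // Sensitive details need both policy permission and user consent.
    if (r.sensitive && !(req.allowSensitive && r.consentGiven)) return false;
    switch (r.scope) {
      case "product":
        return true; // shared product knowledge, not tied to any user
      case "user":
        return r.ownerId === req.userId && r.consentGiven;
      case "session":
        return r.ownerId === req.userId && r.sessionId === req.sessionId;
      default:
        return false; // unknown scope: ignore
    }
  });
}
```

This covers what may be recalled. Deciding what should never be stored belongs at write time, with a similar check before anything is persisted.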
5. Traceable decisions
For production AI, logs should show more than input/output.
A useful trace should explain:
- detected intent
- applied policy
- risk level
- memory usage
- routing decision
- fallback decision
- allowed/blocked/escalated status
- final governed response
This makes debugging AI behavior much easier.
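A trace like this can be a plain structured record emitted once per request. The shape below is a hypothetical sketch, not the trace format NEES produces. Its fields follow the list above, plus a request id, and `explain` turns the record into a one-line log entry.

```typescript
type Status = "allowed" | "blocked" | "escalated";

interface DecisionTrace {
  requestId: string;
  detectedIntent: string;
  appliedPolicy: string | null;
  riskLevel: "low" | "medium" | "high";
  memoryUsed: string[]; // ids of memory records that reached the prompt
  route: string;        // e.g. "local_model", "human"
  fallbackUsed: boolean;
  status: Status;
  finalResponse: string;
}

// Answers "why was this response allowed?" from the log line alone.
function explain(t: DecisionTrace): string {
  const policy = t.appliedPolicy ?? "no policy";
  const memory = t.memoryUsed.length > 0 ? t.memoryUsed.join(", ") : "none";
  return (
    `[${t.requestId}] intent=${t.detectedIntent} policy=${policy} risk=${t.riskLevel} ` +
    `route=${t.route}${t.fallbackUsed ? " (fallback)" : ""} memory=${memory} -> ${t.status}`
  );
}
```

Emitting the full record as JSON as well (`JSON.stringify(trace)`) lets the same data feed dashboards and evals.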
6. Cost and latency control
Repeated AI calls become expensive quickly.
If a request is safe, common, verified, and not user-private, the runtime can serve it from governed knowledge or cache instead of calling a large model again.
That means governance is not only about safety.
It is also about cost control.
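Here is a minimal sketch of such a cache. It assumes the runtime already knows whether a request is safe, whether it depends on private user data, and whether an answer was verified. Only requests that pass those checks are stored or served, keyed on intent plus a normalized question. `GovernedCache` and its fields are hypothetical. A real system would also need invalidation when policies or prices change.

```typescript
interface CacheableRequest {
  intent: string;
  question: string;
  userPrivate: boolean; // answer depends on this user's data
  safe: boolean;        // request passed policy checks
}

interface CachedAnswer {
  answer: string;
  verified: boolean; // reviewed, or sourced from governed knowledge
  storedAt: number;  // Unix epoch milliseconds
}

class GovernedCache {
  private entries = new Map<string, CachedAnswer>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Normalize case and whitespace so trivial variations share an entry.
  private key(req: CacheableRequest): string {
    return `${req.intent}:${req.question.trim().toLowerCase().replace(/\s+/g, " ")}`;
  }

  private eligible(req: CacheableRequest): boolean {
    return req.safe && !req.userPrivate;
  }

  // Serve from cache only for eligible requests with a fresh, verified entry.
  get(req: CacheableRequest, now: number): string | undefined {
    if (!this.eligible(req)) return undefined;
    const hit = this.entries.get(this.key(req));
    if (!hit || !hit.verified || now - hit.storedAt > this.ttlMs) return undefined;
    return hit.answer;
  }

  // Private or unsafe answers are never stored.
  put(req: CacheableRequest, answer: string, verified: boolean, now: number): void {
    if (!this.eligible(req)) return;
    this.entries.set(this.key(req), { answer, verified, storedAt: now });
  }
}
```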
7. Local-first fallback
Cloud model providers can fail, slow down, rate-limit, or become expensive.
For some workflows, local fallback can keep the product stable.
A governance runtime can decide:
- when to use cloud
- when to use local
- when to use deterministic logic
- when to fall back
- when to escalate
- when not to answer
This matters more as AI moves deeper into production workflows.
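One way to express that decision is an ordered chain. Each provider is tried with a timeout, the chain moves to the next one on failure, and it ends with a deterministic reply. This is a sketch under stated assumptions. The provider names and timeout values are illustrative, and the stubs stand in for a cloud API and a local model such as Ollama; nothing here calls a real provider.

```typescript
type Provider = { name: string; call: (prompt: string) => Promise<string> };

// Reject if the call does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function callWithFallback(
  prompt: string,
  providers: Provider[],
  timeoutMs: number,
  deterministicReply: string,
): Promise<{ reply: string; servedBy: string; errors: string[] }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const reply = await withTimeout(provider.call(prompt), timeoutMs);
      return { reply, servedBy: provider.name, errors };
    } catch (err) {
      // Record why this provider was skipped, then try the next one.
      errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  // Every model failed: answer deterministically instead of failing the request.
  return { reply: deterministicReply, servedBy: "deterministic", errors };
}
```

The timeout abandons a slow call rather than cancelling it; a real client would typically also abort the underlying request.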
Guardrails vs Runtime Governance
Here is how I think about the difference:
| Guardrails | Runtime Governance |
|---|---|
| Often output-level | Execution/runtime-level |
| Mostly reactive | More proactive |
| Prompt-dependent | Policy/runtime-driven |
| Generic safety focus | Product-specific behavior control |
| Limited traceability | Traceable decision path |
| Filters bad outputs | Governs the flow before output |
| Usually model-adjacent | App-model infrastructure layer |
Guardrails are useful.
But for production AI agents, I think they are only one part of the system.
What I’m building
I’m building NEES Core Engine as a runtime governance layer for AI apps and agents.
The current focus is:
- intent checks
- policy enforcement
- memory boundaries
- mode/context control
- traceable responses
- escalation logic
- governed fallback behavior
- cost governance for repeated requests
- production-oriented AI behavior control
The basic idea:
User → App → NEES Core Engine → Model Provider → Governed Response
NEES does not try to be the model.
It tries to govern the model’s role inside a real product.
I’m looking for feedback from developers
I’ve opened a developer preview of the engine.
I’m not trying to sell a subscription here.
I’m looking for engineers, AI SaaS founders, and agent builders who are tired of putting too much production logic inside prompts.
I’d love honest feedback on these questions:
- How are you currently handling Agent Drift in production?
- Are you using prompts, guardrails, custom middleware, evals, or your own runtime checks?
- Do you prefer black-box guardrails or a transparent governance layer?
- Is local-first fallback important for your AI stack in 2026?
- Would traceable AI decisions help your debugging or customer trust?
- Are repeated LLM calls becoming a real cost problem for your product?
Project links:
GitHub Developer Preview:
https://github.com/NEES-Anna/nees-core-developer-preview
Live Sample App:
https://naina.nees.cloud
I’m especially looking to learn from real production stories.
Where did your AI agent drift?
What failed?
What did you build to control it?
And do you think runtime governance is becoming a real missing layer for production AI?
For context, NEES Core Engine is still evolving as a developer preview.
I’m not claiming runtime governance solves every AI reliability problem.
The question I’m exploring is whether production AI needs a dedicated governance layer between the app and the model provider, especially for policy enforcement, memory boundaries, traceability, fallback behavior, and cost control.
Would love technical criticism from builders who have shipped AI agents beyond the demo stage.