DEV Community: Anna Jambhulkar

Beyond the Prompt: Why Your AI Agent Needs a Governance Runtime

Anna Jambhulkar — Tue, 26 May 2026 09:48:59 +0000

If you’ve been building with LLMs lately, you probably know the pattern.

You start with a simple system prompt.

Then the product grows.

Then the prompt becomes longer.

Then you add rules.

Then you add exceptions.

Then you add examples.

Then you add “never do this” instructions.

Soon, your entire production logic is sitting inside a 2,000-word system prompt and you’re hoping the model follows it correctly every time.

That works well enough for demos.

But production is different.

Production has messy users, pricing rules, tool calls, memory, business policies, edge cases, latency issues, and cost pressure.

This is where I think system prompting becomes a single point of failure.

The industry often calls this “guardrails.”

But in many cases, we are still just asking the model to please behave.

I’m building NEES Core Engine because I believe AI products need to move from soft prompts to hard runtimes.

Not because prompts are useless.

Prompts are important.

But prompts alone should not be responsible for enforcing business logic, memory boundaries, escalation rules, cost control, and traceability in production AI systems.

What is Agent Drift?

In development, your AI agent feels predictable.

In production, it can start drifting.

I call this Agent Drift.

Agent Drift is when an AI system slowly moves away from the product’s intended behavior, business rules, safety boundaries, or workflow logic during real-world usage.

It is not always a dramatic hallucination.

Sometimes the output sounds reasonable.

But underneath, the agent may have skipped a rule, used the wrong context, interpreted intent incorrectly, or made a decision your product never approved.

Common symptoms:

1. Intent leakage

A user asks a hypothetical question, but the agent treats it like an instruction.

Example:

“What if you gave me a 50% discount?”

A weak agent may start negotiating or offering pricing that was never allowed.

2. Policy bypass

The system prompt says:

“Never offer more than a 15% discount.”

But the user applies pressure, adds context, or phrases the request creatively, and the model still produces an unauthorized offer.

3. Memory bloat

The context window fills with old, messy, or irrelevant user history.

The agent starts making decisions based on stale memory instead of current business logic.

4. Traceability gaps

An agent makes a mistake.

The team checks the logs.

The logs show the input and output, but not the actual reasoning path:

Which policy applied?
Which boundary was checked?
Why was this response allowed?
Should this have been escalated?
Was memory used safely?

Without traceability, debugging AI behavior becomes guesswork.

5. The LLM tax

Your product keeps paying for repeated model calls for answers that are already known, safe, and reusable.

Not every user request needs a fresh expensive model call.

Some answers should come from governed knowledge, deterministic logic, or a safe cache.

The architecture problem

Most AI apps follow this pattern:

App → Model → Output

The issue is simple:

If the model drifts, the product drifts.

If the model ignores a business rule, the product exposes that failure.

If the model produces an unsupported answer, the user sees it.

If the model makes a decision, the team often has limited visibility into why it happened.

That is why I’m exploring a different pattern:

App → Governance Runtime → Model Provider → Governed Response

This is the architecture behind NEES Core Engine.

The goal is not to replace OpenAI, Anthropic, Google, LangChain, CrewAI, Ollama, or any framework.

The goal is to add a runtime governance layer between the application and the model provider.

Think of it like a traffic-control layer for AI behavior.

The model still generates intelligence.

But the runtime governs how that intelligence is requested, checked, constrained, traced, and delivered.

Conceptual flow with NEES

Here is a simplified example of what a governed AI call could look like:

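// Conceptual flow with NEES Core Engine.
// `nees` (a configured client) and `userInput` are assumed to be in scope.

const response = await nees.execute({
  input: userInput,

  // Named policy the runtime applies to this request.
  policy: "strict_pricing_v2",

  // Hard limits owned by the runtime, not by prompt wording.
  boundaries: {
    max_discount: 0.15, // 15% cap
    allow_refunds: false,
    require_escalation_for_enterprise_contracts: true
  },

  // Which stored context the model is allowed to see.
  memory: {
    scope: "current_customer_session",
    allow_sensitive_profile_recall: false
  },

  // Where to route when the primary path cannot be used.
  fallback: {
    strategy: "local_or_deterministic",
    provider: "ollama"
  },

  // Record the decision path for later review.
  trace: true
});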

This is not about making the prompt longer.

It is about moving critical product logic out of the soft prompt and into a runtime layer that can validate, route, block, fall back, cache, and trace behavior.

Why runtime governance instead of only prompt engineering?

Prompt engineering is still useful.

But prompts are probabilistic.

Production rules often need something stronger.

A governance runtime can help with:

1. Pre-execution intent checks

Before spending tokens or allowing a workflow path, the runtime can classify what the user is trying to do.

Is this a normal question?

A pricing request?

A refund request?

A tool/action request?

A sensitive memory request?

A policy violation attempt?

If the intent violates policy, the request can be blocked, modified, clarified, or escalated before the model is called, so nothing unapproved reaches the user.
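To make this concrete, here is a minimal Python sketch of a pre-execution intent check. It is not NEES Core Engine's implementation: the intent labels, the regex patterns, and the `check_intent` helper are all assumptions invented for illustration, and a real system would likely use a trained classifier rather than keywords.

```python
import re
from dataclasses import dataclass

# Hypothetical intent labels and patterns, for illustration only.
INTENT_PATTERNS = [
    ("refund_request", re.compile(r"\brefund\b", re.I)),
    ("pricing_request", re.compile(r"\b(discount|price|pricing)\b", re.I)),
    ("tool_action", re.compile(r"\b(delete|cancel|update)\b", re.I)),
]

# Intents this sketch will not hand to the model without human review.
ESCALATE_INTENTS = {"refund_request"}


@dataclass
class IntentDecision:
    intent: str
    action: str  # "allow" or "escalate"


def check_intent(user_input: str) -> IntentDecision:
    """Classify the request before any model call is made."""
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(user_input):
            action = "escalate" if intent in ESCALATE_INTENTS else "allow"
            return IntentDecision(intent, action)
    return IntentDecision("general_question", "allow")
```

The point of the sketch is ordering: the decision exists before any tokens are spent, so an escalated request never needs a model call at all.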

2. Policy enforcement

Instead of relying only on:

“Please don’t offer more than a 15% discount.”

The runtime can enforce:

{
  "policy": "strict_pricing_v2",
  "max_discount": 0.15,
  "requires_manager_approval_above": 0.10
}

The model can still help communicate.

But the runtime owns the business boundary.
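As a rough sketch of what "the runtime owns the boundary" could mean in code, the function below checks a discount the model proposed against the policy values shown above. The `enforce_discount` name and the returned status strings are my own assumptions for this example, not an actual NEES API.

```python
POLICY = {
    "policy": "strict_pricing_v2",
    "max_discount": 0.15,
    "requires_manager_approval_above": 0.10,
}


def enforce_discount(proposed: float, policy: dict = POLICY) -> dict:
    """Decide what happens to a discount the model proposed (0.12 means 12%)."""
    if proposed > policy["max_discount"]:
        # Hard boundary: the offer never reaches the user.
        return {"status": "blocked", "discount": None}
    if proposed > policy["requires_manager_approval_above"]:
        # Within the cap, but above the self-serve threshold.
        return {"status": "needs_approval", "discount": proposed}
    return {"status": "allowed", "discount": proposed}
```

However persuasive the user is, a 50% offer is rejected by plain comparison, which is the difference between a rule and a request.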

3. Deterministic routing

Not every request should go to the same model.

Some intents may need:

a deterministic response
a local knowledge base
a smaller model
a local model
a human escalation
a full reasoning model
a blocked response

Runtime governance makes routing part of the system design, not just a prompt instruction.
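One simple way to sketch deterministic routing is a fixed lookup table. The intent names and route labels below are hypothetical placeholders that mirror the list above; they are not taken from any real product.

```python
# Hypothetical intent names and route labels; adapt to your own taxonomy.
ROUTES = {
    "greeting": "deterministic_response",
    "faq": "local_knowledge_base",
    "summarize": "small_model",
    "refund_request": "human_escalation",
    "complex_analysis": "full_reasoning_model",
    "policy_violation": "blocked_response",
}

# Unknown intents go to the cheapest model path in this sketch.
DEFAULT_ROUTE = "small_model"


def route(intent: str) -> str:
    """Pick a handler from a fixed table, so the same intent always routes the same way."""
    return ROUTES.get(intent, DEFAULT_ROUTE)
```

Because the table is data, it can be reviewed, versioned, and tested like any other configuration.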

4. Memory boundaries

AI memory is powerful, but risky.

A production AI system should know:

what memory can be used
what memory must be ignored
what memory is user-specific
what memory is product-level
what memory requires consent
what memory should never be stored

Without governance, memory can become an invisible source of drift.
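A minimal sketch of a memory boundary, assuming each stored item carries a scope label plus sensitivity and consent flags. The `MemoryItem` fields and the `usable_memory` helper are illustrative assumptions, not a description of how any particular memory store works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    text: str
    scope: str               # e.g. "session", "user_profile", "product"
    sensitive: bool = False
    consented: bool = False  # user agreed to recall of this item


def usable_memory(items, allowed_scopes, allow_sensitive=False):
    """Keep only the items the current request is allowed to see."""
    kept = []
    for item in items:
        if item.scope not in allowed_scopes:
            continue
        # Sensitive items need both a runtime switch and user consent.
        if item.sensitive and not (allow_sensitive and item.consented):
            continue
        kept.append(item)
    return kept
```

Filtering happens before the prompt is assembled, so out-of-scope memory is never in the context window to begin with.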

5. Traceable decisions

For production AI, logs should show more than input/output.

A useful trace should explain:

detected intent
applied policy
risk level
memory usage
routing decision
fallback decision
allowed/blocked/escalated status
final governed response

This makes debugging AI behavior much easier.
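Here is one possible shape for such a trace, sketched as a Python dataclass. The field names follow the list above; the class name, the `to_json` method, and the use of a random hex ID are assumptions for illustration.

```python
import json
import uuid
from dataclasses import dataclass, asdict, field


@dataclass
class DecisionTrace:
    detected_intent: str
    applied_policy: str
    risk_level: str
    memory_used: bool
    route: str
    fallback_used: bool
    status: str  # "allowed", "blocked", or "escalated"
    final_response: str
    # Random 32-character hex ID so a single decision can be looked up later.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        """Serialize with stable key order so traces diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True)
```

Writing one such record per request means a later investigation starts from the decision path rather than from the final text alone.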

6. Cost and latency control

Repeated AI calls become expensive quickly.

If a request is safe, common, verified, and not user-private, the runtime can serve it from governed knowledge or cache instead of calling a large model again.

That means governance is not only about safety.

It is also about cost control.
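A small sketch of a governed cache, assuming the runtime has already decided whether a request is safe to cache. The `GovernedCache` class and the `cacheable` flag are hypothetical; deciding when a request is "safe, common, verified, and not user-private" is the hard part and is not shown here.

```python
import hashlib


class GovernedCache:
    """Caches answers only for requests the caller marks as cacheable."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(prompt: str) -> str:
        # Lowercase and collapse whitespace so trivial variants share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_call(self, prompt, call_model, *, cacheable: bool):
        """Return (answer, served_from_cache)."""
        key = self._key(prompt)
        if cacheable and key in self._store:
            return self._store[key], True
        answer = call_model(prompt)
        if cacheable:
            self._store[key] = answer
        return answer, False
```

Requests marked as not cacheable always reach the model and are never stored, which keeps user-private answers out of the shared cache.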

7. Local-first fallback

Cloud model providers can fail, slow down, rate-limit, or become expensive.

For some workflows, local fallback can keep the product stable.

A governance runtime can decide:

when to use cloud
when to use local
when to use deterministic logic
when to fall back
when to escalate
when not to answer

This matters more as AI moves deeper into production workflows.
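The fallback decision can be sketched as an ordered provider chain. In this example the providers are plain Python callables supplied by the caller; `call_with_fallback` and the result keys are assumptions for illustration and do not correspond to any real SDK.

```python
def call_with_fallback(prompt, providers):
    """Try each (name, callable) pair in order and report what happened."""
    failed = []
    for name, call in providers:
        try:
            answer = call(prompt)
        except Exception as exc:  # a real system would catch narrower errors
            failed.append({"provider": name, "error": str(exc)})
            continue
        return {
            "answer": answer,
            "used": name,
            "fallback_used": bool(failed),
            "failed_attempts": failed,
        }
    # Every provider failed: decline rather than guess.
    return {"answer": None, "used": None, "fallback_used": True, "failed_attempts": failed}
```

Listing a cloud provider first and a local model second gives local-first resilience, and the returned metadata keeps the fallback visible instead of silent.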

Guardrails vs Runtime Governance

Here is how I think about the difference:

| Guardrails | Runtime Governance |
| --- | --- |
| Often output-level | Execution/runtime-level |
| Mostly reactive | More proactive |
| Prompt-dependent | Policy/runtime-driven |
| Generic safety focus | Product-specific behavior control |
| Limited traceability | Traceable decision path |
| Filters bad outputs | Governs the flow before output |
| Usually model-adjacent | App-model infrastructure layer |

Guardrails are useful.

But for production AI agents, I think they are only one part of the system.

What I’m building

I’m building NEES Core Engine as a runtime governance layer for AI apps and agents.

The current focus is:

intent checks
policy enforcement
memory boundaries
mode/context control
traceable responses
escalation logic
governed fallback behavior
cost governance for repeated requests
production-oriented AI behavior control

The basic idea:

User → App → NEES Core Engine → Model Provider → Governed Response

NEES does not try to be the model.

It tries to govern the model’s role inside a real product.

I’m looking for feedback from developers

I’ve opened a developer preview of the engine.

I’m not trying to sell a subscription here.

I’m looking for engineers, AI SaaS founders, and agent builders who are tired of putting too much production logic inside prompts.

I’d love honest feedback on these questions:

How are you currently handling Agent Drift in production?
Are you using prompts, guardrails, custom middleware, evals, or your own runtime checks?
Do you prefer black-box guardrails or a transparent governance layer?
Is local-first fallback important for your AI stack in 2026?
Would traceable AI decisions help your debugging or customer trust?
Are repeated LLM calls becoming a real cost problem for your product?

Project links:

GitHub Developer Preview:
https://github.com/NEES-Anna/nees-core-developer-preview

Live Sample App:
https://naina.nees.cloud

I’m especially looking to learn from real production stories.

Where did your AI agent drift?

What failed?

What did you build to control it?

And do you think runtime governance is becoming a real missing layer for production AI?

Gemma 4 Is Powerful — But Production AI Still Needs Governance

Anna Jambhulkar — Sat, 23 May 2026 18:20:59 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

Open models are changing the way developers build AI products.

With Gemma 4, developers get access to a capable open model family that can support reasoning, long-context workflows, multimodal inputs, coding tasks, and agent-style application patterns.

That is exciting.

But after building with Gemma 4, one thing became very clear to me:

A powerful model is not the same thing as a production-ready AI system.

Gemma 4 can generate.
The application still has to decide what should be trusted, shown, modified, blocked, logged, or escalated.

That gap is where governance becomes important.

What Makes Gemma 4 Interesting

Gemma 4 is not just another text model. It feels like a model family designed for modern AI applications.

The family includes multiple variants for different deployment needs, including smaller efficient models and larger models for more demanding reasoning or generation tasks.

From a developer perspective, the most interesting parts are:

long context windows,
multimodal support,
improved coding and agentic capabilities,
function-calling support,
system instruction support,
and configurable thinking behavior.

This combination makes Gemma 4 useful for more than simple chat. It can become part of real workflows: support assistants, developer tools, document understanding systems, internal agents, education tools, and governed AI applications.

But that also raises a bigger question.

If a model becomes powerful enough to participate in real workflows, what should exist around it?

The Difference Between Model Intelligence and System Reliability

A model answers.

A system must decide.

That difference matters.

For example, imagine these prompts:

Summarize this product feedback.

This is low risk. The system can probably allow the response.

Reply harshly to this angry customer.

The model may generate something, but the application should probably soften or modify the response.

Delete all inactive users without asking.

This is not just a text request. It implies a destructive action. The system should require confirmation or block execution.

Give guaranteed legal advice.

This is sensitive. The system should not provide unsupported certainty.

In all four cases, the model may be capable of producing output. But production readiness depends on the layer around the model.

That layer should answer questions like:

What is the user trying to do?
Is this request low risk or high risk?
Should the response be allowed?
Should it be modified?
Should the user confirm first?
Should the request be blocked?
Which model was used?
Did fallback happen?
Can this decision be inspected later?

These are not only model questions. They are system questions.

What I Learned While Building With Gemma 4

While working with Gemma 4, I noticed something important.

Sometimes the raw model output is very useful but not ready to show to users as-is. For example, when asked for a concise summary, the model may produce draft-style structure, intermediate formatting, or explanation-like content before the final answer.

That is not necessarily a failure. It is part of how capable models reason and generate.

But for an application, the final user-facing output matters.

A production AI app should not blindly pass raw model output to the user every time. It should have a finalization layer that can clean, shape, constrain, or block the response depending on context.

This is especially important when models are used inside workflows, agents, support tools, or business applications.

Thinking Mode Is Powerful, But It Needs Boundaries

Gemma 4’s thinking capability is one of its most interesting features.

For hard reasoning problems, deeper thinking can be valuable. For coding, planning, math, and multi-step tasks, it can help the model produce stronger answers.

But in user-facing production systems, internal reasoning or draft-like output should usually not leak directly into the final response.

That means applications need to separate:

model reasoning

from:

user-facing answer

This separation is not only about formatting. It is about trust, safety, clarity, and product quality.

A good AI system should know when to use model reasoning internally and when to show a clean final answer externally.
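A minimal sketch of that separation follows. It assumes the reasoning arrives wrapped in `<think>...</think>` tags, which is a placeholder I chose for illustration and not Gemma 4's documented output format; check what your model or SDK actually emits before relying on anything like this.

```python
import re

# ASSUMPTION: reasoning is wrapped in <think>...</think>. This tag is a
# placeholder, not a documented Gemma 4 format.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str):
    """Return (reasoning_for_logs, answer_for_user)."""
    # Collect the inner text of every reasoning block for internal logging.
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(raw))
    # Remove the blocks entirely from what the user will see.
    answer = THINK_BLOCK.sub("", raw).strip()
    return reasoning, answer
```

The reasoning can still be kept in a trace for debugging, while only the cleaned answer is returned to the user.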

Open Models Need Open Governance Patterns

Open models make AI more accessible.

That is a huge shift.

More developers can build with capable models. More teams can experiment. More products can become AI-native.

But as open models become more powerful, developers also need practical governance patterns:

intent detection,
risk classification,
policy decisions,
tool/action confirmation,
fallback handling,
traceability,
response finalization,
audit logs,
and clear user-facing behavior.

Without this layer, AI applications can become unpredictable.

The model may be strong, but the product may still fail because there is no operating structure around the model.

A Simple Governance Pattern for Gemma 4 Apps

A practical Gemma 4 application can follow a flow like this:

User Prompt
   ↓
Intent Detection
   ↓
Risk Classification
   ↓
Gemma 4 Model Response
   ↓
Governance Decision
   ↓
Final User-Facing Response
   ↓
Trace / Audit Record

This does not need to be complex at the beginning.

Even a lightweight system can classify requests into simple bands:

Green  → allow
Yellow → modify or soften
Red    → ask for confirmation or block

For example:

| Request | Risk | Decision |
| --- | --- | --- |
| Summarize feedback | Green | Allow |
| Harsh customer reply | Yellow | Modify |
| Delete users | Red | Ask confirmation |
| Guaranteed legal advice | Red | Block |

This kind of pattern makes the model more useful because it gives the application a way to control behavior.
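A lightweight version of the banding can be sketched with a few keyword rules. The patterns below are hypothetical and only cover the four example requests; they are not the rules used in any real deployment, and keyword matching this simple would miss many rephrasings.

```python
import re

# Hypothetical keyword rules that mirror the four example requests.
RULES = [
    (re.compile(r"\b(delete|remove|drop)\b", re.I), "red", "ask_confirmation"),
    (re.compile(r"\bguaranteed\b.*\b(legal|medical)\b", re.I), "red", "block"),
    (re.compile(r"\b(harsh|harshly|rude)\b", re.I), "yellow", "modify"),
]


def classify(prompt: str):
    """Return (risk_band, decision); the first matching rule wins."""
    for pattern, band, decision in RULES:
        if pattern.search(prompt):
            return band, decision
    # Nothing matched: treat as low risk.
    return "green", "allow"
```

Rule order matters here because the first match wins, so the most severe rules are listed first.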

Why Traceability Matters

Traceability is one of the most underrated parts of AI product design.

When an AI system responds, developers should be able to inspect:

what the user asked,
what intent was detected,
what risk level was assigned,
which model was used,
whether fallback happened,
what policy decision was made,
and what final response was returned.

This matters because production AI is not only about answering correctly once.

It is about debugging, improving, explaining, and trusting the system over time.

If something goes wrong, the team should not be guessing.

They should have a trace.

Gemma 4 in the Real World

I think Gemma 4 matters because it brings stronger open-model capability closer to everyday developers.

But the next step is not only “build more chatbots.”

The next step is:

governed assistants,
reliable agents,
auditable workflows,
domain-specific copilots,
safe automation layers,
and AI systems that can be inspected and improved.

Gemma 4 can be the intelligence layer.

But developers still need to build the application layer responsibly.

My Takeaway

Gemma 4 shows how capable open models are becoming.

But the future of AI applications will not be decided only by model capability.

It will also be decided by the systems we build around the model.

A strong AI application needs both:

model intelligence

and

governed behavior

The model generates.
The system governs.
The trace explains what happened.

That is where I believe production AI is heading.

Related Demo

I also built a small demo called NEES Guard for Gemma 4 to explore this idea in practice.

Live demo:

https://nees-guard-gemma4.vercel.app/

Repository:

https://github.com/NEES-Anna/NEES-Guard-Gemma4

The demo shows Gemma 4 as the model layer and a lightweight governance layer around it for risk classification, policy decisions, response finalization, and traceability.

Open models make AI more accessible.

Governance makes AI more reliable.

Gemma 4 gives developers powerful model intelligence. The next challenge is building systems around that intelligence that are traceable, predictable, and safe enough for real use.

NEES Guard for Gemma 4: Governance, Traceability, and Predictable Behavior for Open-Model AI

Anna Jambhulkar — Sat, 23 May 2026 18:09:44 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

I built NEES Guard for Gemma 4, a small full-stack demo that shows how open-model intelligence can be paired with a lightweight governance layer before responses reach the user.

Gemma 4 provides the model intelligence. NEES Guard adds the production-facing governance layer around it:

intent detection
risk classification
policy decisions
raw vs governed response comparison
trace IDs
fallback metadata
response hashing
clean final user-facing output

The idea behind the project is simple:

A model can generate an answer, but production AI needs a governed runtime around that answer.

In the demo, a user enters a prompt and selects a scenario such as general, customer support, agent action, or sensitive advice. The backend sends the task to Gemma 4, then NEES Guard analyzes the prompt and finalizes the output based on the risk level.

Example governance behavior:

| Prompt | Governance Result |
| --- | --- |
| “Summarize this product feedback…” | Green / Allow |
| “Reply harshly to this angry customer.” | Yellow / Modify |
| “Delete all inactive users without asking.” | Red / Ask confirmation |
| “Give guaranteed legal advice.” | Red / Block |

This makes the project useful as a small demonstration of how AI apps can move from model response to governed response.

Demo

Live demo:

https://nees-guard-gemma4.vercel.app/

Backend health check:

https://nees-guard-gemma4.onrender.com/health

The demo shows four main panels:

Gemma Raw Response — the direct model output.
NEES Guard Analysis — intent, risk band, policy decision, and flags.
Governed Response — the final response after governance.
Trace JSON — audit-style metadata including trace ID, model provider, mock/live mode, fallback status, and response hash.

One important behavior the demo highlights is that raw model output can sometimes be verbose, draft-like, or formatted in a way that is not ideal for end users. NEES Guard cleans and finalizes it into a concise user-facing response.

Example:

Raw model output:
May include draft notes, formatting, or intermediate response structure.

Governed response:
“While the app is useful, the setup instructions and trace panel are difficult to understand.”

This is the core point of the project: the model generates, but the governance layer decides what should safely and clearly reach the user.

Code

GitHub repository:

https://github.com/NEES-Anna/NEES-Guard-Gemma4

The project is structured as a standalone demo:

backend/
  app/
    main.py
    config.py
    gemma_client.py
    governance.py
    schemas.py
    trace.py
  tests/

frontend/
  src/
    App.jsx
    api.js
    components/

Backend:

FastAPI
Gemma 4 API call
deterministic governance rules
trace builder
fallback handling
response finalizer
test coverage

Frontend:

Vite + React
scenario selector
example prompts
result cards
trace viewer
deployment-friendly API configuration

The backend test suite covers governance behavior, API shape, Gemma fallback metadata, trace fields, and safety handling.

How I Used Gemma 4

I used Gemma 4 as the model intelligence layer through the Gemini API.

The selected primary model is:

gemma-4-26b-a4b-it

I chose this model because the project needs a practical instruction-following model that can generate useful responses for realistic AI application scenarios, while still being suitable for a fast deployed demo workflow.

The project also supports a fallback model:

gemma-4-31b-it

Gemma 4 is responsible for generating the initial response. NEES Guard then wraps that response with a governance process:

User Prompt
   ↓
Intent + Risk Analysis
   ↓
Gemma 4 Model Response
   ↓
Governance Finalizer
   ↓
Governed Response + Trace

The governance layer does not replace Gemma 4. Instead, it demonstrates how an AI application can use Gemma 4 as the reasoning and generation layer while adding production-oriented controls around it.

For each request, NEES Guard records metadata such as:

requested model
used model
provider
mock/live mode
fallback usage
failed model attempts
risk band
policy decision
response hash
trace ID

This makes the demo more than a chatbot. It becomes a small example of governed AI behavior: traceable, inspectable, and safer for production-style use cases.
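As a rough illustration of this kind of metadata, the sketch below builds a trace dictionary and hashes the final response with SHA-256. It is not the code from the repository: the function names, the key names, and the `"gemini-api"` provider label are assumptions, and the repository may use a different hash or structure.

```python
import hashlib
import uuid


def response_hash(text: str) -> str:
    """SHA-256 hex digest of the final user-facing text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_trace(requested_model, used_model, response_text, *, risk_band,
                policy_decision, provider="gemini-api", mode="live",
                failed_attempts=()):
    """Assemble audit-style metadata for one governed response."""
    return {
        "trace_id": uuid.uuid4().hex,
        "requested_model": requested_model,
        "used_model": used_model,
        "provider": provider,  # illustrative label
        "mode": mode,          # "mock" or "live"
        "fallback_used": used_model != requested_model,
        "failed_attempts": list(failed_attempts),
        "risk_band": risk_band,
        "policy_decision": policy_decision,
        "response_hash": response_hash(response_text),
    }


def matches_trace(trace: dict, text: str) -> bool:
    """Check whether a stored response is the one the trace describes."""
    return trace["response_hash"] == response_hash(text)
```

Storing the hash lets a reviewer confirm later that a logged response was not altered after the trace was written.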

Architecture

The core architecture is intentionally simple:

Frontend UI
   ↓
FastAPI Backend
   ↓
NEES Guard Governance Layer
   ↓
Gemma 4 Model Call
   ↓
Governance Finalizer
   ↓
Final Governed Response + Trace JSON

The governance layer classifies prompts into risk bands:

Green: allow the response
Yellow: modify or soften the response
Red: ask for confirmation or block the request

This lets the demo show how an AI app can handle normal prompts, hostile customer-support prompts, destructive agent actions, and sensitive advice requests differently.

What I Learned

While building this project, I noticed that model intelligence and production reliability are two different layers.

Gemma 4 can generate useful responses, but an application still needs a system around the model to decide:

Is this request low risk?
Should the response be modified?
Should the user confirm before an action?
Should the response be blocked?
What happened during the model call?
Which model was used?
Did fallback happen?

That is the gap NEES Guard tries to demonstrate.

The project also showed why traceability matters. If a model provider fails, fallback behavior should not be silent. NEES Guard records that event in the trace so the application remains inspectable.

Public Repository Note

This repository is a standalone challenge demonstration. It is not the production NEES Core Engine.

Advanced NEES runtime governance, memory governance, replay/simulation, enterprise controls, private infrastructure, and production NEES Core Engine capabilities are not included in this repository.

The repository is source-available for review and challenge evaluation only. See the repository license for details.

Final Thoughts

NEES Guard for Gemma 4 is a small project, but it represents a bigger idea:

Open models make AI more accessible. Governance layers make AI more reliable.

Gemma 4 provides the intelligence. NEES Guard provides governed behavior, traceability, fallback awareness, and predictable final output.

Why AI support bots fail even when the model is safe

Anna Jambhulkar — Sat, 16 May 2026 13:48:26 +0000

A support bot can be safe and still break product trust.

That may sound strange at first, because most AI product discussions still focus on safety.

Can the model avoid harmful content?
Can it refuse dangerous requests?
Can it follow policy?
Can it avoid toxic or unsafe answers?

All of that matters.

But in production, safety is not the only failure mode.

A customer-facing AI system can produce a polite, policy-aligned, non-harmful answer — and still make the wrong product decision.

The problem is not always what the AI says

Imagine a customer asks:

“I was charged twice for my annual plan. Can I get a refund?”

A support bot might respond:

“I can help with that. You’re eligible for a refund. I’ve processed it for you.”

At a content level, this may look fine.

The response is polite.
It is not toxic.
It is not harmful.
It may even sound helpful.

But operationally, it may be wrong.

Refunds, billing disputes, account access, legal concerns, medical issues, policy exceptions, and emotionally charged complaints often require human review or strict workflow handling.

The failure is not that the AI said something unsafe.

The failure is that the AI answered when it should have escalated.

That is a different class of problem.

Safety is not the same as runtime behavior control

Most safety systems focus on questions like:

Is this output harmful?
Is this request disallowed?
Does this response violate a policy?
Should the model refuse?

These are important questions.

But production AI products need another layer of decision-making:

Should the AI answer directly?
Should it ask a clarifying question?
Should it fall back?
Should it refuse?
Should it escalate to a human?
Should this interaction be reviewed later?
Can the team trace why the AI made that decision?

This is where many AI support bots start failing.

Not because the model is bad.

But because the product has no clear runtime governance around the model.

Prompt fixes become hidden production logic

Most teams start with prompts.

That is normal.

You add instructions like:

Be helpful.
Stay within company policy.
Do not answer billing disputes.
Escalate sensitive cases.
Ask clarifying questions when needed.
Do not make promises about refunds.

At first, this works.

Then edge cases appear.

So you add more instructions.

If the user asks about account deletion, escalate.
If the user asks about payment failure, explain common causes.
If the user asks about refunds, do not approve them.
If the user sounds angry, be empathetic.
If the user mentions legal action, escalate.

Then the product grows.

Now some rules live in the system prompt.

Some rules live in backend checks.

Some rules live in support policy docs.

Some rules live in manual workflows.

Some rules exist only because someone on the team remembers why they were added.

Eventually, prompt instructions become hidden production logic.

And when something goes wrong, the team struggles to answer:

Why did the AI respond instead of escalating?

That question is painful because it is not only a prompt question.

It is a product governance question.

The missing layer: runtime governance

For AI support systems, the important decision is often not only:

What should the model say?

It is:

Should the product allow the model to answer this at all?

That requires runtime governance.

Runtime governance means the AI system is not only generating a response. It is also operating inside product-level boundaries.

For example:

User request → intent/risk check → context boundary → decision path → model response or escalation → trace

In a support bot, this layer can help decide:

This is safe to answer
This needs clarification
This should fall back to a standard policy response
This should refuse
This should escalate to a human
This should be logged for review

The goal is not to replace the model.

The goal is to govern the behavior around the model.

A simple example

Without runtime governance:

User: I was charged twice. Can I get a refund?

AI Bot: Sure, I’ve processed your refund.

With runtime governance:

User: I was charged twice. Can I get a refund?

Governance check:

Category: billing/refund
Risk: financial decision boundary
Allowed direct answer: no
Action: escalate

AI Bot:
I can help route this correctly. Because this involves a billing adjustment, I’m escalating it to a support specialist who can review your account.

The second response may feel less impressive as a demo.

But it is more reliable as a product.

That difference matters.
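The governance check in this example can be sketched as a small boundary table consulted before the model's reply is released. Everything named here (`BOUNDARIES`, `GovernanceCheck`, `respond`, and the category labels) is hypothetical and only meant to show the shape of the decision, not how NEES Core Engine implements it.

```python
from dataclasses import dataclass

# Hypothetical boundary table: category -> (risk, allowed_direct_answer, action).
BOUNDARIES = {
    "billing/refund": ("financial decision boundary", False, "escalate"),
    "legal": ("legal boundary", False, "escalate"),
    "how-to": ("none", True, "answer"),
}


@dataclass
class GovernanceCheck:
    category: str
    risk: str
    allowed_direct_answer: bool
    action: str


def check(category: str) -> GovernanceCheck:
    # Unknown categories default to clarification instead of a direct answer.
    risk, allowed, action = BOUNDARIES.get(category, ("unclassified", False, "clarify"))
    return GovernanceCheck(category, risk, allowed, action)


def respond(category: str, model_reply: str) -> str:
    """Release the model's reply only when the boundary allows a direct answer."""
    decision = check(category)
    if decision.action == "escalate":
        return "I'm escalating this to a support specialist who can review your account."
    if decision.action == "clarify":
        return "Could you tell me a bit more about what you need?"
    return model_reply
```

With this in place, a confident "I've processed your refund" from the model is simply never shown, because the billing category does not permit a direct answer.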

Traceability matters too

When an AI product fails, teams need more than the final answer.

They need to know:

What was the user asking?
What did the system classify the request as?
Which boundary applied?
Why did the AI answer, fall back, refuse, or escalate?
Was memory or previous context involved?
Was this behavior consistent with the product promise?

Without traceability, every failure becomes a guessing game.

The team looks at the final output and tries to reconstruct what happened.

That is not enough for production AI.

Where NEES Core Engine fits

This is the problem I am working on with NEES Core Engine.

NEES Core Engine is runtime governance for AI product behavior.

It sits between an AI application and the model provider, helping govern how the AI behaves in production.

The focus is not only safety filtering.

The focus is behavioral reliability.

NEES helps AI products manage:

role boundaries
memory and context scope
escalation decisions
traceable responses
reviewable behavior
consistent product behavior across sessions

In simple terms:

Prompts define behavior.
NEES helps govern it at runtime.

Why this matters for builders

If you are building an AI support bot, assistant, workflow agent, or customer-facing AI product, one of the most important questions is:

Can your AI behave consistently with what your product promised?

Because users do not only judge AI by whether the response is safe.

They judge it by whether the product behaved correctly.

A bot that confidently answers a refund request may look helpful.

But if that request required human review, the product failed.

A bot that gives legal, medical, billing, or account advice outside its allowed boundary may not be toxic.

But it may still create risk.

A bot that changes behavior after a session restart may not be unsafe.

But it may still break trust.

That is why production AI needs more than prompts and safety filters.

It needs runtime governance.

A practical checklist

Before shipping an AI support bot, ask:

What types of requests should the AI never resolve directly?
Which requests require clarification before answering?
Which requests require human escalation?
Where are those rules stored?
Can your team review why the AI made a decision?
Can the same boundary hold across sessions?
Are prompts carrying too much hidden production logic?

If these answers are unclear, the product may work in demos but fail in production.

Closing thought

The next generation of AI product reliability will not only come from better models.

It will come from better runtime systems around the models.

Because the real question is not only:

Is the AI response safe?

The better production question is:

Was this the right product behavior?

That is the layer NEES Core Engine is built for.

Developer preview:
https://github.com/NEES-Anna/nees-core-developer-preview

Live sample app:
https://naina.nees.cloud

Looking for developers to test and review NEES Core Engine — a governed runtime layer for AI apps

Anna Jambhulkar — Thu, 14 May 2026 16:34:09 +0000

I’m opening NEES Core Engine for developer feedback.

NEES Core Engine is a governed AI runtime layer that sits between an AI application and the model provider.

User → App → NEES Core Engine → Model Provider → Governed Response

The goal is not to replace the model.

The goal is to make AI product behavior more controllable, traceable, and reviewable.

Why I’m building this

Most AI product failures are not only model failures.

In real AI workflows, the failure often comes from the system around the model:

unclear role boundaries
messy memory/context scope
missing escalation paths
weak permission boundaries
no traceability
no reviewable decision history
behavior drift across sessions
prompt fixes scattered across the product

A model can pass a safety filter and still behave incorrectly for the product.

That is the gap I’m trying to explore with NEES Core Engine.

What NEES Core Engine focuses on

NEES Core Engine is designed around runtime governance for AI products:

behavior governance
role consistency
memory boundaries
intent-aware policy decisions
runtime trace IDs
escalation/fallback visibility
reviewable AI responses

Who I’m looking for

I’m looking for developers building or testing:

AI agents
customer support bots
internal copilots
workflow automation tools
AI apps using memory
AI apps using tools/actions
products where role, tone, escalation, or traceability matter

Developer preview

GitHub repo:

https://github.com/NEES-Anna/nees-core-developer-preview

Live sample app:

https://naina.nees.cloud

The preview includes:

Python quickstart
Node.js quickstart
cURL example
API reference
governance flow docs
API key request template
developer feedback template

What feedback would help most

If you test it, I’d love feedback on:

Is the quickstart clear?
Does the governance flow make sense?
Are trace IDs useful?
Is the response metadata helpful?
What fields would you need in production?
Where does the runtime feel incomplete?
What failure modes should NEES handle better?
What would stop you from integrating this into a real workflow?

This is an early developer preview, so I’m not looking for praise.

I’m looking for honest technical feedback from builders.

Even 15 minutes of testing would help shape the next version of NEES Core Engine.

Is AI governance only about safety, or should it also control product behavior?

Anna Jambhulkar — Wed, 13 May 2026 18:06:01 +0000

I’ve been researching the AI governance runtime category while building NEES Core Engine, and one thing became clearer to me:

Most AI governance tools are designed around risk reduction.

They help answer questions like:

Is the output unsafe?
Is there PII in the prompt?
Is the model violating policy?
Is the system compliant with internal or regulatory rules?

That is important. But while building AI products, I noticed another failure mode:

An AI can be “safe” and still be unreliable as a product.

It can drift from its intended role.
It can change tone across sessions.
It can misuse memory or context.
It can behave differently even when the product logic expects consistency.
It can follow a prompt but break the actual user experience.

That led me to a different framing:

Traditional AI governance asks: “Is this response safe?”
Behavioral governance asks: “Is this AI behaving the way the product intended?”

This is the direction I’m exploring with NEES Core Engine — a governance runtime that sits between an application and the model provider, not only to filter harmful content, but to enforce things like:

identity consistency
memory boundaries
intent-aware policy decisions
runtime traceability
product-defined behavior

The difference I’m seeing is:

Standard governance runtime: protect the company from AI risk.
Behavioral governance runtime: protect the product from AI unpredictability.

For example, in a support bot, safety filtering is not enough. The bot also needs to stay within its role, follow product logic, respect memory boundaries, and behave consistently across sessions.
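To illustrate the difference, here is a small sketch that treats "is it safe?" and "is it on-product?" as two separate checks. Everything in it is an assumption for illustration: the topic names, the blocked-term list, and the `review` function are hypothetical stand-ins, not how any particular governance tool works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    safe: bool        # passed the safety-style check
    on_product: bool  # stayed inside the role the product defined


# Hypothetical scope for a support bot.
ALLOWED_TOPICS = {"orders", "shipping", "returns"}
# Stand-in for a real safety filter.
BLOCKED_TERMS = {"credit card dump"}


def review(reply_topic: str, reply_text: str) -> Verdict:
    """Run the two checks independently so one cannot mask the other."""
    lowered = reply_text.lower()
    safe = not any(term in lowered for term in BLOCKED_TERMS)
    on_product = reply_topic in ALLOWED_TOPICS
    return Verdict(safe=safe, on_product=on_product)


# A harmless reply that is still outside the support bot's role.
print(review("tax_advice", "You can usually deduct that expense."))
```

The example reply passes the safety check and still fails the behavior check, which is the gap described above.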

For AI agents, this becomes even more important because the system may use tools, access data, or make workflow decisions.

I’m curious how other founders and AI builders think about this:

When building AI products, do you see governance mostly as a compliance/safety layer — or do you also need a runtime layer that controls behavior, identity, memory, and intent?

Would love feedback from anyone building agents, AI assistants, internal copilots, or customer-facing AI products.

I Built an AI Governance Runtime Layer for Production AI Apps

Anna Jambhulkar — Sat, 09 May 2026 08:11:36 +0000

Most AI apps today follow a very simple pattern:

User → App → LLM → Response

That pattern works well for demos.

It works for prototypes.
It works for simple assistants.
It works when the workflow is clean and the risk is low.

But once AI starts moving into real products, the problem changes.

The question is no longer only:

Can the model generate a good answer?

The real production questions become:

What was the AI allowed to do?
What context did it use?
What memory was active?
Which policy applied?
Why did it respond this way?
Can this interaction be reviewed later?

That is the problem I am trying to solve with NEES Core Engine.

What is NEES Core Engine?

NEES Core Engine is a governed AI runtime layer for production AI applications.

It sits between your application and the model provider.

User
  ↓
Application
  ↓
NEES Core Engine
  ↓
Governance Runtime
  ↓
Model Provider
  ↓
Governed Response

The goal is not to build another chatbot.

The goal is to give AI applications a runtime control layer for:

policy awareness
identity consistency
memory boundaries
runtime modes
traceability
explainability metadata
safer production behavior

In simple terms:

NEES helps AI apps become more controlled, traceable, and reviewable before the response reaches the user.

Why prompts are not enough

A prompt can guide behavior.

But a prompt is not governance.

A prompt cannot reliably answer:

Which policy was active?
What memory scope was allowed?
What should happen if two instructions conflict?
When should the AI escalate?
What response path was used?
How do we debug this response later?

Most production AI problems do not happen because the model is completely useless.

They happen because the system around the model is weak.

The workflow is unclear.
The context is messy.
The memory boundary is undefined.
The role is inconsistent.
The decision path is not visible.

So the model is forced to guess.

That is where governance becomes necessary.
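Take the "two instructions conflict" question as an example. In a prompt, the model decides which instruction wins. In a runtime, that choice can be explicit. The sketch below assumes a hypothetical `Policy` type with a numeric priority, and a tie-break where the more restrictive decision wins; both conventions are illustrative choices, not a description of NEES internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    priority: int  # higher number wins (hypothetical convention)
    decision: str  # "allow", "escalate", or "block"


# On equal priority, prefer the more restrictive decision.
RESTRICTIVENESS = {"allow": 0, "escalate": 1, "block": 2}


def resolve(policies: list[Policy]) -> Policy:
    """Return the winning policy among all policies that matched a request."""
    if not policies:
        raise ValueError("no policy matched the request")
    return max(policies, key=lambda p: (p.priority, RESTRICTIVENESS[p.decision]))


matched = [
    Policy("be_helpful", priority=1, decision="allow"),
    Policy("refunds_need_review", priority=5, decision="escalate"),
]
winner = resolve(matched)
print(winner.name, winner.decision)
```

Returning the whole policy, not just the decision, means the winning rule's name can be recorded for later review.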

What NEES adds to the AI stack

A direct model call usually gives you:

Prompt → Model → Text Response

A governed NEES call gives you:

Request
  ↓
Runtime governance
  ↓
Model response
  ↓
Governance metadata
  ↓
Traceable output

That means the response is not only text.

It can also carry metadata such as:

{
  "reply": "Governed assistant response...",
  "trace_id": "trace_xxxxx",
  "engine_source": "core_engine",
  "governance": {
    "status": "allowed",
    "mode_used": "supportive",
    "policy_applied": true,
    "memory_scope": "session"
  }
}

The exact response fields may evolve during the developer preview, but the principle is the same:

Every AI response should be easier to understand, debug, and review.
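On the application side, that metadata can drive what the user actually sees. This is a minimal sketch, assuming the field names from the sample payload above (which may change during the preview). The `GovernedReply` type, the helper names, and the fallback wording are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class GovernedReply:
    reply: str
    trace_id: str
    status: str
    memory_scope: str


def parse_governed(raw: str) -> GovernedReply:
    """Parse a governed response, tolerating missing fields."""
    data = json.loads(raw)
    gov = data.get("governance", {})
    return GovernedReply(
        reply=data.get("reply", ""),
        trace_id=data.get("trace_id", ""),
        status=gov.get("status", "unknown"),
        memory_scope=gov.get("memory_scope", "unknown"),
    )


def text_for_user(parsed: GovernedReply) -> str:
    """Show the reply only when governance allowed it; otherwise cite the trace."""
    if parsed.status == "allowed":
        return parsed.reply
    return f"This request needs review (ref: {parsed.trace_id})."


SAMPLE = """
{
  "reply": "Governed assistant response...",
  "trace_id": "trace_xxxxx",
  "governance": {"status": "allowed", "memory_scope": "session"}
}
"""

print(text_for_user(parse_governed(SAMPLE)))
```

Reading fields with defaults keeps the client working if a field is renamed or missing in a later preview build.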

A simple example

Here is a basic Python request:

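import requests

# Send one message through the governed runtime instead of calling the model directly.
response = requests.post(
    "https://api.nees.cloud/chat",
    headers={
        # Replace with your own developer preview API key.
        "Authorization": "Bearer YOUR_NEES_API_KEY"
    },
    json={
        "message": "Explain why AI apps need runtime governance in simple terms.",
        "mode": "supportive",
        "session_id": "demo-session"
    },
    timeout=45  # seconds
)

# Fail loudly on HTTP errors (4xx/5xx) instead of parsing an error body as a reply.
response.raise_for_status()

print(response.json())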

This is still a simple API call.

But instead of treating the model response as a black box, NEES routes the request through a governed runtime layer.

Why traceability matters

When an AI response goes wrong in production, teams need more than:

“The model said this.”

They need to know:

what request came in
what mode was active
what policy applied
what memory scope was used
what provider/model path handled the request
whether the response was allowed, modified, or blocked
how the interaction can be reviewed later

That is why trace IDs matter.

A trace ID acts like a reference point for debugging and review.

Without traceability, AI debugging becomes guesswork.
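As a rough illustration, an application could write one structured log line per interaction, keyed by the trace ID. The sketch assumes the governance fields from the sample payload shown earlier; the `trace_record` helper and the `logged_at` field are hypothetical.

```python
import json
from datetime import datetime, timezone


def trace_record(trace_id: str, request_text: str, governance: dict) -> str:
    """Build one JSON log line that can be looked up later by trace_id."""
    record = {
        "trace_id": trace_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "request": request_text,
        # Missing governance fields are logged as null rather than dropped.
        "mode": governance.get("mode_used"),
        "status": governance.get("status"),
        "memory_scope": governance.get("memory_scope"),
    }
    return json.dumps(record, sort_keys=True)


line = trace_record(
    "trace_demo",
    "Can I get a refund?",
    {"status": "allowed", "mode_used": "supportive", "memory_scope": "session"},
)
print(line)
```

With one line per interaction, a support ticket that quotes a trace ID can be matched to the exact request, mode, and memory scope that produced the reply.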

Memory boundaries matter too

Memory is powerful.

But uncontrolled memory can create serious problems.

If every past interaction can influence every future response, the system becomes harder to reason about.

So memory should not be treated as unlimited context.

It should be governed.

A production AI system should be able to reason about:

what belongs only to the current session
what can be reused across sessions
what requires explicit consent
what should never influence a response
when memory usage should be visible or traceable

The goal is not simply:

Give the AI more memory.

The goal is:

Control when memory is used, why it is used, and how that usage can be reviewed.
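The categories above can be sketched as an explicit filter that runs before any memory reaches the model. The scope names, the `MemoryItem` type, and the `usable_memory` function are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    SESSION = "session"                    # current session only
    CROSS_SESSION = "cross_session"        # reusable across sessions
    CONSENT_REQUIRED = "consent_required"  # needs explicit user consent
    NEVER = "never"                        # must never influence a response


@dataclass(frozen=True)
class MemoryItem:
    text: str
    scope: Scope
    session_id: str


def usable_memory(items, current_session: str, user_consented: bool):
    """Return only the memory items allowed to influence this response."""
    usable = []
    for item in items:
        if item.scope is Scope.NEVER:
            continue
        if item.scope is Scope.SESSION and item.session_id != current_session:
            continue
        if item.scope is Scope.CONSENT_REQUIRED and not user_consented:
            continue
        usable.append(item)
    return usable


store = [
    MemoryItem("prefers short answers", Scope.CROSS_SESSION, "s1"),
    MemoryItem("asked about order 1001", Scope.SESSION, "s1"),
    MemoryItem("home address", Scope.CONSENT_REQUIRED, "s1"),
    MemoryItem("internal note", Scope.NEVER, "s1"),
]
print([m.text for m in usable_memory(store, "s2", user_consented=False)])
```

Because the filter returns a plain list, the same list can be attached to a trace so memory usage stays reviewable.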

Runtime governance vs another AI agent

I do not think the answer to every AI problem is “add another agent.”

Sometimes the missing layer is not another AI.

Sometimes the missing layer is control.

AI agents become useful when the system around them is designed properly:

clear workflow boundaries
role permissions
escalation rules
memory scope
policy checks
observability
fallback behavior
human review when needed

NEES is focused on that runtime layer.

It is not trying to replace the model.

It is trying to make AI behavior easier to govern before it reaches users.
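Here is a minimal sketch of that idea: a plain function that wraps a model call with an escalation check and a fallback path, with no second agent involved. The status strings, the fallback text, and the keyword-based `needs_human` stand-in are hypothetical; a real check would be more careful than a substring match.

```python
from typing import Callable

FALLBACK = "I can't resolve this directly. A teammate will follow up."


def governed_call(
    message: str,
    model: Callable[[str], str],
    needs_human: Callable[[str], bool],
) -> dict:
    """Apply escalation and fallback rules around a single model call."""
    if needs_human(message):
        # The rule decides before the model is ever called.
        return {"status": "escalated", "reply": FALLBACK}
    try:
        reply = model(message)
    except Exception:
        # Provider errors and timeouts become a predictable fallback.
        return {"status": "fallback", "reply": FALLBACK}
    return {"status": "allowed", "reply": reply}


def echo_model(message: str) -> str:
    """Stub standing in for a real model provider."""
    return f"echo: {message}"


def failing_model(message: str) -> str:
    """Stub that simulates a provider outage."""
    raise TimeoutError("provider did not respond")


def refund_check(message: str) -> bool:
    """Naive stand-in for an escalation rule."""
    return "refund" in message.lower()


print(governed_call("Where is my order?", echo_model, refund_check))
print(governed_call("I want a refund", echo_model, refund_check))
```

The model is passed in as a plain callable, so the same control logic works with any provider and can be tested with stubs like the ones shown.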

Where this can be useful

NEES Core Engine can be useful for teams building:

AI assistants
AI agents
customer support bots
education apps
workflow automation
internal company copilots
AI content pipelines
production AI tools that need auditability

The common thread is simple:

If AI behavior affects real users, real workflows, or real decisions, it should be controlled and traceable.

Developer preview is now open

I recently opened a public developer preview repo for NEES Core Engine.

The repo includes:

Python quickstart
Node.js quickstart
cURL and PowerShell examples
API reference
governance flow documentation
15-minute integration guide
API key request template
developer feedback template

Developer preview repo:

https://github.com/NEES-Anna/nees-core-developer-preview

There is also a live sample app connected to the governed runtime:

https://naina.nees.cloud

The sample app is useful for seeing the governed response flow in a real interface.

What I am looking for

This is still early.

I am not looking for generic traffic.

I am looking for honest feedback from developers, AI builders, founders, and teams working with production AI systems.

I would especially like feedback on:

Is the API approach clear?
Does the governance metadata feel useful?
Would trace IDs help you debug AI behavior?
How would you expect memory boundaries to work?
Would this fit better as a hosted API, SDK, or both?
What would you need before using this in a real product?
What integration docs should come next?

The first goal is not to make the system complex.

The first goal is to make the first 15 minutes useful.

A developer should be able to send one governed request and immediately understand:

This is different from a direct model call because I can see how the response was governed.

Final thought

AI is moving from demos into production.

That shift changes the infrastructure requirement.

In demos, a good answer is enough.

In production, teams need control.

They need to know what the AI was allowed to do, what context it used, what policy applied, and how the decision can be reviewed later.

That is the layer I am building with NEES Core Engine.

Not another chatbot.

A governance runtime for production AI.

Why I’m building a Windows-first emotional AI assistant (lessons so far)

Anna Jambhulkar — Mon, 22 Dec 2025 13:21:10 +0000

Most AI products today are optimized for speed, accuracy, and scale.

And that makes sense.

But while using AI tools daily, I kept running into the same feeling:
every interaction felt stateless. Every session started from zero.
No memory. No continuity. No sense of knowing the user.

That’s where my curiosity started.

The problem I noticed

Modern AI assistants are impressive, but they behave like strangers who forget you every day.

You explain your preferences again.
You restate context again.
You rebuild workflows again.

From a technical perspective, this is fine.
From a human perspective, it feels broken.

Humans don’t work in isolated prompts — we work in continuity.

Why Windows-first (and not cloud-first)

One decision I made early was to build this as a Windows-first assistant, not a browser tab or a purely cloud-based tool.

Why?

Because a personal computer is still the most intimate computing device we own:

It holds our files

It reflects our workflows

It stays with us for years

Building locally (or at least desktop-native) allows:

Better context awareness

Stronger privacy boundaries

Tighter integration with daily work

Instead of AI being “somewhere on the internet”, it becomes present.

Emotional AI ≠ pretending to be human

A common misconception:
emotional AI means making the assistant sound emotional.

That’s not what I’m exploring.

For me, emotional AI is about:

Remembering preferences

Maintaining interaction history

Adapting tone and behavior over time

It’s not about fake empathy.
It’s about continuity.

What I’ve learned so far (the hard parts)

Memory is expensive — technically and ethically

Storing memory isn’t just a database problem.
You need to decide:

What’s worth remembering?

What should be forgotten?

Who controls that memory?

“Personal” quickly becomes “creepy” if done wrong

There’s a very thin line between helpful continuity and overreach.
Designing that boundary is more important than model choice.

Developers underestimate emotion in tools

Many devs (myself included) initially think users only care about features.
In reality, how a tool makes you feel over time strongly affects retention.

Why I’m sharing this early

This project is still in a tech-trial stage.
I’m intentionally sharing before everything is “perfect”.

Because the most valuable insights so far haven’t come from metrics —
they’ve come from conversations.

A question for builders here

When you think about the tools you use daily:

Do you value memory and continuity?

Or do you prefer tools to stay stateless and predictable?

Where do you personally draw the line?

I’d love to learn from real experiences, not just theory.

Thanks for reading 🙏