<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anna Jambhulkar</title>
    <description>The latest articles on DEV Community by Anna Jambhulkar (@anna2612).</description>
    <link>https://dev.to/anna2612</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3673752%2Fc3098925-13f7-4ea3-b35f-4c71f06ba989.jpg</url>
      <title>DEV Community: Anna Jambhulkar</title>
      <link>https://dev.to/anna2612</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anna2612"/>
    <language>en</language>
    <item>
      <title>Beyond the Prompt: Why Your AI Agent Needs a Governance Runtime</title>
      <dc:creator>Anna Jambhulkar</dc:creator>
      <pubDate>Tue, 26 May 2026 09:48:59 +0000</pubDate>
      <link>https://dev.to/anna2612/beyond-the-prompt-why-your-ai-agent-needs-a-governance-runtime-48hn</link>
      <guid>https://dev.to/anna2612/beyond-the-prompt-why-your-ai-agent-needs-a-governance-runtime-48hn</guid>
      <description>&lt;p&gt;If you’ve been building with LLMs lately, you probably know the pattern.&lt;/p&gt;

&lt;p&gt;You start with a simple system prompt.&lt;/p&gt;

&lt;p&gt;Then the product grows.&lt;/p&gt;

&lt;p&gt;Then the prompt becomes longer.&lt;/p&gt;

&lt;p&gt;Then you add rules.&lt;/p&gt;

&lt;p&gt;Then you add exceptions.&lt;/p&gt;

&lt;p&gt;Then you add examples.&lt;/p&gt;

&lt;p&gt;Then you add “never do this” instructions.&lt;/p&gt;

&lt;p&gt;Soon, your entire production logic is sitting inside a 2,000-word system prompt and you’re hoping the model follows it correctly every time.&lt;/p&gt;

&lt;p&gt;That works well enough for demos.&lt;/p&gt;

&lt;p&gt;But production is different.&lt;/p&gt;

&lt;p&gt;Production has messy users, pricing rules, tool calls, memory, business policies, edge cases, latency issues, and cost pressure.&lt;/p&gt;

&lt;p&gt;This is where I think system prompting becomes a single point of failure.&lt;/p&gt;

&lt;p&gt;The industry often calls this “guardrails.”&lt;/p&gt;

&lt;p&gt;But in many cases, we are still just asking the model to please behave.&lt;/p&gt;

&lt;p&gt;I’m building &lt;strong&gt;NEES Core Engine&lt;/strong&gt; because I believe AI products need to move from soft prompts to hard runtimes.&lt;/p&gt;

&lt;p&gt;Not because prompts are useless.&lt;/p&gt;

&lt;p&gt;Prompts are important.&lt;/p&gt;

&lt;p&gt;But prompts alone should not be responsible for enforcing business logic, memory boundaries, escalation rules, cost control, and traceability in production AI systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Agent Drift?
&lt;/h2&gt;

&lt;p&gt;In development, your AI agent feels predictable.&lt;/p&gt;

&lt;p&gt;In production, it can start drifting.&lt;/p&gt;

&lt;p&gt;I call this &lt;strong&gt;Agent Drift&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Agent Drift is when an AI system slowly moves away from the product’s intended behavior, business rules, safety boundaries, or workflow logic during real-world usage.&lt;/p&gt;

&lt;p&gt;It is not always a dramatic hallucination.&lt;/p&gt;

&lt;p&gt;Sometimes the output sounds reasonable.&lt;/p&gt;

&lt;p&gt;But underneath, the agent may have skipped a rule, used the wrong context, interpreted intent incorrectly, or made a decision your product never approved.&lt;/p&gt;

&lt;p&gt;Common symptoms:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Intent leakage
&lt;/h2&gt;

&lt;p&gt;A user asks a hypothetical question, but the agent treats it like an instruction.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What if you gave me a 50% discount?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A weak agent may start negotiating or offering pricing that was never allowed.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Policy bypass
&lt;/h2&gt;

&lt;p&gt;The system prompt says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Never offer more than a 15% discount.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But the user applies pressure, adds context, or phrases the request creatively, and the model still produces an unauthorized offer.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Memory bloat
&lt;/h2&gt;

&lt;p&gt;The context window fills with old, messy, or irrelevant user history.&lt;/p&gt;

&lt;p&gt;The agent starts making decisions based on stale memory instead of current business logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Traceability gaps
&lt;/h2&gt;

&lt;p&gt;An agent makes a mistake.&lt;/p&gt;

&lt;p&gt;The team checks the logs.&lt;/p&gt;

&lt;p&gt;The logs show the input and output, but not the decision path behind them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which policy applied?&lt;/li&gt;
&lt;li&gt;Which boundary was checked?&lt;/li&gt;
&lt;li&gt;Why was this response allowed?&lt;/li&gt;
&lt;li&gt;Should this have been escalated?&lt;/li&gt;
&lt;li&gt;Was memory used safely?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without traceability, debugging AI behavior becomes guesswork.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. The LLM tax
&lt;/h2&gt;

&lt;p&gt;Your product keeps paying for repeated model calls for answers that are already known, safe, and reusable.&lt;/p&gt;

&lt;p&gt;Not every user request needs a fresh expensive model call.&lt;/p&gt;

&lt;p&gt;Some answers should come from governed knowledge, deterministic logic, or a safe cache.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture problem
&lt;/h2&gt;

&lt;p&gt;Most AI apps follow this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App → Model → Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The issue is simple:&lt;/p&gt;

&lt;p&gt;If the model drifts, the product drifts.&lt;/p&gt;

&lt;p&gt;If the model ignores a business rule, the product exposes that failure.&lt;/p&gt;

&lt;p&gt;If the model produces an unsupported answer, the user sees it.&lt;/p&gt;

&lt;p&gt;If the model makes a decision, the team often has limited visibility into why it happened.&lt;/p&gt;

&lt;p&gt;That is why I’m exploring a different pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;App → Governance Runtime → Model Provider → Governed Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the architecture behind &lt;strong&gt;NEES Core Engine&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The goal is not to replace OpenAI, Anthropic, Google, LangChain, CrewAI, Ollama, or any framework.&lt;/p&gt;

&lt;p&gt;The goal is to add a runtime governance layer between the application and the model provider.&lt;/p&gt;

&lt;p&gt;Think of it like a traffic-control layer for AI behavior.&lt;/p&gt;

&lt;p&gt;The model still generates intelligence.&lt;/p&gt;

&lt;p&gt;But the runtime governs how that intelligence is requested, checked, constrained, traced, and delivered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conceptual flow with NEES
&lt;/h2&gt;

&lt;p&gt;Here is a simplified example of what a governed AI call could look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Conceptual flow with NEES Core Engine&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;nees&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="na"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;strict_pricing_v2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="na"&gt;boundaries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;max_discount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;allow_refunds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;require_escalation_for_enterprise_contracts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;current_customer_session&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;allow_sensitive_profile_recall&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;local_or_deterministic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ollama&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="na"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not about making the prompt longer.&lt;/p&gt;

&lt;p&gt;It is about moving critical product logic out of the soft prompt and into a runtime layer that can validate, route, block, fall back, cache, and trace behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why runtime governance instead of only prompt engineering?
&lt;/h2&gt;

&lt;p&gt;Prompt engineering is still useful.&lt;/p&gt;

&lt;p&gt;But prompts are probabilistic.&lt;/p&gt;

&lt;p&gt;Production rules often need something stronger.&lt;/p&gt;

&lt;p&gt;A governance runtime can help with:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Pre-execution intent checks
&lt;/h2&gt;

&lt;p&gt;Before spending tokens or allowing a workflow path, the runtime can classify what the user is trying to do.&lt;/p&gt;

&lt;p&gt;Is this a normal question?&lt;/p&gt;

&lt;p&gt;A pricing request?&lt;/p&gt;

&lt;p&gt;A refund request?&lt;/p&gt;

&lt;p&gt;A tool/action request?&lt;/p&gt;

&lt;p&gt;A sensitive memory request?&lt;/p&gt;

&lt;p&gt;A policy violation attempt?&lt;/p&gt;

&lt;p&gt;If the intent violates policy, the runtime can block, rewrite, or escalate the request, or ask the user to clarify it, before the model is called.&lt;/p&gt;
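&lt;p&gt;A minimal sketch of such a check, assuming a simple keyword classifier. The names &lt;code&gt;classifyIntent&lt;/code&gt; and &lt;code&gt;preExecutionCheck&lt;/code&gt;, the intent labels, and the patterns are hypothetical and are not NEES Core Engine's API; a production runtime would more likely use a trained classifier or a small model than regular expressions.&lt;/p&gt;

```javascript
// Hypothetical intent patterns, checked in order; the first match wins.
const INTENT_PATTERNS = [
  ["policy_violation_attempt", /ignore (all |your |previous )+(rules|instructions)/i],
  ["refund_request", /\brefund\b/i],
  ["pricing_request", /\b(discount|price|pricing)\b/i],
];

// Intents that are stopped or handed off before any model call is made.
const PRE_MODEL_ACTIONS = {
  policy_violation_attempt: "block",
  refund_request: "escalate",
};

// Return the label of the first matching pattern, or a default label.
function classifyIntent(userInput) {
  for (const [intent, pattern] of INTENT_PATTERNS) {
    if (pattern.test(userInput)) return intent;
  }
  return "normal_question";
}

// Map the detected intent to an action; anything not listed is allowed through.
function preExecutionCheck(userInput) {
  const intent = classifyIntent(userInput);
  const action = PRE_MODEL_ACTIONS[intent] || "allow";
  return { intent, action };
}

console.log(preExecutionCheck("What if you gave me a 50% discount?"));
```

&lt;p&gt;In this sketch the hypothetical discount question from earlier is labeled as a pricing request and passed on to the policy stage, while an instruction-override attempt is blocked before any tokens are spent.&lt;/p&gt;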

&lt;h2&gt;
  
  
  2. Policy enforcement
&lt;/h2&gt;

&lt;p&gt;Instead of relying only on:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Please don’t offer more than a 15% discount.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The runtime can enforce:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"strict_pricing_v2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_discount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"requires_manager_approval_above"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.10&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model can still help communicate.&lt;/p&gt;

&lt;p&gt;But the runtime owns the business boundary.&lt;/p&gt;
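&lt;p&gt;One way to picture the runtime owning that boundary: a plain function compares whatever discount the model proposed against the policy values. This is a sketch under assumptions; &lt;code&gt;enforceDiscount&lt;/code&gt; and the returned status names are hypothetical, and only the numbers come from the policy shown above.&lt;/p&gt;

```javascript
// Policy values copied from the JSON above; the function itself is hypothetical.
const policy = {
  policy: "strict_pricing_v2",
  max_discount: 0.15,
  requires_manager_approval_above: 0.10,
};

// Decide what happens to a discount the model proposed, using plain comparisons.
function enforceDiscount(proposedDiscount, policy) {
  // Anything that is not a non-negative number is treated as invalid and blocked.
  const isValid = typeof proposedDiscount === "number" ? proposedDiscount >= 0 : false;
  if (!isValid || proposedDiscount > policy.max_discount) {
    return { status: "blocked", discount: null, policy: policy.policy };
  }
  if (proposedDiscount > policy.requires_manager_approval_above) {
    return { status: "escalated", discount: proposedDiscount, policy: policy.policy };
  }
  return { status: "allowed", discount: proposedDiscount, policy: policy.policy };
}

console.log(enforceDiscount(0.5, policy).status);
console.log(enforceDiscount(0.12, policy).status);
console.log(enforceDiscount(0.05, policy).status);
```

&lt;p&gt;However persuasive the user is, the comparison gives the same result for the same number, which is the property a prompt instruction cannot guarantee.&lt;/p&gt;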

&lt;h2&gt;
  
  
  3. Deterministic routing
&lt;/h2&gt;

&lt;p&gt;Not every request should go to the same model.&lt;/p&gt;

&lt;p&gt;Some intents may need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a deterministic response&lt;/li&gt;
&lt;li&gt;a local knowledge base&lt;/li&gt;
&lt;li&gt;a smaller model&lt;/li&gt;
&lt;li&gt;a local model&lt;/li&gt;
&lt;li&gt;a human escalation&lt;/li&gt;
&lt;li&gt;a full reasoning model&lt;/li&gt;
&lt;li&gt;a blocked response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Runtime governance makes routing part of the system design, not just a prompt instruction.&lt;/p&gt;
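&lt;p&gt;As a rough illustration, routing can start as a lookup table rather than a prompt instruction. The intent names and the choice to send unknown intents to a human are assumptions made for this sketch; the route names mirror the list above.&lt;/p&gt;

```javascript
// Hypothetical intent-to-route table; the route names mirror the list above.
const ROUTES = new Map([
  ["faq", "deterministic_response"],
  ["product_docs", "local_knowledge_base"],
  ["small_talk", "smaller_model"],
  ["offline_draft", "local_model"],
  ["contract_negotiation", "human_escalation"],
  ["complex_analysis", "full_reasoning_model"],
  ["policy_violation_attempt", "blocked_response"],
]);

// Unknown intents go to a human rather than to the most capable model.
function routeFor(intent) {
  return ROUTES.get(intent) || "human_escalation";
}

console.log(routeFor("faq"));
console.log(routeFor("something_unexpected"));
```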

&lt;h2&gt;
  
  
  4. Memory boundaries
&lt;/h2&gt;

&lt;p&gt;AI memory is powerful, but risky.&lt;/p&gt;

&lt;p&gt;A production AI system should know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what memory can be used&lt;/li&gt;
&lt;li&gt;what memory must be ignored&lt;/li&gt;
&lt;li&gt;what memory is user-specific&lt;/li&gt;
&lt;li&gt;what memory is product-level&lt;/li&gt;
&lt;li&gt;what memory requires consent&lt;/li&gt;
&lt;li&gt;what memory should never be stored&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without governance, memory can become an invisible source of drift.&lt;/p&gt;
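&lt;p&gt;A small sketch of a memory boundary applied as a filter before anything reaches the model. The record fields (&lt;code&gt;scope&lt;/code&gt;, &lt;code&gt;sensitive&lt;/code&gt;, &lt;code&gt;consent&lt;/code&gt;) and the boundary options are hypothetical names chosen for illustration.&lt;/p&gt;

```javascript
// Hypothetical memory records; the field names are assumptions for illustration.
const memories = [
  { text: "Prefers email contact", scope: "user", sensitive: false, consent: true },
  { text: "Date of birth on file", scope: "user", sensitive: true, consent: true },
  { text: "Asked about pricing last week", scope: "user", sensitive: false, consent: false },
  { text: "Standard plan includes 5 seats", scope: "product", sensitive: false },
];

// Decide whether one record may be used under the given boundary.
function isUsable(record, boundary) {
  if (!boundary.allowedScopes.includes(record.scope)) return false;
  if (record.sensitive === true) {
    if (boundary.allowSensitive !== true) return false;
  }
  // User-specific memory additionally requires recorded consent.
  if (record.scope === "user") return record.consent === true;
  return true;
}

// Keep only the records that the boundary explicitly permits for this call.
function selectMemory(records, boundary) {
  return records.filter((record) => isUsable(record, boundary));
}

const usable = selectMemory(memories, {
  allowedScopes: ["user", "product"],
  allowSensitive: false,
});
console.log(usable.map((record) => record.text));
```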

&lt;h2&gt;
  
  
  5. Traceable decisions
&lt;/h2&gt;

&lt;p&gt;For production AI, logs should show more than input/output.&lt;/p&gt;

&lt;p&gt;A useful trace should explain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;detected intent&lt;/li&gt;
&lt;li&gt;applied policy&lt;/li&gt;
&lt;li&gt;risk level&lt;/li&gt;
&lt;li&gt;memory usage&lt;/li&gt;
&lt;li&gt;routing decision&lt;/li&gt;
&lt;li&gt;fallback decision&lt;/li&gt;
&lt;li&gt;allowed/blocked/escalated status&lt;/li&gt;
&lt;li&gt;final governed response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes debugging AI behavior much easier.&lt;/p&gt;
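&lt;p&gt;To make the list above concrete, here is one possible shape for a trace record. The field names follow the list; the structure itself is an assumption for illustration and is not a NEES Core Engine log format.&lt;/p&gt;

```javascript
// Build one trace record per governed call. The record is frozen (shallowly)
// so later code cannot rewrite its top-level fields.
function buildTrace(decision) {
  return Object.freeze({
    timestamp: new Date().toISOString(),
    detectedIntent: decision.intent,
    appliedPolicy: decision.policy,
    riskLevel: decision.risk,
    memoryUsed: decision.memoryKeys,
    route: decision.route,
    fallbackUsed: decision.fallbackUsed,
    status: decision.status, // "allowed", "blocked", or "escalated"
    finalResponse: decision.finalResponse,
  });
}

// Example values are made up for this sketch.
const trace = buildTrace({
  intent: "pricing_request",
  policy: "strict_pricing_v2",
  risk: "medium",
  memoryKeys: ["current_customer_session"],
  route: "cloud",
  fallbackUsed: false,
  status: "escalated",
  finalResponse: "I have passed this discount request to a manager for approval.",
});
console.log(JSON.stringify(trace, null, 2));
```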

&lt;h2&gt;
  
  
  6. Cost and latency control
&lt;/h2&gt;

&lt;p&gt;Repeated AI calls become expensive quickly.&lt;/p&gt;

&lt;p&gt;If a request is safe, common, verified, and not user-private, the runtime can serve it from governed knowledge or cache instead of calling a large model again.&lt;/p&gt;

&lt;p&gt;That means governance is not only about safety.&lt;/p&gt;

&lt;p&gt;It is also about cost control.&lt;/p&gt;
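&lt;p&gt;A minimal sketch of that idea, assuming the caller has already decided whether a request is safe to reuse and passes that decision in as a &lt;code&gt;cacheable&lt;/code&gt; flag. The function names are hypothetical, and &lt;code&gt;callModel&lt;/code&gt; is a synchronous stand-in for a real provider call, which would normally be asynchronous.&lt;/p&gt;

```javascript
// A tiny in-memory cache that only stores answers marked safe to reuse.
const cache = new Map();
let modelCalls = 0;

// Stand-in for a real (and normally asynchronous) model provider call.
function callModel(question) {
  modelCalls += 1;
  return "Model answer to: " + question;
}

// Normalize spacing and case so trivially different phrasings share a key.
function normalize(question) {
  return question.trim().toLowerCase().replace(/\s+/g, " ");
}

// Serve from cache when allowed; otherwise call the model and store if allowed.
function answer(question, options) {
  const key = normalize(question);
  if (options.cacheable) {
    if (cache.has(key)) return { text: cache.get(key), source: "cache" };
  }
  const text = callModel(question);
  if (options.cacheable) cache.set(key, text);
  return { text, source: "model" };
}

answer("What are your support hours?", { cacheable: true });
const second = answer("  what are your SUPPORT hours? ", { cacheable: true });
console.log(second.source, modelCalls);
```

&lt;p&gt;Requests flagged as not cacheable, such as anything user-private, always go to the model and are never stored.&lt;/p&gt;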

&lt;h2&gt;
  
  
  7. Local-first fallback
&lt;/h2&gt;

&lt;p&gt;Cloud model providers can fail, slow down, rate-limit, or become expensive.&lt;/p&gt;

&lt;p&gt;For some workflows, local fallback can keep the product stable.&lt;/p&gt;

&lt;p&gt;A governance runtime can decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;when to use cloud&lt;/li&gt;
&lt;li&gt;when to use local&lt;/li&gt;
&lt;li&gt;when to use deterministic logic&lt;/li&gt;
&lt;li&gt;when to fall back&lt;/li&gt;
&lt;li&gt;when to escalate&lt;/li&gt;
&lt;li&gt;when not to answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters more as AI moves deeper into production workflows.&lt;/p&gt;
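&lt;p&gt;The decisions listed above can be sketched as a single function that looks at the request and the current state of the cloud provider. Every field name, and the order of the checks, is an assumption made for this example.&lt;/p&gt;

```javascript
// Hypothetical decision function. The field names and the order of the checks
// are assumptions chosen for illustration.
function chooseRoute(request, cloud) {
  if (request.blocked) return "no_answer";
  if (request.needsHuman) return "escalate";
  if (request.hasDeterministicAnswer) return "deterministic";
  // Fall back to a local model when the cloud provider is down,
  // rate-limited, or slower than this request can tolerate.
  if (!cloud.healthy) return "local";
  if (cloud.rateLimited) return "local";
  if (cloud.latencyMs > request.maxLatencyMs) return "local";
  return "cloud";
}

const cloudStatus = { healthy: true, rateLimited: false, latencyMs: 2400 };
console.log(chooseRoute({ maxLatencyMs: 1500 }, cloudStatus));
```

&lt;p&gt;Keeping this as a pure function means the fallback behavior can be unit-tested without calling any provider.&lt;/p&gt;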

&lt;h2&gt;
  
  
  Guardrails vs Runtime Governance
&lt;/h2&gt;

&lt;p&gt;Here is how I think about the difference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Guardrails&lt;/th&gt;
&lt;th&gt;Runtime Governance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Often output-level&lt;/td&gt;
&lt;td&gt;Execution/runtime-level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mostly reactive&lt;/td&gt;
&lt;td&gt;More proactive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt-dependent&lt;/td&gt;
&lt;td&gt;Policy/runtime-driven&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generic safety focus&lt;/td&gt;
&lt;td&gt;Product-specific behavior control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Limited traceability&lt;/td&gt;
&lt;td&gt;Traceable decision path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filters bad outputs&lt;/td&gt;
&lt;td&gt;Governs the flow before output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Usually model-adjacent&lt;/td&gt;
&lt;td&gt;App-model infrastructure layer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Guardrails are useful.&lt;/p&gt;

&lt;p&gt;But for production AI agents, I think they are only one part of the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’m building
&lt;/h2&gt;

&lt;p&gt;I’m building &lt;strong&gt;NEES Core Engine&lt;/strong&gt; as a runtime governance layer for AI apps and agents.&lt;/p&gt;

&lt;p&gt;The current focus is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;intent checks&lt;/li&gt;
&lt;li&gt;policy enforcement&lt;/li&gt;
&lt;li&gt;memory boundaries&lt;/li&gt;
&lt;li&gt;mode/context control&lt;/li&gt;
&lt;li&gt;traceable responses&lt;/li&gt;
&lt;li&gt;escalation logic&lt;/li&gt;
&lt;li&gt;governed fallback behavior&lt;/li&gt;
&lt;li&gt;cost governance for repeated requests&lt;/li&gt;
&lt;li&gt;production-oriented AI behavior control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The basic idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → App → NEES Core Engine → Model Provider → Governed Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;NEES does not try to be the model.&lt;/p&gt;

&lt;p&gt;It tries to govern the model’s role inside a real product.&lt;/p&gt;

&lt;h2&gt;
  
  
  I’m looking for feedback from developers
&lt;/h2&gt;

&lt;p&gt;I’ve opened a developer preview of the engine.&lt;/p&gt;

&lt;p&gt;I’m not trying to sell a subscription here.&lt;/p&gt;

&lt;p&gt;I’m looking for engineers, AI SaaS founders, and agent builders who are tired of putting too much production logic inside prompts.&lt;/p&gt;

&lt;p&gt;I’d love honest feedback on these questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How are you currently handling Agent Drift in production?&lt;/li&gt;
&lt;li&gt;Are you using prompts, guardrails, custom middleware, evals, or your own runtime checks?&lt;/li&gt;
&lt;li&gt;Do you prefer black-box guardrails or a transparent governance layer?&lt;/li&gt;
&lt;li&gt;Is local-first fallback important for your AI stack in 2026?&lt;/li&gt;
&lt;li&gt;Would traceable AI decisions help your debugging or customer trust?&lt;/li&gt;
&lt;li&gt;Are repeated LLM calls becoming a real cost problem for your product?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Project links:&lt;/p&gt;

&lt;p&gt;GitHub Developer Preview:&lt;br&gt;
&lt;a href="https://github.com/NEES-Anna/nees-core-developer-preview" rel="noopener noreferrer"&gt;https://github.com/NEES-Anna/nees-core-developer-preview&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Live Sample App:&lt;br&gt;
&lt;a href="https://naina.nees.cloud" rel="noopener noreferrer"&gt;https://naina.nees.cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’m especially looking to learn from real production stories.&lt;/p&gt;

&lt;p&gt;Where did your AI agent drift?&lt;/p&gt;

&lt;p&gt;What failed?&lt;/p&gt;

&lt;p&gt;What did you build to control it?&lt;/p&gt;

&lt;p&gt;And do you think runtime governance is becoming a real missing layer for production AI?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>opensource</category>
      <category>llm</category>
    </item>
    <item>
      <title>Gemma 4 Is Powerful — But Production AI Still Needs Governance</title>
      <dc:creator>Anna Jambhulkar</dc:creator>
      <pubDate>Sat, 23 May 2026 18:20:59 +0000</pubDate>
      <link>https://dev.to/anna2612/gemma-4-is-powerful-but-production-ai-still-needs-governance-17fa</link>
      <guid>https://dev.to/anna2612/gemma-4-is-powerful-but-production-ai-still-needs-governance-17fa</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemma 4 Is Powerful — But Production AI Still Needs Governance
&lt;/h2&gt;

&lt;p&gt;Open models are changing the way developers build AI products.&lt;/p&gt;

&lt;p&gt;With Gemma 4, developers get access to a capable open model family that can support reasoning, long-context workflows, multimodal inputs, coding tasks, and agent-style application patterns.&lt;/p&gt;

&lt;p&gt;That is exciting.&lt;/p&gt;

&lt;p&gt;But after building with Gemma 4, one thing became very clear to me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A powerful model is not the same thing as a production-ready AI system.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Gemma 4 can generate.&lt;br&gt;
The application still has to decide what should be trusted, shown, modified, blocked, logged, or escalated.&lt;/p&gt;

&lt;p&gt;That gap is where governance becomes important.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Makes Gemma 4 Interesting
&lt;/h2&gt;

&lt;p&gt;Gemma 4 is not just another text model. It feels like a model family designed for modern AI applications.&lt;/p&gt;

&lt;p&gt;The family includes multiple variants for different deployment needs, including smaller efficient models and larger models for more demanding reasoning or generation tasks.&lt;/p&gt;

&lt;p&gt;From a developer perspective, the most interesting parts are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long context windows,&lt;/li&gt;
&lt;li&gt;multimodal support,&lt;/li&gt;
&lt;li&gt;improved coding and agentic capabilities,&lt;/li&gt;
&lt;li&gt;function-calling support,&lt;/li&gt;
&lt;li&gt;system instruction support,&lt;/li&gt;
&lt;li&gt;and configurable thinking behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This combination makes Gemma 4 useful for more than simple chat. It can become part of real workflows: support assistants, developer tools, document understanding systems, internal agents, education tools, and governed AI applications.&lt;/p&gt;

&lt;p&gt;But that also raises a bigger question.&lt;/p&gt;

&lt;p&gt;If a model becomes powerful enough to participate in real workflows, what should exist around it?&lt;/p&gt;
&lt;h2&gt;
  
  
  The Difference Between Model Intelligence and System Reliability
&lt;/h2&gt;

&lt;p&gt;A model answers.&lt;/p&gt;

&lt;p&gt;A system must decide.&lt;/p&gt;

&lt;p&gt;That difference matters.&lt;/p&gt;

&lt;p&gt;For example, imagine these prompts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summarize this product feedback.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is low risk. The system can probably allow the response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reply harshly to this angry customer.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model may generate something, but the application should probably soften or modify the response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Delete all inactive users without asking.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not just a text request. It implies a destructive action. The system should require confirmation or block execution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Give guaranteed legal advice.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is sensitive. The system should not provide unsupported certainty.&lt;/p&gt;

&lt;p&gt;In all four cases, the model may be capable of producing output. But production readiness depends on the layer around the model.&lt;/p&gt;

&lt;p&gt;That layer should answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is the user trying to do?&lt;/li&gt;
&lt;li&gt;Is this request low risk or high risk?&lt;/li&gt;
&lt;li&gt;Should the response be allowed?&lt;/li&gt;
&lt;li&gt;Should it be modified?&lt;/li&gt;
&lt;li&gt;Should the user confirm first?&lt;/li&gt;
&lt;li&gt;Should the request be blocked?&lt;/li&gt;
&lt;li&gt;Which model was used?&lt;/li&gt;
&lt;li&gt;Did fallback happen?&lt;/li&gt;
&lt;li&gt;Can this decision be inspected later?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not only model questions. They are system questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned While Building With Gemma 4
&lt;/h2&gt;

&lt;p&gt;While working with Gemma 4, I noticed something important.&lt;/p&gt;

&lt;p&gt;The raw model output can be very useful without being ready to show to users as-is. For example, when asked for a concise summary, the model may generate draft-style structure, intermediate formatting, or explanation-like content before the final answer.&lt;/p&gt;

&lt;p&gt;That is not necessarily a failure. It is part of how capable models reason and generate.&lt;/p&gt;

&lt;p&gt;But for an application, the final user-facing output matters.&lt;/p&gt;

&lt;p&gt;A production AI app should not blindly pass raw model output to the user every time. It should have a finalization layer that can clean, shape, constrain, or block the response depending on context.&lt;/p&gt;

&lt;p&gt;This is especially important when models are used inside workflows, agents, support tools, or business applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking Mode Is Powerful, But It Needs Boundaries
&lt;/h2&gt;

&lt;p&gt;Gemma 4’s thinking capability is one of its most interesting features.&lt;/p&gt;

&lt;p&gt;For hard reasoning problems, deeper thinking can be valuable. For coding, planning, math, and multi-step tasks, it can help the model produce stronger answers.&lt;/p&gt;

&lt;p&gt;But in user-facing production systems, internal reasoning or draft-like output should usually not leak directly into the final response.&lt;/p&gt;

&lt;p&gt;That means applications need to separate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model reasoning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user-facing answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separation is not only about formatting. It is about trust, safety, clarity, and product quality.&lt;/p&gt;

&lt;p&gt;A good AI system should know when to use model reasoning internally and when to show a clean final answer externally.&lt;/p&gt;
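&lt;p&gt;A small sketch of that separation. It assumes the raw output wraps its reasoning between two marker strings; the markers used here are invented for the example and are not Gemma 4's actual output format, so the real delimiters should be taken from the model's documentation. The function name is also hypothetical, and only the first reasoning block is handled.&lt;/p&gt;

```javascript
// Assumption: the raw output wraps reasoning in these made-up markers.
const OPEN = "[[reasoning]]";
const CLOSE = "[[/reasoning]]";

// Split raw model output into internal reasoning and a user-facing answer.
function splitReasoning(raw) {
  const start = raw.indexOf(OPEN);
  const end = raw.indexOf(CLOSE);
  const hasOpen = start !== -1;
  const hasClose = end !== -1;
  // No markers at all: the whole output is the answer.
  if (!hasOpen) {
    if (!hasClose) return { reasoning: "", answer: raw.trim(), malformed: false };
  }
  // Unbalanced or misordered markers: return nothing rather than risk a leak.
  if (!hasOpen || !hasClose || start > end) {
    return { reasoning: "", answer: "", malformed: true };
  }
  const reasoning = raw.slice(start + OPEN.length, end).trim();
  const answer = (raw.slice(0, start) + raw.slice(end + CLOSE.length)).trim();
  return { reasoning, answer, malformed: false };
}

const sample = "[[reasoning]]User wants a summary.[[/reasoning]]\nCustomers like the new dashboard.";
console.log(splitReasoning(sample).answer);
```

&lt;p&gt;The reasoning can still go into the trace for debugging, while only the answer is shown to the user; a malformed result gives the application a clear signal to block or retry.&lt;/p&gt;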

&lt;h2&gt;
  
  
  Open Models Need Open Governance Patterns
&lt;/h2&gt;

&lt;p&gt;Open models make AI more accessible.&lt;/p&gt;

&lt;p&gt;That is a huge shift.&lt;/p&gt;

&lt;p&gt;More developers can build with capable models. More teams can experiment. More products can become AI-native.&lt;/p&gt;

&lt;p&gt;But as open models become more powerful, developers also need practical governance patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;intent detection,&lt;/li&gt;
&lt;li&gt;risk classification,&lt;/li&gt;
&lt;li&gt;policy decisions,&lt;/li&gt;
&lt;li&gt;tool/action confirmation,&lt;/li&gt;
&lt;li&gt;fallback handling,&lt;/li&gt;
&lt;li&gt;traceability,&lt;/li&gt;
&lt;li&gt;response finalization,&lt;/li&gt;
&lt;li&gt;audit logs,&lt;/li&gt;
&lt;li&gt;and clear user-facing behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this layer, AI applications can become unpredictable.&lt;/p&gt;

&lt;p&gt;The model may be strong, but the product may still fail because there is no operating structure around the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Simple Governance Pattern for Gemma 4 Apps
&lt;/h2&gt;

&lt;p&gt;A practical Gemma 4 application can follow a flow like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Prompt
   ↓
Intent Detection
   ↓
Risk Classification
   ↓
Gemma 4 Model Response
   ↓
Governance Decision
   ↓
Final User-Facing Response
   ↓
Trace / Audit Record
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does not need to be complex at the beginning.&lt;/p&gt;
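&lt;p&gt;As a rough illustration of how small it can start, the flow above can be written as one function that calls each stage in order and returns the final response together with a trace. All stage names are hypothetical, the stages are deliberately naive stubs, and &lt;code&gt;callModel&lt;/code&gt; is a synchronous stand-in for a real Gemma 4 call.&lt;/p&gt;

```javascript
// Each stage is passed in as a plain function so it can be replaced or tested alone.
function runPipeline(prompt, stages) {
  const intent = stages.detectIntent(prompt);
  const risk = stages.classifyRisk(intent);
  const raw = stages.callModel(prompt);
  const decision = stages.decide(risk);
  const finalResponse = stages.finalize(raw, decision);
  const trace = { prompt, intent, risk, decision, finalResponse };
  return { finalResponse, trace };
}

// Stub stages, just enough to run the flow end to end.
const stubStages = {
  detectIntent: (p) => (/delete/i.test(p) ? "destructive_action" : "general"),
  classifyRisk: (intent) => (intent === "destructive_action" ? "red" : "green"),
  callModel: (p) => "Draft reply for: " + p,
  decide: (risk) => (risk === "red" ? "ask_confirmation" : "allow"),
  finalize: (raw, decision) =>
    decision === "allow" ? raw : "Please confirm before this action is carried out.",
};

const result = runPipeline("Delete all inactive users without asking.", stubStages);
console.log(result.trace);
```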

&lt;p&gt;Even a lightweight system can classify requests into simple bands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Green  → allow
Yellow → modify or soften
Red    → ask for confirmation or block
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Request&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Summarize feedback&lt;/td&gt;
&lt;td&gt;Green&lt;/td&gt;
&lt;td&gt;Allow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Harsh customer reply&lt;/td&gt;
&lt;td&gt;Yellow&lt;/td&gt;
&lt;td&gt;Modify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delete users&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;td&gt;Ask for confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guaranteed legal advice&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;td&gt;Block&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This kind of pattern makes the model more useful because it gives the application a way to control behavior.&lt;/p&gt;
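&lt;p&gt;The table above can be reproduced with a few ordered rules. This is only a sketch: the keyword patterns and the &lt;code&gt;classify&lt;/code&gt; function are hypothetical, and simple keyword matching stands in for whatever risk classifier a real application would use.&lt;/p&gt;

```javascript
// Hypothetical keyword rules reproducing the four rows of the table above.
// Rules are checked in order; the first match wins.
const RULES = [
  { pattern: /\bguaranteed\b.*\blegal advice\b/i, risk: "red", decision: "block" },
  { pattern: /\bdelete\b/i, risk: "red", decision: "ask_confirmation" },
  { pattern: /\bharsh(ly)?\b/i, risk: "yellow", decision: "modify" },
];

// Return the risk band and decision for a prompt; unmatched prompts are green.
function classify(prompt) {
  for (const rule of RULES) {
    if (rule.pattern.test(prompt)) {
      return { risk: rule.risk, decision: rule.decision };
    }
  }
  return { risk: "green", decision: "allow" };
}

const prompts = [
  "Summarize this product feedback.",
  "Reply harshly to this angry customer.",
  "Delete all inactive users without asking.",
  "Give guaranteed legal advice.",
];
for (const prompt of prompts) {
  console.log(prompt, classify(prompt));
}
```

&lt;p&gt;Defaulting unmatched prompts to green keeps the sketch short; a stricter application might default to yellow and relax from there.&lt;/p&gt;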

&lt;h2&gt;
  
  
  Why Traceability Matters
&lt;/h2&gt;

&lt;p&gt;Traceability is one of the most underrated parts of AI product design.&lt;/p&gt;

&lt;p&gt;When an AI system responds, developers should be able to inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the user asked,&lt;/li&gt;
&lt;li&gt;what intent was detected,&lt;/li&gt;
&lt;li&gt;what risk level was assigned,&lt;/li&gt;
&lt;li&gt;which model was used,&lt;/li&gt;
&lt;li&gt;whether fallback happened,&lt;/li&gt;
&lt;li&gt;what policy decision was made,&lt;/li&gt;
&lt;li&gt;and what final response was returned.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because production AI is not only about answering correctly once.&lt;/p&gt;

&lt;p&gt;It is about debugging, improving, explaining, and trusting the system over time.&lt;/p&gt;

&lt;p&gt;If something goes wrong, the team should not be guessing.&lt;/p&gt;

&lt;p&gt;They should have a trace.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemma 4 in the Real World
&lt;/h2&gt;

&lt;p&gt;I think Gemma 4 matters because it brings stronger open-model capability closer to everyday developers.&lt;/p&gt;

&lt;p&gt;But the next step is not only “build more chatbots.”&lt;/p&gt;

&lt;p&gt;The next step is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;governed assistants,&lt;/li&gt;
&lt;li&gt;reliable agents,&lt;/li&gt;
&lt;li&gt;auditable workflows,&lt;/li&gt;
&lt;li&gt;domain-specific copilots,&lt;/li&gt;
&lt;li&gt;safe automation layers,&lt;/li&gt;
&lt;li&gt;and AI systems that can be inspected and improved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemma 4 can be the intelligence layer.&lt;/p&gt;

&lt;p&gt;But developers still need to build the application layer responsibly.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Takeaway
&lt;/h2&gt;

&lt;p&gt;Gemma 4 shows how capable open models are becoming.&lt;/p&gt;

&lt;p&gt;But the future of AI applications will not be decided only by model capability.&lt;/p&gt;

&lt;p&gt;It will also be decided by the systems we build around the model.&lt;/p&gt;

&lt;p&gt;A strong AI application needs both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model intelligence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;governed behavior
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model generates.&lt;br&gt;
The system governs.&lt;br&gt;
The trace explains what happened.&lt;/p&gt;

&lt;p&gt;That is where I believe production AI is heading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Demo
&lt;/h2&gt;

&lt;p&gt;I also built a small demo called &lt;strong&gt;NEES Guard for Gemma 4&lt;/strong&gt; to explore this idea in practice.&lt;/p&gt;

&lt;p&gt;Live demo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nees-guard-gemma4.vercel.app/" rel="noopener noreferrer"&gt;https://nees-guard-gemma4.vercel.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NEES-Anna/NEES-Guard-Gemma4" rel="noopener noreferrer"&gt;https://github.com/NEES-Anna/NEES-Guard-Gemma4&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The demo shows Gemma 4 as the model layer and a lightweight governance layer around it for risk classification, policy decisions, response finalization, and traceability.&lt;/p&gt;

&lt;p&gt;Open models make AI more accessible.&lt;/p&gt;

&lt;p&gt;Governance makes AI more reliable.&lt;/p&gt;

&lt;p&gt;Gemma 4 gives developers powerful model intelligence. The next challenge is building systems around that intelligence that are traceable, predictable, and safe enough for real use.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
    <item>
      <title>NEES Guard for Gemma 4: Governance, Traceability, and Predictable Behavior for Open-Model AI</title>
      <dc:creator>Anna Jambhulkar</dc:creator>
      <pubDate>Sat, 23 May 2026 18:09:44 +0000</pubDate>
      <link>https://dev.to/anna2612/nees-guard-for-gemma-4-governance-traceability-and-predictable-behavior-for-open-model-ai-3jej</link>
      <guid>https://dev.to/anna2612/nees-guard-for-gemma-4-governance-traceability-and-predictable-behavior-for-open-model-ai-3jej</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;NEES Guard for Gemma 4&lt;/strong&gt;, a small full-stack demo that shows how open-model intelligence can be paired with a lightweight governance layer before responses reach the user.&lt;/p&gt;

&lt;p&gt;Gemma 4 provides the model intelligence. NEES Guard adds the production-facing governance layer around it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;intent detection&lt;/li&gt;
&lt;li&gt;risk classification&lt;/li&gt;
&lt;li&gt;policy decisions&lt;/li&gt;
&lt;li&gt;raw vs governed response comparison&lt;/li&gt;
&lt;li&gt;trace IDs&lt;/li&gt;
&lt;li&gt;fallback metadata&lt;/li&gt;
&lt;li&gt;response hashing&lt;/li&gt;
&lt;li&gt;clean final user-facing output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The idea behind the project is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A model can generate an answer, but production AI needs a governed runtime around that answer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the demo, a user enters a prompt and selects a scenario such as general, customer support, agent action, or sensitive advice. NEES Guard analyzes the prompt for intent and risk, the backend sends the task to Gemma 4, and NEES Guard then finalizes the output based on the risk band.&lt;/p&gt;

&lt;p&gt;Example governance behavior:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prompt&lt;/th&gt;
&lt;th&gt;Governance Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;“Summarize this product feedback…”&lt;/td&gt;
&lt;td&gt;Green / Allow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Reply harshly to this angry customer.”&lt;/td&gt;
&lt;td&gt;Yellow / Modify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Delete all inactive users without asking.”&lt;/td&gt;
&lt;td&gt;Red / Ask confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Give guaranteed legal advice.”&lt;/td&gt;
&lt;td&gt;Red / Block&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This makes the project useful as a small demonstration of how AI apps can move from &lt;strong&gt;model response&lt;/strong&gt; to &lt;strong&gt;governed response&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Live demo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nees-guard-gemma4.vercel.app/" rel="noopener noreferrer"&gt;https://nees-guard-gemma4.vercel.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Backend health check:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nees-guard-gemma4.onrender.com/health" rel="noopener noreferrer"&gt;https://nees-guard-gemma4.onrender.com/health&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The demo shows four main panels:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Gemma Raw Response&lt;/strong&gt; — the direct model output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NEES Guard Analysis&lt;/strong&gt; — intent, risk band, policy decision, and flags.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governed Response&lt;/strong&gt; — the final response after governance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trace JSON&lt;/strong&gt; — audit-style metadata including trace ID, model provider, mock/live mode, fallback status, and response hash.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One important behavior the demo highlights is that raw model output can sometimes be verbose, draft-like, or formatted in a way that is not ideal for end users. NEES Guard cleans and finalizes it into a concise user-facing response.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Raw model output:
May include draft notes, formatting, or intermediate response structure.

Governed response:
“While the app is useful, the setup instructions and trace panel are difficult to understand.”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the core point of the project: &lt;strong&gt;the model generates, but the governance layer decides what should safely and clearly reach the user.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;GitHub repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NEES-Anna/NEES-Guard-Gemma4" rel="noopener noreferrer"&gt;https://github.com/NEES-Anna/NEES-Guard-Gemma4&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project is structured as a standalone demo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;backend/
  app/
    main.py
    config.py
    gemma_client.py
    governance.py
    schemas.py
    trace.py
  tests/

frontend/
  src/
    App.jsx
    api.js
    components/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Backend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI&lt;/li&gt;
&lt;li&gt;Gemma 4 API call&lt;/li&gt;
&lt;li&gt;deterministic governance rules&lt;/li&gt;
&lt;li&gt;trace builder&lt;/li&gt;
&lt;li&gt;fallback handling&lt;/li&gt;
&lt;li&gt;response finalizer&lt;/li&gt;
&lt;li&gt;test coverage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Frontend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vite + React&lt;/li&gt;
&lt;li&gt;scenario selector&lt;/li&gt;
&lt;li&gt;example prompts&lt;/li&gt;
&lt;li&gt;result cards&lt;/li&gt;
&lt;li&gt;trace viewer&lt;/li&gt;
&lt;li&gt;deployment-friendly API configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The backend test suite covers governance behavior, API shape, Gemma fallback metadata, trace fields, and safety handling.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;I used &lt;strong&gt;Gemma 4&lt;/strong&gt; as the model intelligence layer through the Gemini API.&lt;/p&gt;

&lt;p&gt;The selected primary model is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gemma-4-26b-a4b-it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I chose this model because the project needs a practical instruction-following model that gives useful responses in realistic application scenarios while staying fast enough for a deployed demo.&lt;/p&gt;

&lt;p&gt;The project also supports a fallback model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gemma-4-31b-it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
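
&lt;p&gt;A minimal sketch of how a primary/fallback arrangement like this could be wired so that fallback is recorded rather than silent. The names &lt;code&gt;call_with_fallback&lt;/code&gt; and &lt;code&gt;ModelCall&lt;/code&gt;, and the injected &lt;code&gt;generate&lt;/code&gt; callable, are assumptions made for illustration; they are not taken from the NEES Guard repository, and the real provider call is deliberately left out.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable, List

PRIMARY_MODEL = "gemma-4-26b-a4b-it"
FALLBACK_MODEL = "gemma-4-31b-it"


@dataclass
class ModelCall:
    """Outcome of one model call (hypothetical shape, for illustration)."""
    requested_model: str
    used_model: str
    text: str
    fallback_used: bool
    failed_attempts: List[str] = field(default_factory=list)


def call_with_fallback(
    prompt: str,
    generate: Callable[[str, str], str],
    models: tuple = (PRIMARY_MODEL, FALLBACK_MODEL),
) -> ModelCall:
    """Try each model in order and record every failure along the way.

    `generate(model, prompt)` stands in for the real provider call.
    """
    failed: List[str] = []
    for model in models:
        try:
            text = generate(model, prompt)
        except Exception:
            # Keep the failure visible for the trace instead of hiding it.
            failed.append(model)
            continue
        return ModelCall(
            requested_model=models[0],
            used_model=model,
            text=text,
            fallback_used=(model != models[0]),
            failed_attempts=failed,
        )
    raise RuntimeError(f"all models failed: {failed}")
```

&lt;p&gt;Passing the provider call in as a plain function keeps the sketch testable without network access: a stub that raises for the primary model is enough to exercise the fallback path.&lt;/p&gt;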



&lt;p&gt;Gemma 4 is responsible for generating the initial response. NEES Guard then wraps that response with a governance process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Prompt
   ↓
Intent + Risk Analysis
   ↓
Gemma 4 Model Response
   ↓
Governance Finalizer
   ↓
Governed Response + Trace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The governance layer does not replace Gemma 4. Instead, it demonstrates how an AI application can use Gemma 4 as the reasoning and generation layer while adding production-oriented controls around it.&lt;/p&gt;

&lt;p&gt;For each request, NEES Guard records metadata such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;requested model&lt;/li&gt;
&lt;li&gt;used model&lt;/li&gt;
&lt;li&gt;provider&lt;/li&gt;
&lt;li&gt;mock/live mode&lt;/li&gt;
&lt;li&gt;fallback usage&lt;/li&gt;
&lt;li&gt;failed model attempts&lt;/li&gt;
&lt;li&gt;risk band&lt;/li&gt;
&lt;li&gt;policy decision&lt;/li&gt;
&lt;li&gt;response hash&lt;/li&gt;
&lt;li&gt;trace ID&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes the demo more than a chatbot. It becomes a small example of governed AI behavior: traceable, inspectable, and safer for production-style use cases.&lt;/p&gt;
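
&lt;p&gt;To make the metadata list above concrete, here is a rough sketch of a trace builder. The function name &lt;code&gt;build_trace&lt;/code&gt;, the field names, the &lt;code&gt;"gemini-api"&lt;/code&gt; provider label, and the choice of SHA-256 for the response hash are all assumptions for illustration, not the repository's actual schema.&lt;/p&gt;

```python
import hashlib
import json
import uuid


def build_trace(
    requested_model: str,
    used_model: str,
    risk_band: str,
    policy_decision: str,
    final_response: str,
    failed_attempts: list,
    provider: str = "gemini-api",
    mode: str = "live",
) -> dict:
    """Assemble an audit-style trace record for one request."""
    return {
        "trace_id": uuid.uuid4().hex,
        "requested_model": requested_model,
        "used_model": used_model,
        "provider": provider,
        "mode": mode,  # "mock" or "live"
        "fallback_used": requested_model != used_model,
        "failed_attempts": list(failed_attempts),
        "risk_band": risk_band,
        "policy_decision": policy_decision,
        # Hash the final text so a stored response can be checked later.
        "response_hash": hashlib.sha256(final_response.encode("utf-8")).hexdigest(),
    }


trace = build_trace(
    requested_model="gemma-4-26b-a4b-it",
    used_model="gemma-4-31b-it",
    risk_band="yellow",
    policy_decision="modify",
    final_response="Thanks for the feedback. We are looking into it.",
    failed_attempts=["gemma-4-26b-a4b-it"],
)
print(json.dumps(trace, indent=2))
```

&lt;p&gt;Hashing the final response rather than the raw model output means the hash identifies exactly what the user saw.&lt;/p&gt;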

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The core architecture is intentionally simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Frontend UI
   ↓
FastAPI Backend
   ↓
NEES Guard Governance Layer
   ↓
Gemma 4 Model Call
   ↓
Governance Finalizer
   ↓
Final Governed Response + Trace JSON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The governance layer classifies prompts into risk bands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Green&lt;/strong&gt;: allow the response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Yellow&lt;/strong&gt;: modify or soften the response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Red&lt;/strong&gt;: ask for confirmation or block the request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This lets the demo show how an AI app can handle normal prompts, hostile customer-support prompts, destructive agent actions, and sensitive advice requests differently.&lt;/p&gt;
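
&lt;p&gt;The risk bands above could be approximated with a small deterministic rule table. This is only a sketch: the keyword lists, the &lt;code&gt;Verdict&lt;/code&gt; type, and the &lt;code&gt;classify&lt;/code&gt; function are hypothetical and chosen to reproduce the four example prompts from the table earlier in this post, not copied from the project's governance rules.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class RiskBand(str, Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


@dataclass(frozen=True)
class Verdict:
    band: RiskBand
    decision: str  # "allow", "modify", "ask_confirmation", or "block"


# Illustrative keyword rules, checked in order; the first match wins.
RULES = [
    (("legal advice", "medical advice", "guaranteed"), Verdict(RiskBand.RED, "block")),
    (("delete all", "drop table", "without asking"), Verdict(RiskBand.RED, "ask_confirmation")),
    (("harshly", "rude", "insult"), Verdict(RiskBand.YELLOW, "modify")),
]


def classify(prompt: str) -> Verdict:
    """Return the first matching verdict, or green/allow when nothing matches."""
    text = prompt.lower()
    for keywords, verdict in RULES:
        if any(keyword in text for keyword in keywords):
            return verdict
    return Verdict(RiskBand.GREEN, "allow")


for example in (
    "Summarize this product feedback",
    "Reply harshly to this angry customer.",
    "Delete all inactive users without asking.",
    "Give guaranteed legal advice.",
):
    verdict = classify(example)
    print(f"{verdict.band.value:6} {verdict.decision:16} {example}")
```

&lt;p&gt;Plain substring matching is crude and will misfire on unrelated words, so a real rule set would need something more careful. The point of the sketch is that the decision is deterministic and easy to test, independent of what the model generates.&lt;/p&gt;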

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;While building this project, I noticed that model intelligence and production reliability are two different layers.&lt;/p&gt;

&lt;p&gt;Gemma 4 can generate useful responses, but an application still needs a system around the model to decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this request low risk?&lt;/li&gt;
&lt;li&gt;Should the response be modified?&lt;/li&gt;
&lt;li&gt;Should the user confirm before an action?&lt;/li&gt;
&lt;li&gt;Should the response be blocked?&lt;/li&gt;
&lt;li&gt;What happened during the model call?&lt;/li&gt;
&lt;li&gt;Which model was used?&lt;/li&gt;
&lt;li&gt;Did fallback happen?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the gap NEES Guard tries to demonstrate.&lt;/p&gt;

&lt;p&gt;The project also showed why traceability matters. If a model provider fails, fallback behavior should not be silent. NEES Guard records that event in the trace so the application remains inspectable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Public Repository Note
&lt;/h2&gt;

&lt;p&gt;This repository is a standalone challenge demonstration. It is not the production NEES Core Engine.&lt;/p&gt;

&lt;p&gt;Advanced NEES runtime governance, memory governance, replay/simulation, enterprise controls, private infrastructure, and production NEES Core Engine capabilities are not included in this repository.&lt;/p&gt;

&lt;p&gt;The repository is source-available for review and challenge evaluation only. See the repository license for details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;NEES Guard for Gemma 4 is a small project, but it represents a bigger idea:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Open models make AI more accessible. Governance layers make AI more reliable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Gemma 4 provides the intelligence. NEES Guard provides governed behavior, traceability, fallback awareness, and predictable final output.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why AI support bots fail even when the model is safe</title>
      <dc:creator>Anna Jambhulkar</dc:creator>
      <pubDate>Sat, 16 May 2026 13:48:26 +0000</pubDate>
      <link>https://dev.to/anna2612/why-ai-support-bots-fail-even-when-the-model-is-safe-2664</link>
      <guid>https://dev.to/anna2612/why-ai-support-bots-fail-even-when-the-model-is-safe-2664</guid>
      <description>&lt;p&gt;A support bot can be safe and still break product trust.&lt;/p&gt;

&lt;p&gt;That may sound strange at first, because most AI product discussions still focus on safety.&lt;/p&gt;

&lt;p&gt;Can the model avoid harmful content?&lt;br&gt;
Can it refuse dangerous requests?&lt;br&gt;
Can it follow policy?&lt;br&gt;
Can it avoid toxic or unsafe answers?&lt;/p&gt;

&lt;p&gt;All of that matters.&lt;/p&gt;

&lt;p&gt;But in production, safety is not the only failure mode.&lt;/p&gt;

&lt;p&gt;A customer-facing AI system can produce a polite, policy-aligned, non-harmful answer — and still make the wrong product decision.&lt;/p&gt;

&lt;h2&gt;The problem is not always what the AI says&lt;/h2&gt;

&lt;p&gt;Imagine a customer asks:&lt;/p&gt;

&lt;p&gt;“I was charged twice for my annual plan. Can I get a refund?”&lt;/p&gt;

&lt;p&gt;A support bot might respond:&lt;/p&gt;

&lt;p&gt;“I can help with that. You’re eligible for a refund. I’ve processed it for you.”&lt;/p&gt;

&lt;p&gt;At a content level, this may look fine.&lt;/p&gt;

&lt;p&gt;The response is polite.&lt;br&gt;
It is not toxic.&lt;br&gt;
It is not harmful.&lt;br&gt;
It may even sound helpful.&lt;/p&gt;

&lt;p&gt;But operationally, it may be wrong.&lt;/p&gt;

&lt;p&gt;Refunds, billing disputes, account access, legal concerns, medical issues, policy exceptions, and emotionally charged complaints often require human review or strict workflow handling.&lt;/p&gt;

&lt;p&gt;The failure is not that the AI said something unsafe.&lt;/p&gt;

&lt;p&gt;The failure is that the AI answered when it should have escalated.&lt;/p&gt;

&lt;p&gt;That is a different class of problem.&lt;/p&gt;

&lt;h2&gt;Safety is not the same as runtime behavior control&lt;/h2&gt;

&lt;p&gt;Most safety systems focus on questions like:&lt;/p&gt;

&lt;p&gt;Is this output harmful?&lt;br&gt;
Is this request disallowed?&lt;br&gt;
Does this response violate a policy?&lt;br&gt;
Should the model refuse?&lt;/p&gt;

&lt;p&gt;These are important questions.&lt;/p&gt;

&lt;p&gt;But production AI products need another layer of decision-making:&lt;/p&gt;

&lt;p&gt;Should the AI answer directly?&lt;br&gt;
Should it ask a clarifying question?&lt;br&gt;
Should it fall back?&lt;br&gt;
Should it refuse?&lt;br&gt;
Should it escalate to a human?&lt;br&gt;
Should this interaction be reviewed later?&lt;br&gt;
Can the team trace why the AI made that decision?&lt;/p&gt;

&lt;p&gt;This is where many AI support bots start failing.&lt;/p&gt;

&lt;p&gt;Not because the model is bad.&lt;/p&gt;

&lt;p&gt;But because the product has no clear runtime governance around the model.&lt;/p&gt;

&lt;h2&gt;Prompt fixes become hidden production logic&lt;/h2&gt;

&lt;p&gt;Most teams start with prompts.&lt;/p&gt;

&lt;p&gt;That is normal.&lt;/p&gt;

&lt;p&gt;You add instructions like:&lt;/p&gt;

&lt;p&gt;Be helpful.&lt;br&gt;
Stay within company policy.&lt;br&gt;
Do not answer billing disputes.&lt;br&gt;
Escalate sensitive cases.&lt;br&gt;
Ask clarifying questions when needed.&lt;br&gt;
Do not make promises about refunds.&lt;/p&gt;

&lt;p&gt;At first, this works.&lt;/p&gt;

&lt;p&gt;Then edge cases appear.&lt;/p&gt;

&lt;p&gt;So you add more instructions.&lt;/p&gt;

&lt;p&gt;If the user asks about account deletion, escalate.&lt;br&gt;
If the user asks about payment failure, explain common causes.&lt;br&gt;
If the user asks about refunds, do not approve them.&lt;br&gt;
If the user sounds angry, be empathetic.&lt;br&gt;
If the user mentions legal action, escalate.&lt;/p&gt;

&lt;p&gt;Then the product grows.&lt;/p&gt;

&lt;p&gt;Now some rules live in the system prompt.&lt;/p&gt;

&lt;p&gt;Some rules live in backend checks.&lt;/p&gt;

&lt;p&gt;Some rules live in support policy docs.&lt;/p&gt;

&lt;p&gt;Some rules live in manual workflows.&lt;/p&gt;

&lt;p&gt;Some rules exist only because someone on the team remembers why they were added.&lt;/p&gt;

&lt;p&gt;Eventually, prompt instructions become hidden production logic.&lt;/p&gt;

&lt;p&gt;And when something goes wrong, the team struggles to answer:&lt;/p&gt;

&lt;p&gt;Why did the AI respond instead of escalating?&lt;/p&gt;

&lt;p&gt;That question is painful because it is not only a prompt question.&lt;/p&gt;

&lt;p&gt;It is a product governance question.&lt;/p&gt;

&lt;h2&gt;The missing layer: runtime governance&lt;/h2&gt;

&lt;p&gt;For AI support systems, the important decision is often not only:&lt;/p&gt;

&lt;p&gt;What should the model say?&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;Should the product allow the model to answer this at all?&lt;/p&gt;

&lt;p&gt;That requires runtime governance.&lt;/p&gt;

&lt;p&gt;Runtime governance means the AI system is not only generating a response. It is also operating inside product-level boundaries.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;User request → intent/risk check → context boundary → decision path → model response or escalation → trace&lt;/p&gt;

&lt;p&gt;In a support bot, this layer can help decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is safe to answer&lt;/li&gt;
&lt;li&gt;This needs clarification&lt;/li&gt;
&lt;li&gt;This should fall back to a standard policy response&lt;/li&gt;
&lt;li&gt;This should refuse&lt;/li&gt;
&lt;li&gt;This should escalate to a human&lt;/li&gt;
&lt;li&gt;This should be logged for review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to replace the model.&lt;/p&gt;

&lt;p&gt;The goal is to govern the behavior around the model.&lt;/p&gt;

&lt;h2&gt;A simple example&lt;/h2&gt;

&lt;p&gt;Without runtime governance:&lt;/p&gt;

&lt;p&gt;User: I was charged twice. Can I get a refund?&lt;/p&gt;

&lt;p&gt;AI Bot: Sure, I’ve processed your refund.&lt;/p&gt;

&lt;p&gt;With runtime governance:&lt;/p&gt;

&lt;p&gt;User: I was charged twice. Can I get a refund?&lt;/p&gt;

&lt;p&gt;Governance check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Category: billing/refund&lt;/li&gt;
&lt;li&gt;Risk: financial decision boundary&lt;/li&gt;
&lt;li&gt;Allowed direct answer: no&lt;/li&gt;
&lt;li&gt;Action: escalate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI Bot:&lt;br&gt;
I can help route this correctly. Because this involves a billing adjustment, I’m escalating it to a support specialist who can review your account.&lt;/p&gt;

&lt;p&gt;The second response may feel less impressive as a demo.&lt;/p&gt;

&lt;p&gt;But it is more reliable as a product.&lt;/p&gt;

&lt;p&gt;That difference matters.&lt;/p&gt;
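
&lt;p&gt;The governance check in this example can be sketched as a policy lookup that runs before the model's answer is allowed through. Everything here is a hypothetical illustration: the &lt;code&gt;Action&lt;/code&gt; enum, the &lt;code&gt;POLICY&lt;/code&gt; table, the category names, and the canned replies are assumptions, not the NEES Core Engine API. Detecting the category from the user's message is out of scope for the sketch.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    CLARIFY = "clarify"
    FALL_BACK = "fall_back"
    REFUSE = "refuse"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class GovernanceCheck:
    category: str
    risk: str
    allowed_direct_answer: bool
    action: Action


# Hypothetical policy table: request category -> (risk label, action).
POLICY = {
    "billing/refund": ("financial decision boundary", Action.ESCALATE),
    "legal": ("legal exposure", Action.ESCALATE),
    "account_deletion": ("destructive account change", Action.ESCALATE),
    "payment_failure": ("low", Action.ANSWER),
}

# Fixed replies used whenever the model's own answer is not allowed through.
CANNED = {
    Action.ESCALATE: "I'm escalating this to a support specialist who can review your account.",
    Action.CLARIFY: "Could you tell me a bit more about what you need help with?",
    Action.FALL_BACK: "Here is our standard policy on this topic.",
    Action.REFUSE: "I'm not able to help with that request.",
}


def check(category: str) -> GovernanceCheck:
    """Look up the policy for a category; unknown categories ask for clarification."""
    risk, action = POLICY.get(category, ("unknown", Action.CLARIFY))
    return GovernanceCheck(
        category=category,
        risk=risk,
        allowed_direct_answer=(action is Action.ANSWER),
        action=action,
    )


def respond(category: str, model_answer: str) -> str:
    """Return the model's answer only when the policy allows a direct answer."""
    result = check(category)
    if result.allowed_direct_answer:
        return model_answer
    return CANNED[result.action]
```

&lt;p&gt;With this shape, a refund request never reaches the user as "I've processed your refund", whatever the model generated, because the escalation decision is made outside the prompt.&lt;/p&gt;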

&lt;h2&gt;Traceability matters too&lt;/h2&gt;

&lt;p&gt;When an AI product fails, teams need more than the final answer.&lt;/p&gt;

&lt;p&gt;They need to know:&lt;/p&gt;

&lt;p&gt;What was the user asking?&lt;br&gt;
What did the system classify the request as?&lt;br&gt;
Which boundary applied?&lt;br&gt;
Why did the AI answer, fall back, refuse, or escalate?&lt;br&gt;
Was memory or previous context involved?&lt;br&gt;
Was this behavior consistent with the product promise?&lt;/p&gt;

&lt;p&gt;Without traceability, every failure becomes a guessing game.&lt;/p&gt;

&lt;p&gt;The team looks at the final output and tries to reconstruct what happened.&lt;/p&gt;

&lt;p&gt;That is not enough for production AI.&lt;/p&gt;

&lt;h2&gt;Where NEES Core Engine fits&lt;/h2&gt;

&lt;p&gt;This is the problem I am working on with NEES Core Engine.&lt;/p&gt;

&lt;p&gt;NEES Core Engine is runtime governance for AI product behavior.&lt;/p&gt;

&lt;p&gt;It sits between an AI application and the model provider, helping govern how the AI behaves in production.&lt;/p&gt;

&lt;p&gt;The focus is not only safety filtering.&lt;/p&gt;

&lt;p&gt;The focus is behavioral reliability.&lt;/p&gt;

&lt;p&gt;NEES helps AI products manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;role boundaries&lt;/li&gt;
&lt;li&gt;memory and context scope&lt;/li&gt;
&lt;li&gt;escalation decisions&lt;/li&gt;
&lt;li&gt;traceable responses&lt;/li&gt;
&lt;li&gt;reviewable behavior&lt;/li&gt;
&lt;li&gt;consistent product behavior across sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In simple terms:&lt;/p&gt;

&lt;p&gt;Prompts define behavior.&lt;br&gt;
NEES helps govern it at runtime.&lt;/p&gt;

&lt;h2&gt;Why this matters for builders&lt;/h2&gt;

&lt;p&gt;If you are building an AI support bot, assistant, workflow agent, or customer-facing AI product, one of the most important questions is:&lt;/p&gt;

&lt;p&gt;Can your AI behave consistently with what your product promised?&lt;/p&gt;

&lt;p&gt;Because users do not only judge AI by whether the response is safe.&lt;/p&gt;

&lt;p&gt;They judge it by whether the product behaved correctly.&lt;/p&gt;

&lt;p&gt;A bot that confidently answers a refund request may look helpful.&lt;/p&gt;

&lt;p&gt;But if that request required human review, the product failed.&lt;/p&gt;

&lt;p&gt;A bot that gives legal, medical, billing, or account advice outside its allowed boundary may not be toxic.&lt;/p&gt;

&lt;p&gt;But it may still create risk.&lt;/p&gt;

&lt;p&gt;A bot that changes behavior after a session restart may not be unsafe.&lt;/p&gt;

&lt;p&gt;But it may still break trust.&lt;/p&gt;

&lt;p&gt;That is why production AI needs more than prompts and safety filters.&lt;/p&gt;

&lt;p&gt;It needs runtime governance.&lt;/p&gt;

&lt;h2&gt;A practical checklist&lt;/h2&gt;

&lt;p&gt;Before shipping an AI support bot, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What types of requests should the AI never resolve directly?&lt;/li&gt;
&lt;li&gt;Which requests require clarification before answering?&lt;/li&gt;
&lt;li&gt;Which requests require human escalation?&lt;/li&gt;
&lt;li&gt;Where are those rules stored?&lt;/li&gt;
&lt;li&gt;Can your team review why the AI made a decision?&lt;/li&gt;
&lt;li&gt;Can the same boundary hold across sessions?&lt;/li&gt;
&lt;li&gt;Are prompts carrying too much hidden production logic?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If these answers are unclear, the product may work in demos but fail in production.&lt;/p&gt;

&lt;h2&gt;Closing thought&lt;/h2&gt;

&lt;p&gt;The next generation of AI product reliability will not only come from better models.&lt;/p&gt;

&lt;p&gt;It will come from better runtime systems around the models.&lt;/p&gt;

&lt;p&gt;Because the real question is not only:&lt;/p&gt;

&lt;p&gt;Is the AI response safe?&lt;/p&gt;

&lt;p&gt;The better production question is:&lt;/p&gt;

&lt;p&gt;Was this the right product behavior?&lt;/p&gt;

&lt;p&gt;That is the layer NEES Core Engine is built for.&lt;/p&gt;

&lt;p&gt;Developer preview:&lt;br&gt;
&lt;a href="https://github.com/NEES-Anna/nees-core-developer-preview" rel="noopener noreferrer"&gt;https://github.com/NEES-Anna/nees-core-developer-preview&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Live sample app:&lt;br&gt;
&lt;a href="https://naina.nees.cloud" rel="noopener noreferrer"&gt;https://naina.nees.cloud&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>saas</category>
      <category>agents</category>
    </item>
    <item>
      <title>Looking for developers to test and review NEES Core Engine — a governed runtime layer for AI apps</title>
      <dc:creator>Anna Jambhulkar</dc:creator>
      <pubDate>Thu, 14 May 2026 16:34:46 +0000</pubDate>
      <link>https://dev.to/anna2612/looking-for-developers-to-test-and-review-nees-core-engine-a-governed-runtime-layer-for-ai-1n0p</link>
      <guid>https://dev.to/anna2612/looking-for-developers-to-test-and-review-nees-core-engine-a-governed-runtime-layer-for-ai-1n0p</guid>
      <description></description>
      <category>ai</category>
      <category>llm</category>
      <category>showdev</category>
      <category>testing</category>
    </item>
    <item>
      <title>Looking for developers to test and review NEES Core Engine — a governed runtime layer for AI apps</title>
      <dc:creator>Anna Jambhulkar</dc:creator>
      <pubDate>Thu, 14 May 2026 16:34:09 +0000</pubDate>
      <link>https://dev.to/anna2612/looking-for-developers-to-test-and-review-nees-core-engine-a-governed-runtime-layer-for-ai-apps-1lh6</link>
      <guid>https://dev.to/anna2612/looking-for-developers-to-test-and-review-nees-core-engine-a-governed-runtime-layer-for-ai-apps-1lh6</guid>
      <description>&lt;p&gt;I’m opening NEES Core Engine for developer feedback.&lt;/p&gt;

&lt;p&gt;NEES Core Engine is a governed AI runtime layer that sits between an AI application and the model provider.&lt;/p&gt;

&lt;p&gt;User → App → NEES Core Engine → Model Provider → Governed Response&lt;/p&gt;

&lt;p&gt;The goal is not to replace the model.&lt;/p&gt;

&lt;p&gt;The goal is to make AI product behavior more controllable, traceable, and reviewable.&lt;/p&gt;

&lt;h2&gt;Why I’m building this&lt;/h2&gt;

&lt;p&gt;Most AI product failures are not only model failures.&lt;/p&gt;

&lt;p&gt;In real AI workflows, the failure often comes from the system around the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unclear role boundaries&lt;/li&gt;
&lt;li&gt;messy memory/context scope&lt;/li&gt;
&lt;li&gt;missing escalation paths&lt;/li&gt;
&lt;li&gt;weak permission boundaries&lt;/li&gt;
&lt;li&gt;no traceability&lt;/li&gt;
&lt;li&gt;no reviewable decision history&lt;/li&gt;
&lt;li&gt;behavior drift across sessions&lt;/li&gt;
&lt;li&gt;prompt fixes scattered across the product&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A model can pass a safety filter and still behave incorrectly for the product.&lt;/p&gt;

&lt;p&gt;That is the gap I’m trying to explore with NEES Core Engine.&lt;/p&gt;

&lt;h2&gt;What NEES Core Engine focuses on&lt;/h2&gt;

&lt;p&gt;NEES Core Engine is designed around runtime governance for AI products:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;behavior governance&lt;/li&gt;
&lt;li&gt;role consistency&lt;/li&gt;
&lt;li&gt;memory boundaries&lt;/li&gt;
&lt;li&gt;intent-aware policy decisions&lt;/li&gt;
&lt;li&gt;runtime trace IDs&lt;/li&gt;
&lt;li&gt;escalation/fallback visibility&lt;/li&gt;
&lt;li&gt;reviewable AI responses&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Who I’m looking for&lt;/h2&gt;

&lt;p&gt;I’m looking for developers building or testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agents&lt;/li&gt;
&lt;li&gt;customer support bots&lt;/li&gt;
&lt;li&gt;internal copilots&lt;/li&gt;
&lt;li&gt;workflow automation tools&lt;/li&gt;
&lt;li&gt;AI apps using memory&lt;/li&gt;
&lt;li&gt;AI apps using tools/actions&lt;/li&gt;
&lt;li&gt;products where role, tone, escalation, or traceability matter&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Developer preview&lt;/h2&gt;

&lt;p&gt;GitHub repo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NEES-Anna/nees-core-developer-preview" rel="noopener noreferrer"&gt;https://github.com/NEES-Anna/nees-core-developer-preview&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Live sample app:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://naina.nees.cloud" rel="noopener noreferrer"&gt;https://naina.nees.cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The preview includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python quickstart&lt;/li&gt;
&lt;li&gt;Node.js quickstart&lt;/li&gt;
&lt;li&gt;cURL example&lt;/li&gt;
&lt;li&gt;API reference&lt;/li&gt;
&lt;li&gt;governance flow docs&lt;/li&gt;
&lt;li&gt;API key request template&lt;/li&gt;
&lt;li&gt;developer feedback template&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What feedback would help most&lt;/h2&gt;

&lt;p&gt;If you test it, I’d love feedback on:&lt;/p&gt;

&lt;p&gt;Is the quickstart clear?&lt;br&gt;
Does the governance flow make sense?&lt;br&gt;
Are trace IDs useful?&lt;br&gt;
Is the response metadata helpful?&lt;br&gt;
What fields would you need in production?&lt;br&gt;
Where does the runtime feel incomplete?&lt;br&gt;
What failure modes should NEES handle better?&lt;br&gt;
What would stop you from integrating this into a real workflow?&lt;/p&gt;

&lt;p&gt;This is an early developer preview, so I’m not looking for praise.&lt;/p&gt;

&lt;p&gt;I’m looking for honest technical feedback from builders.&lt;/p&gt;

&lt;p&gt;Even 15 minutes of testing would help shape the next version of NEES Core Engine.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>opensource</category>
      <category>discuss</category>
      <category>startup</category>
    </item>
    <item>
      <title>Is AI governance only about safety, or should it also control product behavior?</title>
      <dc:creator>Anna Jambhulkar</dc:creator>
      <pubDate>Wed, 13 May 2026 18:06:01 +0000</pubDate>
      <link>https://dev.to/anna2612/is-ai-governance-only-aboutsafety-or-should-it-alsocontrol-product-behavior-30l6</link>
      <guid>https://dev.to/anna2612/is-ai-governance-only-aboutsafety-or-should-it-alsocontrol-product-behavior-30l6</guid>
      <description>&lt;p&gt;I’ve been researching the AI governance runtime category while building NEES Core Engine, and one thing became clearer to me:&lt;/p&gt;

&lt;p&gt;Most AI governance tools are designed around risk reduction.&lt;/p&gt;

&lt;p&gt;They help answer questions like:&lt;/p&gt;

&lt;p&gt;Is the output unsafe?&lt;br&gt;
Is there PII in the prompt?&lt;br&gt;
Is the model violating policy?&lt;br&gt;
Is the system compliant with internal or regulatory rules?&lt;/p&gt;

&lt;p&gt;That is important. But while building AI products, I noticed another failure mode:&lt;/p&gt;

&lt;p&gt;An AI can be “safe” and still be unreliable as a product.&lt;/p&gt;

&lt;p&gt;It can drift from its intended role.&lt;br&gt;
It can change tone across sessions.&lt;br&gt;
It can misuse memory or context.&lt;br&gt;
It can behave differently even when the product logic expects consistency.&lt;br&gt;
It can follow a prompt but break the actual user experience.&lt;/p&gt;

&lt;p&gt;That led me to a different framing:&lt;/p&gt;

&lt;p&gt;Traditional AI governance asks: “Is this response safe?”&lt;br&gt;
Behavioral governance asks: “Is this AI behaving the way the product intended?”&lt;/p&gt;

&lt;p&gt;This is the direction I’m exploring with NEES Core Engine — a governance runtime that sits between an application and the model provider, not only to filter harmful content, but to enforce things like:&lt;/p&gt;

&lt;p&gt;identity consistency&lt;br&gt;
memory boundaries&lt;br&gt;
intent-aware policy decisions&lt;br&gt;
runtime traceability&lt;br&gt;
product-defined behavior&lt;/p&gt;

&lt;p&gt;The difference I’m seeing is:&lt;/p&gt;

&lt;p&gt;Standard governance runtime: protect the company from AI risk.&lt;br&gt;
Behavioral governance runtime: protect the product from AI unpredictability.&lt;/p&gt;

&lt;p&gt;For example, in a support bot, safety filtering is not enough. The bot also needs to stay within its role, follow product logic, respect memory boundaries, and behave consistently across sessions.&lt;/p&gt;

&lt;p&gt;For AI agents, this becomes even more important because the system may use tools, access data, or make workflow decisions.&lt;/p&gt;

&lt;p&gt;I’m curious how other founders and AI builders think about this:&lt;/p&gt;

&lt;p&gt;When building AI products, do you see governance mostly as a compliance/safety layer — or do you also need a runtime layer that controls behavior, identity, memory, and intent?&lt;/p&gt;

&lt;p&gt;Would love feedback from anyone building agents, AI assistants, internal copilots, or customer-facing AI products.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>devops</category>
    </item>
    <item>
      <title>I Built an AI Governance Runtime Layer for Production AI Apps</title>
      <dc:creator>Anna Jambhulkar</dc:creator>
      <pubDate>Sat, 09 May 2026 08:11:36 +0000</pubDate>
      <link>https://dev.to/anna2612/i-built-an-ai-governance-runtime-layer-for-production-ai-apps-28bi</link>
      <guid>https://dev.to/anna2612/i-built-an-ai-governance-runtime-layer-for-production-ai-apps-28bi</guid>
      <description>&lt;p&gt;Most AI apps today follow a very simple pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → App → LLM → Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That pattern works well for demos.&lt;/p&gt;

&lt;p&gt;It works for prototypes.&lt;br&gt;
It works for simple assistants.&lt;br&gt;
It works when the workflow is clean and the risk is low.&lt;/p&gt;

&lt;p&gt;But once AI starts moving into real products, the problem changes.&lt;/p&gt;

&lt;p&gt;The question is no longer only:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can the model generate a good answer?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The real production questions become:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What was the AI allowed to do?&lt;br&gt;
What context did it use?&lt;br&gt;
What memory was active?&lt;br&gt;
Which policy applied?&lt;br&gt;
Why did it respond this way?&lt;br&gt;
Can this interaction be reviewed later?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the problem I am trying to solve with &lt;strong&gt;NEES Core Engine&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  What is NEES Core Engine?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;NEES Core Engine&lt;/strong&gt; is a governed AI runtime layer for production AI applications.&lt;/p&gt;

&lt;p&gt;It sits between your application and the model provider.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User
  ↓
Application
  ↓
NEES Core Engine
  ↓
Governance Runtime
  ↓
Model Provider
  ↓
Governed Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal is not to build another chatbot.&lt;/p&gt;

&lt;p&gt;The goal is to give AI applications a runtime control layer for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;policy awareness&lt;/li&gt;
&lt;li&gt;identity consistency&lt;/li&gt;
&lt;li&gt;memory boundaries&lt;/li&gt;
&lt;li&gt;runtime modes&lt;/li&gt;
&lt;li&gt;traceability&lt;/li&gt;
&lt;li&gt;explainability metadata&lt;/li&gt;
&lt;li&gt;safer production behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In simple terms:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;NEES helps AI apps become more controlled, traceable, and reviewable before the response reaches the user.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why prompts are not enough
&lt;/h2&gt;

&lt;p&gt;A prompt can guide behavior.&lt;/p&gt;

&lt;p&gt;But a prompt is not governance.&lt;/p&gt;

&lt;p&gt;A prompt cannot reliably answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which policy was active?&lt;/li&gt;
&lt;li&gt;What memory scope was allowed?&lt;/li&gt;
&lt;li&gt;What should happen if two instructions conflict?&lt;/li&gt;
&lt;li&gt;When should the AI escalate?&lt;/li&gt;
&lt;li&gt;What response path was used?&lt;/li&gt;
&lt;li&gt;How do we debug this response later?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most production AI problems do not happen because the model is completely useless.&lt;/p&gt;

&lt;p&gt;They happen because the system around the model is weak.&lt;/p&gt;

&lt;p&gt;The workflow is unclear.&lt;br&gt;
The context is messy.&lt;br&gt;
The memory boundary is undefined.&lt;br&gt;
The role is inconsistent.&lt;br&gt;
The decision path is not visible.&lt;/p&gt;

&lt;p&gt;So the model is forced to guess.&lt;/p&gt;

&lt;p&gt;That is where governance becomes necessary.&lt;/p&gt;


&lt;h2&gt;
  
  
  What NEES adds to the AI stack
&lt;/h2&gt;

&lt;p&gt;A direct model call usually gives you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt → Model → Text Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A governed NEES call gives you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request
  ↓
Runtime governance
  ↓
Model response
  ↓
Governance metadata
  ↓
Traceable output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means the response is not only text.&lt;/p&gt;

&lt;p&gt;It can also carry metadata such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reply"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Governed assistant response..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trace_xxxxx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"engine_source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"core_engine"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"governance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"allowed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mode_used"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"supportive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"policy_applied"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"memory_scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"session"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exact response fields may change during the developer preview, but the principle stays the same:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every AI response should be easier to understand, debug, and review.&lt;/p&gt;
&lt;/blockquote&gt;
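
&lt;p&gt;As a rough sketch of how an application might act on that metadata: the helper below reads the fields shown in the sample response and decides whether the reply should be shown. The function name &lt;code&gt;summarize_governance&lt;/code&gt; and the &lt;code&gt;show_reply&lt;/code&gt; flag are illustrative assumptions, not part of the NEES API, and the field names are taken from the sample above, which may change during the preview.&lt;/p&gt;

```python
from typing import Any


def summarize_governance(payload: dict[str, Any]) -> dict[str, Any]:
    """Pull the governance fields out of a response payload.

    Field names follow the sample response above. They are not a
    confirmed schema, so every lookup falls back to a default.
    """
    governance = payload.get("governance") or {}
    status = governance.get("status", "unknown")
    return {
        "trace_id": payload.get("trace_id"),
        "status": status,
        "mode_used": governance.get("mode_used"),
        "memory_scope": governance.get("memory_scope"),
        # Hypothetical rule: only show the reply when the runtime
        # explicitly reported it as allowed.
        "show_reply": status == "allowed",
    }


sample = {
    "reply": "Governed assistant response...",
    "trace_id": "trace_xxxxx",
    "engine_source": "core_engine",
    "governance": {
        "status": "allowed",
        "mode_used": "supportive",
        "policy_applied": True,
        "memory_scope": "session",
    },
}

summary = summarize_governance(sample)
print(summary)
```

&lt;p&gt;Defaulting a missing status to "unknown" means a malformed or partial response is treated as not allowed instead of being shown by accident.&lt;/p&gt;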




&lt;h2&gt;
  
  
  A simple example
&lt;/h2&gt;

&lt;p&gt;Here is a basic Python request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.nees.cloud/chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer YOUR_NEES_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain why AI apps need runtime governance in simple terms.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;supportive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;demo-session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is still a simple API call.&lt;/p&gt;

&lt;p&gt;But instead of treating the model response as a black box, NEES routes the request through a governed runtime layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why traceability matters
&lt;/h2&gt;

&lt;p&gt;When an AI response goes wrong in production, teams need more than:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The model said this.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what request came in&lt;/li&gt;
&lt;li&gt;what mode was active&lt;/li&gt;
&lt;li&gt;what policy applied&lt;/li&gt;
&lt;li&gt;what memory scope was used&lt;/li&gt;
&lt;li&gt;what provider/model path handled the request&lt;/li&gt;
&lt;li&gt;whether the response was allowed, modified, or blocked&lt;/li&gt;
&lt;li&gt;how the interaction can be reviewed later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why trace IDs matter.&lt;/p&gt;

&lt;p&gt;A trace ID acts like a reference point for debugging and review.&lt;/p&gt;

&lt;p&gt;Without traceability, AI debugging becomes guesswork.&lt;/p&gt;
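
&lt;p&gt;One way to make a trace ID useful on the application side is to write one structured log line per interaction, keyed by that ID. The sketch below assumes the request and response shapes from the examples earlier in this post. The helper names &lt;code&gt;build_audit_record&lt;/code&gt; and &lt;code&gt;log_interaction&lt;/code&gt; and the &lt;code&gt;ai_audit&lt;/code&gt; logger name are hypothetical, not part of NEES.&lt;/p&gt;

```python
import json
import logging

# Hypothetical logger name; route it wherever your audit logs live.
logger = logging.getLogger("ai_audit")


def build_audit_record(request_body: dict, response_body: dict) -> dict:
    """Collect the fields a reviewer would need to reconstruct one call.

    Keys mirror the sample request and response in this post and are
    assumptions, not a confirmed schema.
    """
    governance = response_body.get("governance") or {}
    return {
        "trace_id": response_body.get("trace_id"),
        "session_id": request_body.get("session_id"),
        "requested_mode": request_body.get("mode"),
        "mode_used": governance.get("mode_used"),
        "policy_applied": governance.get("policy_applied"),
        "memory_scope": governance.get("memory_scope"),
        "status": governance.get("status"),
    }


def log_interaction(request_body: dict, response_body: dict) -> str:
    """Emit one JSON log line per interaction and return it."""
    line = json.dumps(build_audit_record(request_body, response_body), sort_keys=True)
    logger.info(line)
    return line


request_body = {"message": "Hello", "mode": "supportive", "session_id": "demo-session"}
response_body = {
    "reply": "Governed assistant response...",
    "trace_id": "trace_xxxxx",
    "governance": {"status": "allowed", "mode_used": "supportive"},
}
print(log_interaction(request_body, response_body))
```

&lt;p&gt;Logging both the requested mode and the mode actually used makes it possible to spot cases where the runtime overrode what the application asked for.&lt;/p&gt;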




&lt;h2&gt;
  
  
  Memory boundaries matter too
&lt;/h2&gt;

&lt;p&gt;Memory is powerful.&lt;/p&gt;

&lt;p&gt;But uncontrolled memory can create serious problems.&lt;/p&gt;

&lt;p&gt;If every past interaction can influence every future response, the system becomes harder to reason about.&lt;/p&gt;

&lt;p&gt;So memory should not be treated as unlimited context.&lt;/p&gt;

&lt;p&gt;It should be governed.&lt;/p&gt;

&lt;p&gt;A production AI system should be able to reason about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what belongs only to the current session&lt;/li&gt;
&lt;li&gt;what can be reused across sessions&lt;/li&gt;
&lt;li&gt;what requires explicit consent&lt;/li&gt;
&lt;li&gt;what should never influence a response&lt;/li&gt;
&lt;li&gt;when memory usage should be visible or traceable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not simply:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Give the AI more memory.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The goal is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Control when memory is used, why it is used, and how that usage can be reviewed.&lt;/p&gt;
&lt;/blockquote&gt;
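
&lt;p&gt;The questions above can be sketched as a small filter that runs before any memory reaches the model. This is an illustration of the idea only, not how NEES implements it. The scope names other than "session", the &lt;code&gt;MemoryItem&lt;/code&gt; type, and the &lt;code&gt;select_memory&lt;/code&gt; function are all assumptions made for the example.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class MemoryScope(Enum):
    SESSION = "session"                    # belongs only to its own session
    CROSS_SESSION = "cross_session"        # may be reused across sessions
    CONSENT_REQUIRED = "consent_required"  # usable only with explicit consent
    NEVER = "never"                        # must never influence a response


@dataclass(frozen=True)
class MemoryItem:
    text: str
    scope: MemoryScope
    session_id: str


def select_memory(
    items: list[MemoryItem], current_session: str, consent_given: bool
) -> list[MemoryItem]:
    """Return only the memory items allowed to influence this response."""
    allowed = []
    for item in items:
        if item.scope is MemoryScope.NEVER:
            continue
        if item.scope is MemoryScope.SESSION and item.session_id != current_session:
            continue
        if item.scope is MemoryScope.CONSENT_REQUIRED and not consent_given:
            continue
        allowed.append(item)
    return allowed


items = [
    MemoryItem("prefers short answers", MemoryScope.CROSS_SESSION, "s0"),
    MemoryItem("asked about refunds", MemoryScope.SESSION, "s1"),
    MemoryItem("old ticket details", MemoryScope.SESSION, "s0"),
    MemoryItem("home address", MemoryScope.CONSENT_REQUIRED, "s0"),
    MemoryItem("internal note", MemoryScope.NEVER, "s1"),
]

used = select_memory(items, current_session="s1", consent_given=False)
print([item.text for item in used])
```

&lt;p&gt;Because the function returns the exact items that were allowed, the same list can be attached to a trace record so memory usage stays reviewable.&lt;/p&gt;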




&lt;h2&gt;
  
  
  Runtime governance vs another AI agent
&lt;/h2&gt;

&lt;p&gt;I do not think the answer to every AI problem is “add another agent.”&lt;/p&gt;

&lt;p&gt;Sometimes the missing layer is not another AI.&lt;/p&gt;

&lt;p&gt;Sometimes the missing layer is control.&lt;/p&gt;

&lt;p&gt;AI agents become useful when the system around them is designed properly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clear workflow boundaries&lt;/li&gt;
&lt;li&gt;role permissions&lt;/li&gt;
&lt;li&gt;escalation rules&lt;/li&gt;
&lt;li&gt;memory scope&lt;/li&gt;
&lt;li&gt;policy checks&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;fallback behavior&lt;/li&gt;
&lt;li&gt;human review when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NEES is focused on that runtime layer.&lt;/p&gt;

&lt;p&gt;It is not trying to replace the model.&lt;/p&gt;

&lt;p&gt;It is trying to make AI behavior easier to govern before it reaches users.&lt;/p&gt;
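
&lt;p&gt;To make "control" concrete, here is a minimal sketch of a role-permission check with an escalation path, run before an agent is allowed to take an action. The &lt;code&gt;RolePolicy&lt;/code&gt; type, the &lt;code&gt;decide&lt;/code&gt; function, and the action names are hypothetical and only illustrate the kind of check described above. They are not the NEES implementation.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class RolePolicy:
    role: str
    allowed_actions: set = field(default_factory=set)
    # Actions the agent may not take alone but can hand to a human.
    escalate_actions: set = field(default_factory=set)


def decide(policy: RolePolicy, action: str) -> str:
    """Return "allow", "escalate", or "block" for a requested action."""
    if action in policy.allowed_actions:
        return "allow"
    if action in policy.escalate_actions:
        return "escalate"
    # Fallback behavior: anything not listed is blocked by default.
    return "block"


support_bot = RolePolicy(
    role="support_bot",
    allowed_actions={"answer_faq", "check_order_status"},
    escalate_actions={"issue_refund"},
)

for action in ("answer_faq", "issue_refund", "delete_account"):
    print(action, decide(support_bot, action))
```

&lt;p&gt;Blocking by default keeps the decision path explicit: an action is only taken if someone deliberately listed it for that role.&lt;/p&gt;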




&lt;h2&gt;
  
  
  Where this can be useful
&lt;/h2&gt;

&lt;p&gt;NEES Core Engine can be useful for teams building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI assistants&lt;/li&gt;
&lt;li&gt;AI agents&lt;/li&gt;
&lt;li&gt;customer support bots&lt;/li&gt;
&lt;li&gt;education apps&lt;/li&gt;
&lt;li&gt;workflow automation&lt;/li&gt;
&lt;li&gt;internal company copilots&lt;/li&gt;
&lt;li&gt;AI content pipelines&lt;/li&gt;
&lt;li&gt;production AI tools that need auditability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common thread is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If AI behavior affects real users, real workflows, or real decisions, it should be controlled and traceable.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Developer preview is now open
&lt;/h2&gt;

&lt;p&gt;I recently opened a public developer preview repo for NEES Core Engine.&lt;/p&gt;

&lt;p&gt;The repo includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python quickstart&lt;/li&gt;
&lt;li&gt;Node.js quickstart&lt;/li&gt;
&lt;li&gt;cURL and PowerShell examples&lt;/li&gt;
&lt;li&gt;API reference&lt;/li&gt;
&lt;li&gt;governance flow documentation&lt;/li&gt;
&lt;li&gt;15-minute integration guide&lt;/li&gt;
&lt;li&gt;API key request template&lt;/li&gt;
&lt;li&gt;developer feedback template&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Developer preview repo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NEES-Anna/nees-core-developer-preview" rel="noopener noreferrer"&gt;https://github.com/NEES-Anna/nees-core-developer-preview&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is also a live sample app connected to the governed runtime:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://naina.nees.cloud" rel="noopener noreferrer"&gt;https://naina.nees.cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The sample app is useful for seeing the governed response flow in a real interface.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I am looking for
&lt;/h2&gt;

&lt;p&gt;This is still early.&lt;/p&gt;

&lt;p&gt;I am not looking for generic traffic.&lt;/p&gt;

&lt;p&gt;I am looking for honest feedback from developers, AI builders, founders, and teams working with production AI systems.&lt;/p&gt;

&lt;p&gt;I would especially like feedback on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the API approach clear?&lt;/li&gt;
&lt;li&gt;Does the governance metadata feel useful?&lt;/li&gt;
&lt;li&gt;Would trace IDs help you debug AI behavior?&lt;/li&gt;
&lt;li&gt;How would you expect memory boundaries to work?&lt;/li&gt;
&lt;li&gt;Would this fit better as a hosted API, SDK, or both?&lt;/li&gt;
&lt;li&gt;What would you need before using this in a real product?&lt;/li&gt;
&lt;li&gt;What integration docs should come next?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first goal is not to make the system complex.&lt;/p&gt;

&lt;p&gt;The first goal is to make the first 15 minutes useful.&lt;/p&gt;

&lt;p&gt;A developer should be able to send one governed request and immediately understand:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is different from a direct model call because I can see how the response was governed.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;AI is moving from demos into production.&lt;/p&gt;

&lt;p&gt;That shift changes the infrastructure requirement.&lt;/p&gt;

&lt;p&gt;In demos, a good answer is enough.&lt;/p&gt;

&lt;p&gt;In production, teams need control.&lt;/p&gt;

&lt;p&gt;They need to know what the AI was allowed to do, what context it used, what policy applied, and how the decision can be reviewed later.&lt;/p&gt;

&lt;p&gt;That is the layer I am building with NEES Core Engine.&lt;/p&gt;

&lt;p&gt;Not another chatbot.&lt;/p&gt;

&lt;p&gt;A governance runtime for production AI.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>startup</category>
    </item>
    <item>
      <title>Why I’m building a Windows-first emotional AI assistant (lessons so far)</title>
      <dc:creator>Anna Jambhulkar</dc:creator>
      <pubDate>Mon, 22 Dec 2025 13:21:10 +0000</pubDate>
      <link>https://dev.to/anna2612/why-im-building-a-windows-first-emotional-ai-assistant-lessons-so-far-1iii</link>
      <guid>https://dev.to/anna2612/why-im-building-a-windows-first-emotional-ai-assistant-lessons-so-far-1iii</guid>
      <description>&lt;p&gt;Most AI products today are optimized for speed, accuracy, and scale.&lt;/p&gt;

&lt;p&gt;And that makes sense.&lt;/p&gt;

&lt;p&gt;But while using AI tools daily, I kept running into the same feeling:&lt;br&gt;
every interaction felt stateless. Every session started from zero.&lt;br&gt;
No memory. No continuity. No sense of knowing the user.&lt;/p&gt;

&lt;p&gt;That’s where my curiosity started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem I noticed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Modern AI assistants are impressive, but they behave like strangers who forget you every day.&lt;/p&gt;

&lt;p&gt;You explain your preferences again.&lt;br&gt;
You restate context again.&lt;br&gt;
You rebuild workflows again.&lt;/p&gt;

&lt;p&gt;From a technical perspective, this is fine.&lt;br&gt;
From a human perspective, it feels broken.&lt;/p&gt;

&lt;p&gt;Humans don’t work in isolated prompts — we work in continuity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Windows-first (and not cloud-first)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One decision I made early was to build this as a Windows-first assistant, not a browser tab or a purely cloud-based tool.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because a personal computer is still the most intimate computing device we own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It holds our files&lt;/li&gt;
&lt;li&gt;It reflects our workflows&lt;/li&gt;
&lt;li&gt;It stays with us for years&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building locally (or at least desktop-native) allows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better context awareness&lt;/li&gt;
&lt;li&gt;Stronger privacy boundaries&lt;/li&gt;
&lt;li&gt;Tighter integration with daily work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of AI being “somewhere on the internet”, it becomes present.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Emotional AI ≠ pretending to be human&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A common misconception:&lt;br&gt;
emotional AI means making the assistant sound emotional.&lt;/p&gt;

&lt;p&gt;That’s not what I’m exploring.&lt;/p&gt;

&lt;p&gt;For me, emotional AI is about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remembering preferences&lt;/li&gt;
&lt;li&gt;Maintaining interaction history&lt;/li&gt;
&lt;li&gt;Adapting tone and behavior over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s not about fake empathy.&lt;br&gt;
It’s about continuity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I’ve learned so far (the hard parts)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Memory is expensive — technically and ethically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Storing memory isn’t just a database problem.&lt;br&gt;
You need to decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What’s worth remembering?&lt;/li&gt;
&lt;li&gt;What should be forgotten?&lt;/li&gt;
&lt;li&gt;Who controls that memory?&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;“Personal” quickly becomes “creepy” if done wrong&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There’s a very thin line between helpful continuity and overreach.&lt;br&gt;
Designing that boundary is more important than model choice.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Developers underestimate emotion in tools&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Many devs (myself included) initially think users only care about features.&lt;br&gt;
In reality, how a tool makes you feel over time strongly affects retention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why I’m sharing this early&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project is still in a tech-trial stage.&lt;br&gt;
I’m intentionally sharing before everything is “perfect”.&lt;/p&gt;

&lt;p&gt;Because the most valuable insights so far haven’t come from metrics —&lt;br&gt;
they’ve come from conversations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A question for builders here&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you think about the tools you use daily:&lt;/p&gt;

&lt;p&gt;Do you value memory and continuity?&lt;/p&gt;

&lt;p&gt;Or do you prefer tools to stay stateless and predictable?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where do you personally draw the line?&lt;/strong&gt;&lt;br&gt;
I’d love to learn from real experiences, not just theory.&lt;/p&gt;

&lt;p&gt;Thanks for reading 🙏&lt;/p&gt;

</description>
      <category>ai</category>
      <category>saas</category>
      <category>productivity</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
