<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: KRISHNA KISHOR TIRUPATI</title>
    <description>The latest articles on DEV Community by KRISHNA KISHOR TIRUPATI (@ktirupati).</description>
    <link>https://dev.to/ktirupati</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3898223%2F924d18d8-e8a5-4915-bb5c-1367a3a911e8.jpeg</url>
      <title>DEV Community: KRISHNA KISHOR TIRUPATI</title>
      <link>https://dev.to/ktirupati</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ktirupati"/>
    <language>en</language>
    <item>
      <title>Build a Policy-Aware AI Gateway in Python: Data Protection + Policy Enforcement with policyaware</title>
      <dc:creator>KRISHNA KISHOR TIRUPATI</dc:creator>
      <pubDate>Mon, 18 May 2026 04:33:30 +0000</pubDate>
      <link>https://dev.to/ktirupati/build-a-policy-aware-ai-gateway-in-python-data-protection-policy-enforcement-with-policyaware-462h</link>
      <guid>https://dev.to/ktirupati/build-a-policy-aware-ai-gateway-in-python-data-protection-policy-enforcement-with-policyaware-462h</guid>
      <description>&lt;p&gt;Most AI apps ship without any real governance layer. Prompts flow raw to models, sensitive data ends up in logs, and nobody finds out until a compliance audit or a breach. I built &lt;code&gt;policyaware&lt;/code&gt; to fix that — a Python-first package that gives you &lt;strong&gt;data protection&lt;/strong&gt; and &lt;strong&gt;policy enforcement&lt;/strong&gt; in front of any AI system.&lt;/p&gt;

&lt;p&gt;This article is a hands-on technical walkthrough. Every section has working code. By the end you will have a pattern you can wire into any AI gateway or agent pipeline today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;policyaware
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ktirupati/policyaware" rel="noopener noreferrer"&gt;https://github.com/ktirupati/policyaware&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Wiki:&lt;/strong&gt; &lt;a href="https://github.com/ktirupati/policyaware/wiki" rel="noopener noreferrer"&gt;https://github.com/ktirupati/policyaware/wiki&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Part 1 — Data Protection
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What the engine detects
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;DataProtectionEngine&lt;/code&gt; scans any string and returns a structured &lt;code&gt;DataFindings&lt;/code&gt; object. It classifies content into three buckets:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bucket&lt;/th&gt;
&lt;th&gt;What it catches&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PII&lt;/td&gt;
&lt;td&gt;email, phone, SSN, credit card&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PHI&lt;/td&gt;
&lt;td&gt;medical record, patient ID, diagnosis, medication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets&lt;/td&gt;
&lt;td&gt;API keys, bearer tokens, private keys&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Inspecting a prompt
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;policyaware&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataProtectionEngine&lt;/span&gt;

&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi, I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m Jane. Reach me at jane@example.com or 212-555-7890.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataProtectionEngine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;findings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inspect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contains_pii&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# True
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contains_phi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# False
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contains_secrets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# False
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contains_sensitive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# True  (aggregate flag)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# ['email', 'phone']
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redactions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# 2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  DataFindings field reference
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;contains_pii&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;bool&lt;/td&gt;
&lt;td&gt;email, phone, SSN, credit card detected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;contains_phi&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;bool&lt;/td&gt;
&lt;td&gt;medical record, diagnosis, medication detected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;contains_secrets&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;bool&lt;/td&gt;
&lt;td&gt;API key, bearer token, private key detected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;contains_sensitive&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;bool&lt;/td&gt;
&lt;td&gt;True if any of the above is True&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;categories&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;list&lt;/td&gt;
&lt;td&gt;e.g. &lt;code&gt;['email', 'phone', 'ssn']&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;redactions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;int&lt;/td&gt;
&lt;td&gt;Total number of matches found&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;redacted_text&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;str&lt;/td&gt;
&lt;td&gt;Sanitised text returned by &lt;code&gt;.redact()&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Part 2 — Policy Enforcement
&lt;/h2&gt;

&lt;p&gt;Data protection tells you &lt;em&gt;what&lt;/em&gt; is in the request. Policy enforcement tells you &lt;em&gt;what to do about it&lt;/em&gt;. The &lt;code&gt;PolicyEngine&lt;/code&gt; loads a YAML file and evaluates every request against your rules, returning a structured &lt;code&gt;PolicyDecision&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The four decision outcomes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;allow&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Request passes through, apply any transforms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;deny&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Request is blocked outright&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;conditional_allow&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Passes but triggers follow-up checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;require_approval&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Routes to a human-in-the-loop flow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;The engine is &lt;strong&gt;deny-by-default&lt;/strong&gt;. If no rule explicitly grants access, the request is blocked. No silent pass-throughs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Writing your first policy YAML
&lt;/h3&gt;

&lt;p&gt;Rules reference &lt;code&gt;DataFindings&lt;/code&gt; fields directly via the &lt;code&gt;data&lt;/code&gt; root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# support_policy.yaml&lt;/span&gt;
&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;support_policy&lt;/span&gt;
&lt;span class="na"&gt;schema_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.2"&lt;/span&gt;
&lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deny&lt;/span&gt;

&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="c1"&gt;# Rule 1: Block anything containing secrets (API keys, tokens)&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deny_secret_leakage&lt;/span&gt;
    &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deny&lt;/span&gt;
    &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;data.contains_secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="c1"&gt;# Rule 2: Redact PII for standard users, but not for compliance officers&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redact_pii_standard_users&lt;/span&gt;
    &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;transform&lt;/span&gt;
    &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redact&lt;/span&gt;
    &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;data.contains_pii&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;user.role_not_in&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;privacy_admin&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;compliance_officer&lt;/span&gt;

  &lt;span class="c1"&gt;# Rule 3: Allow support agents in US for low/medium risk requests&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow_support_agents&lt;/span&gt;
    &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow&lt;/span&gt;
    &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;user.role_in&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;support_agent&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;support_manager&lt;/span&gt;
      &lt;span class="na"&gt;request.region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us&lt;/span&gt;
      &lt;span class="na"&gt;risk.tier_in&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;low&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;medium&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Enforcing the policy at runtime
&lt;/h3&gt;

&lt;p&gt;Load the YAML, build a &lt;code&gt;GatewayRequest&lt;/code&gt;, inspect the prompt, then call &lt;code&gt;decide&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;policyaware&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataProtectionEngine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GatewayRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PolicyEngine&lt;/span&gt;

&lt;span class="c1"&gt;# Load policy from YAML file
&lt;/span&gt;&lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PolicyEngine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support_policy.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Build the request context
&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GatewayRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acme-corp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support-copilot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;u_001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Email jane@example.com, urgent!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: inspect the prompt
&lt;/span&gt;&lt;span class="n"&gt;findings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DataProtectionEngine&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;inspect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: evaluate policy
&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: act on the decision
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# 'allow' / 'deny' / 'conditional_allow' / 'require_approval'
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# ['redact']
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;matched_rules&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# ['redact_pii_standard_users', 'allow_support_agents']
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;violated_rules&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# []
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# Human-readable explanation
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reason_codes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# Machine-readable codes for logging
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;risk_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# Numeric risk score
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;risk_tier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# 'low' / 'medium' / 'high' / 'critical'
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remediation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# Suggested fix if blocked
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  PolicyDecision field reference
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;decision&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;enum&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;allow&lt;/code&gt;, &lt;code&gt;deny&lt;/code&gt;, &lt;code&gt;conditional_allow&lt;/code&gt;, &lt;code&gt;require_approval&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;actions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;list&lt;/td&gt;
&lt;td&gt;Transforms to apply e.g. &lt;code&gt;['redact']&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;matched_rules&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;list&lt;/td&gt;
&lt;td&gt;Rules that matched the request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;violated_rules&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;list&lt;/td&gt;
&lt;td&gt;Rules that were violated (for audit logs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;reason&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;str&lt;/td&gt;
&lt;td&gt;Human-readable explanation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;reason_codes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;list&lt;/td&gt;
&lt;td&gt;Machine-readable codes for dashboards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;risk_score&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;float&lt;/td&gt;
&lt;td&gt;Numeric risk score&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;risk_tier&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;str&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;low&lt;/code&gt;, &lt;code&gt;medium&lt;/code&gt;, &lt;code&gt;high&lt;/code&gt;, &lt;code&gt;critical&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;remediation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;str&lt;/td&gt;
&lt;td&gt;Suggested fix when request is blocked&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Policy Context Roots
&lt;/h2&gt;

&lt;p&gt;Inside every &lt;code&gt;when&lt;/code&gt; clause you can reference these roots:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Root&lt;/th&gt;
&lt;th&gt;Example usage&lt;/th&gt;
&lt;th&gt;What it covers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tenant&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;tenant: acme&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Customer or team identifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;app&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;app: support-copilot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Calling application or service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;user&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;user.role_in: [support_agent]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Role, ID, department attributes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;request.region: us&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Region, task type, autonomy level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;data.contains_pii: true&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Output from &lt;code&gt;DataProtectionEngine&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;risk&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;risk.tier_in: [low, medium]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Risk score and tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ml&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ml.prompt_injection.detected: true&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Optional ML classifier signals&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Validate Policies Before Production
&lt;/h2&gt;

&lt;p&gt;Ship broken policies and you get silent misses or unintended blocks. &lt;code&gt;policyaware&lt;/code&gt; ships a schema validator and CLI to catch issues early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python validator:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;policyaware&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PolicySchemaValidator&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support_policy.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;safe_load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nc"&gt;PolicySchemaValidator&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# raises on schema errors
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CLI commands:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Validate the YAML schema&lt;/span&gt;
policyaware policy validate support_policy.yaml

&lt;span class="c"&gt;# Explain how a specific request flows through your rules&lt;/span&gt;
policyaware policy explain &lt;span class="nt"&gt;--request&lt;/span&gt; sample_request.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;explain&lt;/code&gt; command is especially useful in CI/CD pipelines — you can run policy checks against a suite of sample requests before merging.&lt;/p&gt;




&lt;h2&gt;
  
  
  Optional: ML-Assisted PII Detection with Presidio
&lt;/h2&gt;

&lt;p&gt;Regex-based rules miss things like names and addresses. For those, &lt;code&gt;policyaware&lt;/code&gt; supports an optional &lt;a href="https://microsoft.github.io/presidio/" rel="noopener noreferrer"&gt;Microsoft Presidio&lt;/a&gt; integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"policyaware[presidio]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;policyaware&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PresidioPIIClassifier&lt;/span&gt;

&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PresidioPIIClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;assessment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Jane Doe lives at 120 Main St and her phone is 212-555-7890.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;assessment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="c1"&gt;# Returns detected entities with type, value, and confidence score
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Presidio findings feed back into the same &lt;code&gt;data&lt;/code&gt; and &lt;code&gt;ml&lt;/code&gt; roots in your YAML, giving you deterministic + ML detection in one framework.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR — What You Get in One Package
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;How&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Detect PII, PHI, Secrets&lt;/td&gt;
&lt;td&gt;&lt;code&gt;DataProtectionEngine().inspect(text)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redact sensitive content&lt;/td&gt;
&lt;td&gt;&lt;code&gt;DataProtectionEngine().redact(text)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enforce access policies via YAML&lt;/td&gt;
&lt;td&gt;&lt;code&gt;PolicyEngine.from_file("policy.yaml")&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rich audit-ready decisions&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;PolicyDecision&lt;/code&gt; with reason, risk, remediation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ML-assisted detection&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;PresidioPIIClassifier&lt;/code&gt; (optional extra)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validate policies before shipping&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;PolicySchemaValidator&lt;/code&gt; + CLI&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Get Started Now
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;!&lt;/span&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;policyaware
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the fastest path to seeing value:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the package&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;DataProtectionEngine().inspect()&lt;/code&gt; on one real prompt from your app&lt;/li&gt;
&lt;li&gt;Write a 3-rule YAML that reflects your actual governance needs&lt;/li&gt;
&lt;li&gt;Call &lt;code&gt;policy.decide(request, findings)&lt;/code&gt; and log the full &lt;code&gt;PolicyDecision&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That four-step experiment is enough to understand whether &lt;code&gt;policyaware&lt;/code&gt; fits your stack.&lt;/p&gt;

&lt;p&gt;I am the author and sole maintainer of this package. I built it because every AI project I worked on had the same gap — no structured layer between raw user input and the model. If you run into anything unexpected, have a governance pattern not covered yet, or want to contribute, I want to hear from you.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ktirupati/policyaware" rel="noopener noreferrer"&gt;https://github.com/ktirupati/policyaware&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wiki &amp;amp; Docs:&lt;/strong&gt; &lt;a href="https://github.com/ktirupati/policyaware/wiki" rel="noopener noreferrer"&gt;https://github.com/ktirupati/policyaware/wiki&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this was useful, drop a like, share it with your team, and star the repo. Every bit of feedback helps make &lt;code&gt;policyaware&lt;/code&gt; better for everyone building serious AI systems in Python.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built an AI Agent That Remembers My Entire Codebase (So I Don't Have To)</title>
      <dc:creator>KRISHNA KISHOR TIRUPATI</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:34:01 +0000</pubDate>
      <link>https://dev.to/ktirupati/i-built-an-ai-agent-that-remembers-my-entire-codebase-so-i-dont-have-to-2h32</link>
      <guid>https://dev.to/ktirupati/i-built-an-ai-agent-that-remembers-my-entire-codebase-so-i-dont-have-to-2h32</guid>
      <description>&lt;p&gt;Ever spent 20 minutes digging through a legacy module just to remember how a specific utility function handles null pointers? We've all been there. Modern codebases are growing at a rate that outpaces human memory. That's why I decided to build a "Second Brain" for my development workflow: a Retrieval-Augmented Generation (RAG) based AI Agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem: Context Switching is a Productivity Killer
&lt;/h3&gt;

&lt;p&gt;As developers, we spend more time reading code than writing it. When you're juggling microservices, custom hooks, and complex database schemas, the cognitive load becomes immense. I wanted something that didn't just "guess" based on general training data (looking at you, vanilla GPT-4), but actually &lt;strong&gt;knew&lt;/strong&gt; my specific implementation details.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architecture: How It Works
&lt;/h3&gt;

&lt;p&gt;The core of this system is a RAG pipeline optimized for source code. Here’s the high-level flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion:&lt;/strong&gt; A Python script crawls the repository, ignoring files in &lt;code&gt;.gitignore&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parsing:&lt;/strong&gt; It breaks the code into logical chunks (functions, classes, or modules).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding:&lt;/strong&gt; These chunks are converted into vector representations using OpenAI's &lt;code&gt;text-embedding-3-small&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; The vectors are stored in a Pinecone database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval:&lt;/strong&gt; When I ask a question, the agent finds the most relevant code snippets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning:&lt;/strong&gt; An LLM (GPT-4o) uses that retrieved context to provide a precise answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Show Me the Code!
&lt;/h3&gt;

&lt;p&gt;Here is a simplified version of the ingestion logic using &lt;code&gt;LangChain&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GenericLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders.parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LanguageParser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_text_splitters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Language&lt;/span&gt;

&lt;span class="c1"&gt;# Load your local codebase
&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GenericLoader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./my-awesome-project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**/*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;suffixes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.js&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;LanguageParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Language&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PYTHON&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parser_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Split and Embed (Simplified)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;

&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why This is a Game Changer
&lt;/h3&gt;

&lt;p&gt;Since integrating this into my local CLI, I’ve noticed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instant Onboarding:&lt;/strong&gt; I can point it at a new library and ask "How is authentication handled?" and get a breakdown in seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better Debugging:&lt;/strong&gt; I can paste an error trace and ask "Which part of our business logic could cause this?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency:&lt;/strong&gt; It helps ensure I'm using existing patterns instead of reinventing the wheel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Building an AI agent that remembers your codebase isn't about replacing the developer; it's about augmenting them. It removes the "grunt work" of searching and lets you focus on architectural decisions and problem-solving.&lt;/p&gt;

&lt;p&gt;Are you using any custom AI tools in your workflow? Let's discuss in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>programming</category>
    </item>
    <item>
      <title>Vibe Coding in 2026: How AI Tools Like Cursor, Replit, Claude, and GitHub Copilot Are Changing the Way We Build Software</title>
      <dc:creator>KRISHNA KISHOR TIRUPATI</dc:creator>
      <pubDate>Mon, 27 Apr 2026 22:22:35 +0000</pubDate>
      <link>https://dev.to/ktirupati/vibe-coding-in-2026-how-ai-tools-like-cursor-replit-claude-and-github-copilot-are-changing-the-3c05</link>
      <guid>https://dev.to/ktirupati/vibe-coding-in-2026-how-ai-tools-like-cursor-replit-claude-and-github-copilot-are-changing-the-3c05</guid>
      <description>&lt;p&gt;Remember when writing code meant typing every line, every bracket, every semicolon? That world is fading fast. In 2026, we are living in what many call the era of vibe coding, where describing what you want in plain English can get you most of the way to working code.&lt;/p&gt;

&lt;p&gt;I have been building with these tools every day, and the shift is real. This is not about replacing developers. It is about changing how we work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Vibe Coding
&lt;/h2&gt;

&lt;p&gt;Vibe coding means you describe your intent in natural language, and the AI writes the implementation. You focus on the what and why, while the AI handles the how.&lt;/p&gt;

&lt;p&gt;Instead of writing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;calculate_fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;calculate_fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You tell Cursor or Claude: Create a function that calculates the nth Fibonacci number using recursion.&lt;/p&gt;

&lt;p&gt;The AI writes it, you review it, you ship it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Main Players in 2026
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GitHub Copilot
&lt;/h3&gt;

&lt;p&gt;Copilot lives inside your editor. You type a comment, it suggests the next few lines. You accept, modify, or reject.&lt;/p&gt;

&lt;p&gt;It works in VS Code, JetBrains, Neovim. It is the most widely adopted tool because it fits into existing workflows without forcing you to change editors.&lt;/p&gt;

&lt;p&gt;Example workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Function to fetch user data from API and cache it&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copilot suggests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchUserData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cacheKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`user-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/api/users/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userData&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;userData&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You review it. Maybe you change the cache TTL. Maybe you add error handling. But the structure is there.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cursor
&lt;/h3&gt;

&lt;p&gt;Cursor is an AI-first editor built on VS Code. It understands your entire codebase, not just the file you are editing.&lt;/p&gt;

&lt;p&gt;You can ask it: Find everywhere we are making API calls without proper error handling.&lt;/p&gt;

&lt;p&gt;It scans your repo, shows you every instance, and can fix them all at once.&lt;/p&gt;

&lt;p&gt;Cursor has two modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat mode for asking questions&lt;/li&gt;
&lt;li&gt;Composer mode for making changes across multiple files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I use Cursor when I need to refactor or when I am working on a feature that touches many files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude and ChatGPT
&lt;/h3&gt;

&lt;p&gt;These are not IDEs. They are reasoning engines.&lt;/p&gt;

&lt;p&gt;You paste your code, describe the problem, and they help you debug, refactor, or architect.&lt;/p&gt;

&lt;p&gt;They excel at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explaining complex code&lt;/li&gt;
&lt;li&gt;Suggesting architectural improvements&lt;/li&gt;
&lt;li&gt;Writing test cases&lt;/li&gt;
&lt;li&gt;Converting code between languages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
You: This Python script is slow when processing large CSV files. How can I optimize it?&lt;/p&gt;

&lt;p&gt;Claude analyzes your code and suggests using pandas chunking, multiprocessing, or switching to Polars.&lt;/p&gt;
&lt;h3&gt;
  
  
  Replit
&lt;/h3&gt;

&lt;p&gt;Replit is a browser-based IDE with AI built in. You can go from idea to deployed app without leaving the browser.&lt;/p&gt;

&lt;p&gt;It is especially good for prototypes, learning, and small projects where you do not want to set up a local environment.&lt;/p&gt;
&lt;h2&gt;
  
  
  Real Architecture Example
&lt;/h2&gt;

&lt;p&gt;Let me show you how I built a real feature using vibe coding.&lt;/p&gt;

&lt;p&gt;Task: Add a rate limiter to our API endpoints.&lt;/p&gt;

&lt;p&gt;Here is how the conversation with Cursor went:&lt;/p&gt;

&lt;p&gt;Me: I need to add rate limiting to all our Express routes. Use Redis for storage. Limit to 100 requests per minute per IP.&lt;/p&gt;

&lt;p&gt;Cursor generated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;redis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createClient&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rateLimiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`ratelimit:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;incr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;requests&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;requests&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Too many requests&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rateLimiter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I asked: Add logging for when users hit the limit.&lt;/p&gt;

&lt;p&gt;It updated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;requests&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Rate limit exceeded for IP: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Too many requests&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;retryAfter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Final step: Apply this to all routes.&lt;/p&gt;

&lt;p&gt;Cursor modified the main app file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rateLimiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./middleware/rateLimiter&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rateLimiter&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total time: 3 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture of AI Coding Tools
&lt;/h2&gt;

&lt;p&gt;Here is how these systems work under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────┐
│                    Your Editor/IDE                      │
│  (VS Code, Cursor, Replit)                             │
└────────────┬────────────────────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────────────────────┐
│              AI Coding Assistant Plugin                 │
│  • Captures your code context                          │
│  • Sends prompts + context to AI                       │
│  • Receives suggestions                                │
└────────────┬────────────────────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────────────────────┐
│                   AI Model Layer                        │
│  (GPT-4, Claude, Codex)                                │
│  • Processes code context                              │
│  • Generates suggestions                               │
│  • Returns formatted code                              │
└────────────┬────────────────────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────────────────────┐
│                  Your Codebase                          │
│  • Changes applied                                      │
│  • Context updated                                      │
│  • Ready for next iteration                            │
└─────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key is context. The more the AI knows about your project, the better its suggestions.&lt;/p&gt;

&lt;p&gt;Cursor indexes your entire repo. GitHub Copilot looks at open files and recent edits. Claude works with what you paste.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow Example: Building a REST API
&lt;/h2&gt;

&lt;p&gt;Let me walk through building a simple REST API for a task manager using vibe coding.&lt;/p&gt;

&lt;p&gt;Step 1: Define the structure&lt;/p&gt;

&lt;p&gt;Me to Cursor: Create an Express API with routes for creating, reading, updating, and deleting tasks. Use MongoDB for storage.&lt;/p&gt;

&lt;p&gt;Cursor generates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mongoose&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mongoose&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;taskSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;mongoose&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;mongoose&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Task&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;taskSchema&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/tasks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/tasks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/tasks/:id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findByIdAndUpdate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;new&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/tasks/:id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findByIdAndDelete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Task deleted&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;mongoose&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mongodb://localhost/taskmanager&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 2: Add error handling&lt;/p&gt;

&lt;p&gt;Me: Add try-catch blocks and proper error responses.&lt;/p&gt;

&lt;p&gt;Cursor updates all routes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/tasks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 3: Add validation&lt;/p&gt;

&lt;p&gt;Me: Validate that title is required and completed defaults to false.&lt;/p&gt;

&lt;p&gt;Cursor modifies the schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;taskSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;mongoose&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total development time: under 10 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data Flow
&lt;/h2&gt;

&lt;p&gt;Here is how data moves through a vibe coding session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer Intent
      |
      v
  Natural Language Prompt
      |
      v
  AI Model (with code context)
      |
      v
  Generated Code Suggestion
      |
      v
  Developer Review &amp;amp; Edit
      |
      v
  Final Implementation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Works Best
&lt;/h2&gt;

&lt;p&gt;After months of daily use, here is what I have learned:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Be specific in your prompts&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bad: Make this faster&lt;/li&gt;
&lt;li&gt;Good: Optimize this loop using a hash map instead of nested iteration&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Give context&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instead of: Write a login function&lt;/li&gt;
&lt;li&gt;Say: Write a login function that checks credentials against our PostgreSQL users table, returns a JWT token, and logs failed attempts&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Iterate in small steps&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do not ask for an entire feature at once&lt;/li&gt;
&lt;li&gt;Build piece by piece, testing as you go&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Review everything&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI makes mistakes&lt;/li&gt;
&lt;li&gt;It might use deprecated methods&lt;/li&gt;
&lt;li&gt;It might miss edge cases&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Limits
&lt;/h2&gt;

&lt;p&gt;Vibe coding is not magic. It struggles with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex business logic that requires domain expertise&lt;/li&gt;
&lt;li&gt;Performance optimization for specialized use cases&lt;/li&gt;
&lt;li&gt;Architectural decisions that involve tradeoffs&lt;/li&gt;
&lt;li&gt;Debugging production issues that need deep system knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You still need to understand what the code does. You still need to think like an engineer.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Setup
&lt;/h2&gt;

&lt;p&gt;Here is my current workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cursor for feature development and refactoring&lt;/li&gt;
&lt;li&gt;GitHub Copilot for autocomplete while editing&lt;/li&gt;
&lt;li&gt;Claude for architecture discussions and code review&lt;/li&gt;
&lt;li&gt;Replit for quick prototypes and experiments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I spend less time typing, more time thinking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Vibe coding is not replacing developers. It is changing what we focus on.&lt;/p&gt;

&lt;p&gt;Instead of remembering syntax, we think about architecture.&lt;br&gt;
Instead of writing boilerplate, we design systems.&lt;br&gt;
Instead of debugging typos, we solve real problems.&lt;/p&gt;

&lt;p&gt;The tools are here. The question is: are you using them?&lt;/p&gt;

&lt;p&gt;What has your experience been with these AI coding tools? Are you skeptical, excited, somewhere in between? Let me know in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>github</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Designing and Deploying Agentic AI Systems in Production Using Azure OpenAI</title>
      <dc:creator>KRISHNA KISHOR TIRUPATI</dc:creator>
      <pubDate>Sun, 26 Apr 2026 02:27:32 +0000</pubDate>
      <link>https://dev.to/ktirupati/designing-and-deploying-agentic-ai-systems-in-production-using-azure-openai-1iaj</link>
      <guid>https://dev.to/ktirupati/designing-and-deploying-agentic-ai-systems-in-production-using-azure-openai-1iaj</guid>
      <description>&lt;p&gt;Designing and deploying agentic AI systems on Azure OpenAI is ultimately a software engineering problem, not just a prompt engineering exercise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Agentic AI on Azure OpenAI combines large language models with tools, memory, and orchestration so systems can perceive context, reason about goals, and act through APIs or workflows. In enterprise environments, these agents sit inside existing architectures, integrate with business systems like CRMs or ERPs, and must meet stringent requirements for reliability, security, observability, and governance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;At a high level, an Azure OpenAI agent in production is a composition of model, orchestration layer, enterprise services, and platform capabilities from Azure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Typical Layers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Experience Layer&lt;/strong&gt;&lt;br&gt;
This includes chat widgets, web and mobile apps, IVR, or line-of-business front ends that capture user inputs and display responses. They communicate with a backend agent API over HTTPS and often stream partial responses for better perceived latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Orchestration and Agent Runtime&lt;/strong&gt;&lt;br&gt;
This is usually implemented as a microservice or set of services running on Azure Kubernetes Service, Azure Container Apps, or App Service. It handles dialogue state, calls Azure OpenAI for reasoning, invokes tools via function calling, manages retries, and applies business rules such as guardrails or approval workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Azure OpenAI Service&lt;/strong&gt;&lt;br&gt;
This provides deployed models such as GPT-4 class models, responses or chat APIs, function/tool calling, and system-level safety settings. You configure deployments per region and SKU, define capacity, and integrate them with your orchestration tier through the standard REST or Python/Java/.NET SDKs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Enterprise Tools and Data&lt;/strong&gt;&lt;br&gt;
Agents rely on tools that wrap internal systems: REST APIs, databases, search endpoints, and workflow engines. For retrieval augmented generation, you usually add Azure AI Search or vector indexes, while for workflow automation you integrate with Logic Apps, Power Automate, or internal microservices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Cross-Cutting Services&lt;/strong&gt;&lt;br&gt;
Governance, observability, and security come from services like Azure Monitor, Application Insights, Log Analytics, API Management, Key Vault, and Entra ID (Azure AD). These ensure authentication, authorization, quota management, rate limiting, metrics, tracing, and auditing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Components of an Azure OpenAI Agent
&lt;/h3&gt;

&lt;p&gt;An agent is more than a single prompt; it is usually composed of several cooperating elements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policy and Role Definition&lt;/strong&gt;&lt;br&gt;
The agent's role defines its scope, allowed tools, and tone via system prompts and configuration. You specify what it may do, what data it may touch, and which escalation paths it must follow for sensitive actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory and Context&lt;/strong&gt;&lt;br&gt;
Short-term memory is the conversation history and state for the current session, while long-term memory comes from knowledge bases and logs. On Azure this is often implemented with Azure AI Search, Cosmos DB, or SQL, combined with embeddings produced by Azure OpenAI models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tooling Interface&lt;/strong&gt;&lt;br&gt;
Functions are exposed to the model using Azure OpenAI function or tool calling: you define function schemas, arguments, and natural-language descriptions, then let the model choose when to call each tool. The orchestration layer executes the selected tool, captures the results, and feeds them back to the model as messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Safety, Guardrails, and Filters&lt;/strong&gt;&lt;br&gt;
You apply content filters, allow/deny lists, and input/output validation before and after every model call. For high-risk domains, human-in-the-loop review and approval are added as explicit steps in the workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  How an Agent Behaves in Real Enterprise Scenarios
&lt;/h2&gt;

&lt;p&gt;In production, agent behavior is shaped by business rules, data access patterns, and organizational risk tolerance. Below are practical scenarios that show how this plays out with Azure OpenAI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support Automation&lt;/strong&gt;&lt;br&gt;
A customer opens a support chat in a portal. The frontend sends the message to an agent API that enriches it with user profile data and recent tickets from a CRM tool. The agent uses Azure AI Search to retrieve relevant knowledge articles and internal runbooks, then asks Azure OpenAI to draft a response via the responses or chat API with function calling. If the issue exceeds certain risk thresholds, the agent routes the conversation to a human agent, attaching a summarized context and proposed reply for faster handling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision Support&lt;/strong&gt;&lt;br&gt;
A portfolio manager asks, "How will this product change impact our quarterly margin?" The orchestration layer calls financial and sales data APIs to fetch current numbers, then passes structured summaries to the model through tools. The agent runs scenario analysis through multiple calls: one to generate assumptions, one to compute summaries over metrics, and one to explain trade-offs in business language. Outputs include narrative explanation plus structured justification, which can be stored for audit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workflow Automation&lt;/strong&gt;&lt;br&gt;
An internal user requests, "Create a change request for updating this microservice and notify the owners." The agent uses tools to create a work item in Azure DevOps or ServiceNow, update a change calendar, and send notifications via email or Teams connectors. It returns a summary with links, IDs, and the steps it performed, giving transparency into actions.&lt;/p&gt;

&lt;h2&gt;
  
  
  End-to-End Agent Workflow Example
&lt;/h2&gt;

&lt;p&gt;Consider a support automation agent deployed on Azure OpenAI and fronted by a web chat in a corporate portal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: User Request and Intake&lt;/strong&gt;&lt;br&gt;
The user types: "My invoice shows the wrong amount, can you fix it?" The frontend passes this text, session identifiers, and user ID to a backend API along with any client-side telemetry such as locale and device type. Basic validation, rate limiting, and authentication via Entra ID occur at API Management or the gateway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Context Assembly&lt;/strong&gt;&lt;br&gt;
The agent service fetches user profile details and recent invoices through internal APIs exposed as tools. It queries Azure AI Search using an embeddings-based index over billing policies and knowledge articles, returning several relevant passages. The service then constructs a prompt for Azure OpenAI that includes system instructions, conversation history, retrieved documents, and structured invoice data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Reasoning and Tool Selection&lt;/strong&gt;&lt;br&gt;
Using the responses or chat API with function calling enabled, the model decides that it must call a "get_invoice_details" tool because the user is referencing a specific invoice. The orchestration layer executes that tool by calling the billing service, then posts the result back as a tool response, prompting the model again. The model now checks for mismatched line items and determines that a partial credit is appropriate per policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Action and Validation&lt;/strong&gt;&lt;br&gt;
The agent calls another tool, "create_credit_memo," but this time the orchestration code applies an extra guard: for credits above a certain amount, it requires human approval instead of automatic execution. The tool either executes or records the request in a queue for human review and returns the status to the agent. The orchestration layer logs all inputs, decisions, and tool outputs into Application Insights and Log Analytics for observability and audit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Response Generation and Streaming&lt;/strong&gt;&lt;br&gt;
The agent calls Azure OpenAI one more time with all updated context to generate a user-friendly explanation of what was done and what the user should expect next. Streaming is enabled so the frontend can display tokens as they arrive, which significantly improves perceived latency even if the overall response generation takes a few seconds. The final message is persisted to a conversation store along with structured metadata such as outcome status and tags for analytics.&lt;/p&gt;

&lt;p&gt;This pattern repeats across messages, giving the agent a dialog loop where each turn includes intake, context building, reasoning, tool use, and output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Approach on Azure OpenAI
&lt;/h2&gt;

&lt;p&gt;A robust agent implementation emerges from a staged approach that moves from problem definition to production hardening.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defining the Use Case&lt;/strong&gt;&lt;br&gt;
Start with one or two focused journeys where agents can deliver measurable value, for example first-line support or internal request automation. Define clear success metrics such as deflection rate, handle time reduction, or user satisfaction, and translate them into model-level KPIs like answer accuracy or escalation rates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designing Agent Workflows&lt;/strong&gt;&lt;br&gt;
Map the current process step by step, then identify which decisions can move to the agent and which must remain with humans. Translate this into an orchestration design that uses patterns such as sequential flows, concurrent calls, or handoff flows. For complex environments, adopt a multi-agent design where specialized agents handle retrieval, planning, or domain-specific tasks, coordinated by a higher-level controller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt and Policy Engineering&lt;/strong&gt;&lt;br&gt;
Author precise system messages that describe role, boundaries, and tone, and include examples of desired behavior and red lines. Use few-shot examples for tricky reasoning steps, and add structured instructions that explain how to decide whether a tool is required. Encode non-negotiable business rules outside the prompt in actual code, so the agent can propose actions but cannot bypass compliance logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool Integration&lt;/strong&gt;&lt;br&gt;
Wrap each enterprise system in a well-typed function definition with clear names and human-readable descriptions that help the model choose correctly. Keep tool schemas small; large or rarely used tools can be loaded conditionally via a higher-level tool search step to keep the active tool set manageable. Implement timeouts, retries with backoff, and circuit breakers per tool to avoid cascading failures when downstream systems are slow or unavailable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment and Operations&lt;/strong&gt;&lt;br&gt;
Deploy the orchestration runtime to Azure Kubernetes Service or Azure Container Apps with proper horizontal scaling policies tied to CPU, memory, or QPS. Expose APIs through Azure API Management to control access, apply request throttling, and centralize authentication with Entra ID. Configure Azure Monitor, Application Insights, and Log Analytics for metrics, traces, and logs that capture every agent call, tool invocation, and error. For secrets and configuration such as API keys and connection strings, rely on Azure Key Vault and managed identities rather than environment variables or embedded secrets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Challenges and How to Handle Them
&lt;/h2&gt;

&lt;p&gt;Putting agents into production surfaces a set of recurring engineering challenges that go beyond prompt tuning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliability&lt;/strong&gt;&lt;br&gt;
API failures, timeouts, and model-side rate limits are common when systems operate at scale. You address this by using exponential backoff retries, circuit breakers, graceful degradation strategies, and careful quota management through Azure resource planning and API Management. For critical actions, implement idempotent operations and compensating transactions so repeated tool calls do not corrupt state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;br&gt;
The main contributors to latency are network overhead, tool call cascades, and token generation within the model. Effective strategies include response streaming, reducing prompt and response length, batching where possible, and parallelizing independent tool calls. Model choice also matters: using smaller or more efficient deployments where appropriate can significantly improve latency and throughput.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Management&lt;/strong&gt;&lt;br&gt;
Cost scales with total tokens and call volume, especially in multi-call agent workflows. You can control cost by pruning unnecessary context, compressing history into summaries, capping max tokens, and routing low-value traffic to cheaper models. Monitoring per-feature and per-tenant consumption and applying quotas ensures no single consumer overwhelms the budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging and Observability&lt;/strong&gt;&lt;br&gt;
Debugging agents is difficult because behavior emerges from prompts, model weights, tools, and data working together. Rich logging of prompts, tool calls, and outputs, combined with correlation IDs across services, makes it possible to replay problem sessions and iteratively refine prompts and workflows. Telemetry dashboards that track hallucination reports, escalation rates, tool error rates, and user feedback are essential to continuous improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;&lt;br&gt;
Scaling requires both the model side and the orchestration side to handle higher load with predictable performance. On the model side, that means provisioning sufficient capacity, using multiple deployments, and sometimes applying multi-region strategies for resilience. On the application side, it means stateless or externally stateful services, asynchronous processing for long-running actions, and autoscaling policies that respond to traffic patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Governance and Security&lt;/strong&gt;&lt;br&gt;
Enterprises need strong control over who can invoke agents, what data they can access, and how their actions are audited. Azure provides a foundation through Entra ID for identity, RBAC for resource access, private networking, and customer-managed keys for encryption at rest. You augment this with fine-grained policy at the application level, including role-based access to tools, PII redaction, data minimization, and retention controls. For regulated workloads, systematic logging and human-in-the-loop review for high-risk tasks provide additional assurance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Agentic AI on Azure OpenAI is most successful when treated as an engineered system that combines models, tools, data, and governance rather than a single intelligent component. By starting with clear use cases, designing explicit workflows, investing in observability and guardrails, and using Azure's platform capabilities for scaling and security, organizations can deploy agents that deliver meaningful automation and decision support while staying within enterprise risk boundaries.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>openai</category>
    </item>
  </channel>
</rss>
