<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Suraj Khaitan</title>
    <description>The latest articles on DEV Community by Suraj Khaitan (@suraj_khaitan_f893c243958).</description>
    <link>https://dev.to/suraj_khaitan_f893c243958</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2130149%2Fe5132e15-d188-49bb-986e-43d967f20723.jpg</url>
      <title>DEV Community: Suraj Khaitan</title>
      <link>https://dev.to/suraj_khaitan_f893c243958</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/suraj_khaitan_f893c243958"/>
    <language>en</language>
    <item>
      <title>🤖 We Gave an AI Agent Our Design System and Let It Build Our Frontend — Here's What Happened</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sat, 04 Apr 2026 14:41:02 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/we-gave-an-ai-agent-our-design-system-and-let-it-build-our-frontend-heres-what-happened-2hde</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/we-gave-an-ai-agent-our-design-system-and-let-it-build-our-frontend-heres-what-happened-2hde</guid>
      <description>&lt;p&gt;&lt;em&gt;How a custom GitHub Copilot agent with strict architectural guardrails turned feature delivery from days into hours on a multi-tenant enterprise platform&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Talks About in Enterprise Frontend
&lt;/h2&gt;

&lt;p&gt;Enterprise frontend development is slow. Not because developers can't write React components — they can — but because &lt;strong&gt;90% of the work isn't writing code. It's alignment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Which design tokens do I use? Where does this component go? How do I wire the API? What's the naming convention for hooks? Which state manager handles this? How do I handle dark mode? Did I forget the MSW handler for tests?&lt;/p&gt;

&lt;p&gt;On our team building an &lt;strong&gt;enterprise multi-tenant GenAI platform&lt;/strong&gt; — managing agents, tools, and knowledge bases across a large manufacturing conglomerate — the friction was even worse. We have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;custom corporate design system&lt;/strong&gt; with 360+ Tailwind tokens (no generic &lt;code&gt;gray-500&lt;/code&gt; allowed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 feature modules&lt;/strong&gt; with strict feature-first architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAPI codegen&lt;/strong&gt; that generates TypeScript types from a FastAPI backend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MSW (Mock Service Worker)&lt;/strong&gt; for development and testing&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;7-tier RBAC system&lt;/strong&gt; with route-level access guards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Light/dark mode&lt;/strong&gt; using class-based Tailwind (&lt;code&gt;dark:&lt;/code&gt; variants on everything)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;i18n&lt;/strong&gt; for English and German&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every new component is a decision tree. Every junior developer ramp-up takes weeks. Every code review catches the same "you used &lt;code&gt;bg-white&lt;/code&gt; instead of &lt;code&gt;bg-background-base&lt;/code&gt;" mistake.&lt;/p&gt;

&lt;p&gt;So we did something different: &lt;strong&gt;we encoded our entire frontend architecture into an AI agent and let it build features for us.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR (If You Skim, Skim This)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Problem:&lt;/strong&gt; Enterprise frontend velocity bottlenecked by architectural complexity, design system compliance, and cross-cutting concerns (auth, theming, mocking, i18n).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Move:&lt;/strong&gt; Built a custom VS Code agent (&lt;code&gt;.github/agents/FrontendAgent.agent.md&lt;/code&gt;) that knows our design system, file structure, state management strategy, and API codegen pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Feature scaffolding that used to take a day now takes minutes. The agent produces design-system-compliant, dark-mode-ready, MSW-wired, type-safe code on the first pass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tradeoff:&lt;/strong&gt; You need to invest upfront in writing precise agent instructions. Vague prompts produce vague code — garbage in, garbage out.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Not Just Use Copilot Out of the Box?
&lt;/h2&gt;

&lt;p&gt;We did. Here's what vanilla Copilot (without custom instructions) gave us:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ What generic Copilot produced&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"bg-white dark:bg-gray-900 p-4 rounded-lg shadow-md"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;h1&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"text-gray-900 dark:text-white text-xl font-bold"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    Tenants
  &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;h1&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every single token is wrong. &lt;code&gt;bg-white&lt;/code&gt; should be &lt;code&gt;bg-background-base&lt;/code&gt;. &lt;code&gt;text-gray-900&lt;/code&gt; should be &lt;code&gt;text-text-normal&lt;/code&gt;. &lt;code&gt;p-4&lt;/code&gt; should be &lt;code&gt;p-400&lt;/code&gt;. &lt;code&gt;rounded-lg&lt;/code&gt; should be &lt;code&gt;rounded-m&lt;/code&gt;. &lt;code&gt;font-bold&lt;/code&gt; should be &lt;code&gt;font-bold font-primary&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Multiply that across 18 shared components, 8 feature modules, and hundreds of sub-components, and you're spending more time fixing AI output than you saved generating it.&lt;/p&gt;

&lt;p&gt;The realization: &lt;strong&gt;an AI assistant is only as good as its context.&lt;/strong&gt; Generic Copilot doesn't know your design system. It doesn't know your file conventions. It doesn't know that you use TanStack Query with a 5-minute stale time and 2 retries, not SWR or Redux Toolkit Query.&lt;/p&gt;

&lt;p&gt;So we gave it all of that context. Explicitly. In a single agent definition file.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: A 200-Line Agent That Knows Everything
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot supports custom agents via markdown files in &lt;code&gt;.github/agents/&lt;/code&gt;. Ours lives at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.github/agents/FrontendAgent.agent.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's a single file that encodes every architectural decision our team has made. Think of it as a &lt;strong&gt;machine-readable engineering handbook&lt;/strong&gt; — the same document that would take a new hire two weeks to internalize, distilled into structured instructions an AI can execute against.&lt;/p&gt;

&lt;p&gt;Here's how we structured it:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Design System as Code (Not Suggestions)
&lt;/h3&gt;

&lt;p&gt;We don't tell the agent "try to use our design tokens." We tell it these are the &lt;strong&gt;only&lt;/strong&gt; tokens that exist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;DESIGN SYSTEM &amp;amp; THEMING (MANDATORY)
&lt;span class="p"&gt;-&lt;/span&gt; Use corporate design tokens only (NO generic Tailwind colors like gray-500/blue-600).
&lt;span class="p"&gt;-&lt;/span&gt; Always include dark mode variants (class-based: darkMode: 'class').
&lt;span class="p"&gt;-&lt;/span&gt; Semantic tokens examples:
&lt;span class="p"&gt;  -&lt;/span&gt; Colors: bg-background-base, bg-background-surface, text-text-normal,
            border-line-weak, bg-action, bg-status-error
&lt;span class="p"&gt;  -&lt;/span&gt; Spacing: p-400 (16px), m-600 (24px), gap-300 (12px)
&lt;span class="p"&gt;  -&lt;/span&gt; Typography: text-400, font-primary, font-secondary, font-bold
&lt;span class="p"&gt;  -&lt;/span&gt; Borders: rounded-m, border-s
&lt;span class="p"&gt;  -&lt;/span&gt; Transitions: duration-medium-1, ease-in-out
&lt;span class="p"&gt;-&lt;/span&gt; Reference: src/frontend/THEME_GUIDE.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The word "MANDATORY" isn't decoration. The agent treats sections labeled as mandatory as hard constraints, not preferences. When it generates a card component now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ What the custom agent produces&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="err"&gt;"&lt;/span&gt;&lt;span class="na"&gt;bg-background-surface&lt;/span&gt; &lt;span class="na"&gt;dark&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="na"&gt;bg-dark-background-surface&lt;/span&gt;
                &lt;span class="na"&gt;p-400&lt;/span&gt; &lt;span class="na"&gt;rounded-m&lt;/span&gt; &lt;span class="na"&gt;shadow-card&lt;/span&gt;
                &lt;span class="na"&gt;border&lt;/span&gt; &lt;span class="na"&gt;border-line-weak&lt;/span&gt; &lt;span class="na"&gt;dark&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="na"&gt;border-dark-line-weak&lt;/span&gt;
                &lt;span class="na"&gt;transition-all&lt;/span&gt; &lt;span class="na"&gt;duration-medium-1&lt;/span&gt; &lt;span class="na"&gt;ease-in-out&lt;/span&gt;&lt;span class="err"&gt;"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;h1&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="err"&gt;"&lt;/span&gt;&lt;span class="na"&gt;text-text-normal&lt;/span&gt; &lt;span class="na"&gt;dark&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="na"&gt;text-dark-text-normal&lt;/span&gt;
                 &lt;span class="na"&gt;text-400&lt;/span&gt; &lt;span class="na"&gt;font-primary&lt;/span&gt; &lt;span class="na"&gt;font-bold&lt;/span&gt;&lt;span class="err"&gt;"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    Tenants
  &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;h1&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every token is from our design system. Dark mode is included. Transitions use our timing tokens. No manual corrections needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Feature-First File Structure (Encoded, Not Implied)
&lt;/h3&gt;

&lt;p&gt;We explicitly map the file tree so the agent places files correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;FRONTEND ARCHITECTURE &amp;amp; CONVENTIONS
&lt;span class="p"&gt;-&lt;/span&gt; Feature-first organization:
  src/frontend/src/
    features/{feature}/
      api/          // Axios client functions
      components/   // UI components
      hooks/        // Feature hooks
      pages/        // Route-level pages
    components/     // Shared components
    contexts/       // Auth, Theme, Tenant contexts
    lib/            // Utilities
&lt;span class="p"&gt;-&lt;/span&gt; Import alias: @/ → src/
&lt;span class="p"&gt;-&lt;/span&gt; Naming: Components = PascalCase, Hooks = camelCase with 'use',
          API files = {feature}Api.ts, Contexts = {Name}Context.tsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we ask the agent to build a "knowledge base management feature," it doesn't create a flat &lt;code&gt;KnowledgeBase.tsx&lt;/code&gt; in the root. It scaffolds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/features/knowledgebase/
├── api/
│   └── knowledgebaseApi.ts
├── components/
│   ├── KnowledgeBaseList.tsx
│   └── CreateKnowledgeBaseDialog.tsx
├── hooks/
│   └── useKnowledgeBases.ts
├── pages/
│   └── KnowledgeBasePage.tsx
└── types/
    └── index.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Correct directory. Correct naming. Correct separation of concerns. Every time.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. State Management: Pick the Right Tool Automatically
&lt;/h3&gt;

&lt;p&gt;We encode our state management decision tree:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;STATE &amp;amp; DATA
&lt;span class="p"&gt;-&lt;/span&gt; Server state: TanStack Query (staleTime 5 min, retries: 2)
&lt;span class="p"&gt;-&lt;/span&gt; Global auth: UserInfoProvider (contexts/AuthContext.tsx)
&lt;span class="p"&gt;-&lt;/span&gt; Theme: ThemeProvider
&lt;span class="p"&gt;-&lt;/span&gt; Local state: useState/useReducer (NO Redux/Zustand)
&lt;span class="p"&gt;-&lt;/span&gt; Error handling:
&lt;span class="p"&gt;  -&lt;/span&gt; Wrap TanStack Query errors with Sonner toasts
&lt;span class="p"&gt;  -&lt;/span&gt; ErrorBoundary component with design tokens
&lt;span class="p"&gt;  -&lt;/span&gt; "Access Lost" interceptor: clear tenant, redirect, show toast
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when the agent generates a data-fetching hook, it doesn't reach for &lt;code&gt;useEffect&lt;/code&gt; + &lt;code&gt;fetch&lt;/code&gt; or SWR. It produces exactly what our codebase expects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;useKnowledgeBases&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useAuth&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;useQuery&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;KnowledgeBase&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;queryKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;knowledgebases&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;queryFn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;knowledgebaseApi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getKnowledgeBases&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Session-aware. Query-key namespaced. Auth-gated with &lt;code&gt;enabled&lt;/code&gt;. Retry count matching our standard. This is exactly what our human-written hooks look like — because the agent learned from the same conventions.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Secret Weapon: MSW-First Development
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting. Our agent doesn't just generate UI components — it generates the &lt;strong&gt;entire mock layer&lt;/strong&gt; alongside them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;MSW-FIRST DEVELOPMENT
&lt;span class="p"&gt;-&lt;/span&gt; Use MSW (Mock Service Worker) during UI work—dev server and tests.
&lt;span class="p"&gt;-&lt;/span&gt; Location: src/frontend/src/mocks/
&lt;span class="p"&gt;-&lt;/span&gt; Handlers:
&lt;span class="p"&gt;  -&lt;/span&gt; Realistic delays: 300–800ms
&lt;span class="p"&gt;  -&lt;/span&gt; Simulate ~5% errors
&lt;span class="p"&gt;  -&lt;/span&gt; Validate required fields and return error shapes consistent with backend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we ask the agent to build a new feature, the output includes MSW handlers with realistic data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Generated MSW handler for knowledge bases&lt;/span&gt;
&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/knowledgebases&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Simulate realistic network delay&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 5% error rate simulation&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;HttpResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Internal server error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;HttpResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kb-001&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Production Manual - North Plant&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;S3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ACTIVE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;documentCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1247&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;lastSynced&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2026-04-03T14:30:00Z&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="c1"&gt;// ... more realistic domain-contextualized data&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means the agent produces &lt;strong&gt;runnable features from the first prompt&lt;/strong&gt;. No waiting for the backend team. No dummy &lt;code&gt;setTimeout&lt;/code&gt; hacks. The UI renders with realistic data, realistic latency, and realistic error states immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  Backend as Source of Truth: The Codegen Bridge
&lt;/h2&gt;

&lt;p&gt;One of our strongest architectural decisions was making the agent aware of our OpenAPI codegen pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;BACKEND AS SOURCE OF TRUTH (SPEC SYNC)
&lt;span class="p"&gt;-&lt;/span&gt; Backend is authoritative. FastAPI + Pydantic (code-first).
&lt;span class="p"&gt;-&lt;/span&gt; Frontend must use generated TypeScript types and API client only.
&lt;span class="p"&gt;-&lt;/span&gt; Codegen: pnpm api:codegen
&lt;span class="p"&gt;-&lt;/span&gt; After codegen, run git diff:
&lt;span class="p"&gt;  -&lt;/span&gt; If there is a diff, surface: "Frontend types are stale relative
    to backend OpenAPI" and include diff summary.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our codegen setup (&lt;code&gt;openapi-ts.config.ts&lt;/code&gt;) generates types, SDK methods, and even TanStack Query hooks directly from the backend's OpenAPI spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// openapi-ts.config.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;defineConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@hey-api/openapi-ts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@hey-api/client-fetch&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:8000/openapi.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;src/client&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;prettier&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@tanstack/react-query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;queryOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;mutationOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@hey-api/typescript&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;enums&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;javascript&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the agent starts a task, it checks whether the generated types are current. If they've drifted, it flags it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;⚠️ SPEC MISMATCH: Frontend types are stale.
  - Missing field: `retryCount` on PromotionEvent
  - New enum value: `ROLLED_BACK` in PromotionStatus
  Running `pnpm api:codegen` to sync...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents the classic "the UI expects a field the API doesn't send" bug that usually surfaces at 11 PM on a Friday.&lt;/p&gt;




&lt;h2&gt;
  
  
  Autonomy Levels: Controlling the Blast Radius
&lt;/h2&gt;

&lt;p&gt;We don't always want the agent to write production code. Sometimes we want a plan. Sometimes a scaffold. Sometimes the full implementation.&lt;/p&gt;

&lt;p&gt;So we built three autonomy levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;AUTONOMY LEVELS (Default = Level 2)
&lt;span class="p"&gt;-&lt;/span&gt; Level 1: Plan Only → Step-by-step plan, file paths, component
            signatures. No code changes.
&lt;span class="p"&gt;-&lt;/span&gt; Level 2: Plan + Scaffold → Create files, stubs, routing/context
            wiring, MSW handlers. Minimal UI with tokens; TODO comments.
&lt;span class="p"&gt;-&lt;/span&gt; Level 3: Full Implementation → Complete feature including styling,
            tests, mocks, docs, and ready-to-run commands.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Level 1&lt;/strong&gt; is for architecture discussions. "How would you build a promotion approval workflow?" The agent produces a plan, lists affected files, and maps component relationships — without touching a single file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 2&lt;/strong&gt; (the default) is our workhorse. The agent creates the file structure, wires routes and contexts, sets up MSW handlers, and builds minimal UI with correct tokens. Developers fill in the business logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 3&lt;/strong&gt; is for well-defined features with clear specs. The agent produces everything: components, hooks, API functions, MSW handlers, unit tests, and even the &lt;code&gt;pnpm&lt;/code&gt; commands to verify the output.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Agent Lifecycle: Not Just "Generate Code"
&lt;/h2&gt;

&lt;p&gt;What separates this from a glorified code generator is the &lt;strong&gt;end-to-end lifecycle&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;END-TO-END AGENT LIFECYCLE
Phase A — Plan
&lt;span class="p"&gt;-&lt;/span&gt; Outline goals, dependencies, spec sync (codegen), and scope.
&lt;span class="p"&gt;-&lt;/span&gt; Note any backend spec gaps (SPEC MISMATCH section).

Phase B — Implement
&lt;span class="p"&gt;-&lt;/span&gt; Apply scaffolding/implementation per autonomy level.
&lt;span class="p"&gt;-&lt;/span&gt; Add MSW handlers and tests.

Phase C — Validate
&lt;span class="p"&gt;-&lt;/span&gt; Run typecheck, build, tests; verify codegen freshness.

Phase D — Deliver
&lt;span class="p"&gt;-&lt;/span&gt; Provide diffs, test plan, run commands, and follow-up concerns.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent doesn't just output code and walk away. It:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Plans&lt;/strong&gt; — analyzing the request against the existing codebase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Syncs&lt;/strong&gt; — running codegen to ensure types are fresh&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implements&lt;/strong&gt; — generating code compliant with every convention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validates&lt;/strong&gt; — running &lt;code&gt;pnpm frontend:quality&lt;/code&gt; (typecheck + lint + format)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delivers&lt;/strong&gt; — providing exact commands to test its output&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That validation step is key. If the agent generates code with a type error, it catches it in the same session and fixes it. The developer receives working code, not a first draft.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real Output: What It Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;Here's a real interaction. We asked the agent:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Build a deployment management page for the tenant feature. It should show a table of deployments with status badges, and a dialog to trigger new deployments."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent produced:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8 files created:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/features/tenants/pages/DeploymentsPage.tsx
src/features/tenants/components/DeploymentTable.tsx
src/features/tenants/components/DeployAgentDialog.tsx
src/features/tenants/hooks/useDeployments.ts
src/features/tenants/types/deployment.ts
src/mocks/handlers/deployments.ts
src/features/tenants/components/__tests__/DeploymentTable.test.tsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Every file followed conventions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design tokens, not raw Tailwind&lt;/li&gt;
&lt;li&gt;Dark mode variants on every element&lt;/li&gt;
&lt;li&gt;TanStack Query with proper query keys&lt;/li&gt;
&lt;li&gt;MSW handlers with realistic delays and 5% error simulation&lt;/li&gt;
&lt;li&gt;Radix Dialog for the deployment trigger&lt;/li&gt;
&lt;li&gt;Sonner toasts for success/error feedback&lt;/li&gt;
&lt;li&gt;Route guard with &lt;code&gt;RequireDeveloperAccess&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Zero manual corrections&lt;/strong&gt; to the design system usage. One adjustment to a business logic edge case (handling a deployment state we hadn't documented). Total time from prompt to PR-ready code: &lt;strong&gt;~20 minutes&lt;/strong&gt; including review. Previous estimate for the same feature: &lt;strong&gt;1–2 days&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pitfalls (A.K.A. What Bit Us So It Doesn't Bite You)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Vague Instructions = Vague Code
&lt;/h3&gt;

&lt;p&gt;Our first agent definition was 40 lines. It produced code that was "close but not quite." The spacing tokens were right but the color tokens were generic. The file structure was feature-first but the naming was inconsistent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; We expanded to 200+ lines with explicit examples, explicit anti-patterns ("NO generic Tailwind"), and references to real files in the repo. The more specific your instructions, the more accurate the output.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Agent Doesn't Know What Changed Yesterday
&lt;/h3&gt;

&lt;p&gt;If you add a new design token or change a convention and don't update the agent file, it'll use the old pattern. The agent definition is a living document — it needs to be maintained alongside the codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; We added agent definition updates to our PR checklist. Changed a convention? Update &lt;code&gt;FrontendAgent.agent.md&lt;/code&gt; in the same PR.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. MSW Handlers Can Drift from Reality
&lt;/h3&gt;

&lt;p&gt;The agent generates mock handlers based on its understanding of the API. But if the real API has quirks (pagination cursors, non-standard error shapes, optional fields), the mocks might not match.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; We added the &lt;code&gt;SPEC MISMATCH&lt;/code&gt; protocol. The agent explicitly flags when it's making assumptions about the API, so developers know which mocks need validation against the real backend.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Over-Reliance Kills Understanding
&lt;/h3&gt;

&lt;p&gt;The fastest way to create a team that doesn't understand its own codebase is to let the agent write everything without review. We use the agent as a &lt;strong&gt;force multiplier&lt;/strong&gt;, not a replacement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; We default to Level 2 (scaffold), not Level 3 (full implementation). Developers fill in remaining business logic, which ensures they understand the code they're shipping.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Token Stuffing — There's a Context Window Limit
&lt;/h3&gt;

&lt;p&gt;Our agent instructions are 200+ lines, the theme guide is another 300+, and the copilot instructions are 150+. Some LLMs struggle with this much context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; We keep the agent file focused on &lt;strong&gt;rules and patterns&lt;/strong&gt;, not exhaustive token lists. The agent references &lt;code&gt;THEME_GUIDE.md&lt;/code&gt; for the full token catalogue rather than embedding it inline.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Before the custom agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feature scaffolding:&lt;/strong&gt; 4–8 hours (file creation, routing, context wiring, mock setup)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design system violations per PR:&lt;/strong&gt; 3–5 (wrong tokens, missing dark mode)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time to first rendered component:&lt;/strong&gt; 2–4 hours (waiting for mock data setup)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New developer ramp-up:&lt;/strong&gt; 2–3 weeks to internalize conventions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After the custom agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feature scaffolding:&lt;/strong&gt; 15–30 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design system violations per PR:&lt;/strong&gt; 0–1&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time to first rendered component:&lt;/strong&gt; Under 10 minutes (MSW handlers generated alongside UI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New developer ramp-up:&lt;/strong&gt; Days — they read the agent file and see the patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The scaffolding speedup alone is &lt;strong&gt;10–15x&lt;/strong&gt;. But the real win is &lt;strong&gt;consistency&lt;/strong&gt;. Every feature looks like every other feature. Every hook follows the same pattern. Every mock handler has the same structure. The codebase feels like it was written by one very disciplined developer, not a rotating team of six.&lt;/p&gt;




&lt;h2&gt;
  
  
  When You Should &lt;em&gt;Not&lt;/em&gt; Use This Pattern
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Greenfield prototypes&lt;/strong&gt; — if you're still deciding on conventions, you don't have enough patterns to encode. The agent amplifies consistency; it can't create it from nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small teams with one frontend developer&lt;/strong&gt; — if one person owns the entire frontend, the conventions live in their head. The agent adds overhead without proportional benefit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequently changing architecture&lt;/strong&gt; — if you're rewriting your state management strategy every sprint, the agent definition will always be stale. Stabilize first, then encode.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Practical Implementation Checklist
&lt;/h2&gt;

&lt;p&gt;If you want to build your own frontend agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Document your design system&lt;/strong&gt; in a machine-readable format (we use a Tailwind config + theme guide)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Map your file structure&lt;/strong&gt; explicitly — feature directories, naming conventions, import aliases&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Encode your state management rules&lt;/strong&gt; — which tool for which type of state, and why&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Define your API integration pattern&lt;/strong&gt; — codegen pipeline, client library, error handling&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Include anti-patterns&lt;/strong&gt; — what NOT to do is as important as what to do&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Add autonomy levels&lt;/strong&gt; — give developers control over how much the agent does&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Wire in validation&lt;/strong&gt; — the agent should run your lint/typecheck/build as part of its output&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Reference, don't embed&lt;/strong&gt; — point to config files rather than duplicating 360 lines of tokens&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Add a lifecycle&lt;/strong&gt; — plan, implement, validate, deliver — not just "generate code"&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Maintain it like code&lt;/strong&gt; — update the agent file in the same PR as convention changes&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Start with scaffold mode&lt;/strong&gt; — let developers fill in business logic to maintain understanding&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Include MSW patterns&lt;/strong&gt; — mock-first development is essential for frontend agent velocity&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Deeper Insight: Agents Are Architecture Documentation That Executes
&lt;/h2&gt;

&lt;p&gt;The most unexpected benefit wasn't speed. It was &lt;strong&gt;documentation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Our &lt;code&gt;FrontendAgent.agent.md&lt;/code&gt; file is the most accurate, most up-to-date description of our frontend architecture. Not because we wrote documentation — we hate writing documentation — but because &lt;strong&gt;if the agent file is wrong, the generated code is wrong, and someone fixes the agent file.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's documentation with a built-in feedback loop. When the agent produces a component with the wrong token, the developer who catches it updates the agent instructions. The next generation is correct. Over time, the agent file converges on a precise description of how the codebase actually works.&lt;/p&gt;

&lt;p&gt;Compare that to a Confluence page that was last updated eight months ago.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next: The Agent Becomes the PR Reviewer
&lt;/h2&gt;

&lt;p&gt;We're exploring using the same agent instructions as a &lt;strong&gt;code review agent&lt;/strong&gt;. If the agent knows every convention, it should be able to flag violations in PRs automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"This component uses &lt;code&gt;bg-gray-100&lt;/code&gt; — should be &lt;code&gt;bg-background-surface&lt;/code&gt;"&lt;/li&gt;
&lt;li&gt;"This hook is in &lt;code&gt;src/components/&lt;/code&gt; — should be in &lt;code&gt;src/features/tenants/hooks/&lt;/code&gt;"&lt;/li&gt;
&lt;li&gt;"Missing dark mode variant on &lt;code&gt;text-text-normal&lt;/code&gt;"&lt;/li&gt;
&lt;li&gt;"MSW handler missing for new &lt;code&gt;/api/promotions/:id/approve&lt;/code&gt; endpoint"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same knowledge, different mode. Build in one direction, verify in the other.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing: The Best Frontend Engineer on Your Team Doesn't Sleep
&lt;/h2&gt;

&lt;p&gt;An AI agent with the right instructions isn't a replacement for your frontend team. It's the &lt;strong&gt;most consistent&lt;/strong&gt; member of your frontend team. It never forgets a dark mode variant. It never uses the wrong spacing token. It never puts a hook in the wrong directory.&lt;/p&gt;

&lt;p&gt;But it also doesn't make product decisions. It doesn't architect from scratch. It doesn't push back on a bad spec.&lt;/p&gt;

&lt;p&gt;The sweet spot is composing human judgment with machine consistency. You decide &lt;em&gt;what&lt;/em&gt; to build. The agent scaffolds &lt;em&gt;how&lt;/em&gt; — following every convention, every token, every pattern your team has established.&lt;/p&gt;

&lt;p&gt;And when it's 4 PM on a Friday and the PM says "we need one more feature page before the demo," you can spin up a complete, design-system-compliant, dark-mode-ready, MSW-wired, type-safe scaffold in 15 minutes instead of 4 hours.&lt;/p&gt;

&lt;p&gt;That's not magic. That's architecture, encoded.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;How are you using AI agents in your frontend workflow? Are you encoding project-specific knowledge, or using generic assistants? I'd love to hear what patterns are working for teams at scale — drop a comment.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub Copilot: &lt;a href="https://docs.github.com/en/copilot/customizing-copilot/adding-repository-custom-instructions-for-github-copilot" rel="noopener noreferrer"&gt;Custom Instructions&lt;/a&gt; — how to add project-specific context&lt;/li&gt;
&lt;li&gt;MSW: &lt;a href="https://mswjs.io/" rel="noopener noreferrer"&gt;Mock Service Worker&lt;/a&gt; — API mocking for browser and Node.js&lt;/li&gt;
&lt;li&gt;Hey API: &lt;a href="https://heyapi.dev/" rel="noopener noreferrer"&gt;OpenAPI TypeScript Codegen&lt;/a&gt; — generate types and clients from OpenAPI specs&lt;/li&gt;
&lt;li&gt;TanStack Query: &lt;a href="https://tanstack.com/query/latest" rel="noopener noreferrer"&gt;React Query&lt;/a&gt; — server state management&lt;/li&gt;
&lt;li&gt;Tailwind CSS: &lt;a href="https://tailwindcss.com/docs/theme" rel="noopener noreferrer"&gt;Design Tokens&lt;/a&gt; — custom theme configuration&lt;/li&gt;
&lt;li&gt;Radix UI: &lt;a href="https://www.radix-ui.com/" rel="noopener noreferrer"&gt;Headless Primitives&lt;/a&gt; — accessible UI components without default styles&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Suraj Khaitan&lt;/strong&gt; — Gen AI Architect | Building scalable platforms and AI-augmented engineering workflows&lt;/p&gt;

&lt;p&gt;Connect on &lt;a href="https://www.linkedin.com/in/suraj-khaitan-501736a2/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | Follow for more engineering and architecture write-ups&lt;/p&gt;




</description>
      <category>ai</category>
      <category>agents</category>
      <category>aws</category>
      <category>frontend</category>
    </item>
    <item>
      <title>🚀 I Mass Terminated My Copilot Plans. Here's Why Claude Code Won.</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sat, 14 Mar 2026 10:32:45 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/-i-mass-terminated-my-copilot-plans-heres-why-claude-code-won-321a</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/-i-mass-terminated-my-copilot-plans-heres-why-claude-code-won-321a</guid>
      <description>&lt;p&gt;&lt;em&gt;How an agentic AI in the terminal replaced my IDE plugins, scaffold scripts, and half my Stack Overflow tabs—without ever opening a GUI&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Moment I Realized My Coding Workflow Was a Lie
&lt;/h2&gt;

&lt;p&gt;Every developer eventually hits the same wall:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I have 4 AI extensions, 12 keyboard shortcuts, and I'm still copy-pasting code between a chatbot and my editor."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Tab-complete autocomplete? Great for variable names. IDE chat panels? Nice for explaining regex. But the moment you need an AI to &lt;strong&gt;actually understand your codebase, edit 14 files, run your tests, and fix its own mistakes&lt;/strong&gt;—the shiny plugins fall apart.&lt;/p&gt;

&lt;p&gt;Then I tried something that felt reckless:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I gave an AI full access to my terminal.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Specifically: &lt;strong&gt;Claude Code—Anthropic's agentic coding tool that lives in your CLI, reads your repo, writes real code, and executes commands.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I haven't looked back.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR (If You Only Read One Section)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Problem:&lt;/strong&gt; AI coding assistants that autocomplete lines can't architect solutions. Chat-based tools require endless copy-paste.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Move:&lt;/strong&gt; Claude Code operates as an agentic AI &lt;em&gt;inside your terminal&lt;/em&gt;—it reads, writes, runs, and iterates autonomously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Multi-file refactors in minutes. Bug fixes with zero context switching. Git workflows handled conversationally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tradeoff:&lt;/strong&gt; You're trusting an agent with shell access. Guardrails and review discipline matter more than ever.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Claude Code Is Trending Right Now
&lt;/h2&gt;

&lt;p&gt;Scroll through any dev community in 2025–2026, and you'll see the same frustration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Copilot autocomplete is nice but it doesn't &lt;em&gt;think&lt;/em&gt;."&lt;/li&gt;
&lt;li&gt;"ChatGPT is smart but it doesn't know my codebase."&lt;/li&gt;
&lt;li&gt;"I spend more time prompt-engineering than actual engineering."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude Code hits different because it collapses the gap between &lt;strong&gt;knowing&lt;/strong&gt; and &lt;strong&gt;doing&lt;/strong&gt;. It doesn't suggest code in a sidebar—it &lt;em&gt;implements changes directly in your repo&lt;/em&gt;, runs your test suite, reads the errors, and fixes them. In a loop. Without you alt-tabbing once.&lt;/p&gt;

&lt;p&gt;The industry term is &lt;strong&gt;agentic coding&lt;/strong&gt;. And it's not a buzzword anymore—it's a workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Even &lt;em&gt;Is&lt;/em&gt; Claude Code?
&lt;/h2&gt;

&lt;p&gt;Claude Code is a command-line tool from Anthropic. You install it, point it at a project, and talk to it like a senior developer sitting next to you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install it&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @anthropic-ai/claude-code

&lt;span class="c"&gt;# Start it in your project&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;my-project
claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No VS Code extension to configure. No API keys to paste into settings.json. No "select model" dropdown with 47 options.&lt;/p&gt;

&lt;p&gt;You get a REPL-like interface where you type natural language, and Claude:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reads&lt;/strong&gt; your files and project structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plans&lt;/strong&gt; the changes needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Writes&lt;/strong&gt; the code across multiple files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs&lt;/strong&gt; commands (tests, builds, linters)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterates&lt;/strong&gt; if something breaks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's like pair programming—except your pair never gets tired, never forgets the module structure, and never says "let me think about that" for 45 minutes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real Workflows That Made Me a Believer
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) The "Refactor 30 Files" Moment
&lt;/h3&gt;

&lt;p&gt;I needed to migrate an API layer from Axios to a custom fetch wrapper. With traditional AI tools, that's:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explain the pattern in a chat&lt;/li&gt;
&lt;li&gt;Copy the suggestion&lt;/li&gt;
&lt;li&gt;Paste it into File 1&lt;/li&gt;
&lt;li&gt;Realize it doesn't match my error handling&lt;/li&gt;
&lt;li&gt;Re-explain&lt;/li&gt;
&lt;li&gt;Repeat 29 more times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Refactor all API calls in src/features/ from axios to use the 
  fetchWrapper in src/lib/api.ts. Preserve error handling patterns. 
  Run the type checker after.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It read every file, understood the existing patterns, made the changes, ran &lt;code&gt;tsc&lt;/code&gt;, found 3 type errors, and fixed them. Total time: &lt;strong&gt;4 minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2) The "Debug This Flaky Test" Nightmare
&lt;/h3&gt;

&lt;p&gt;A test was passing locally and failing in CI. The usual investigation: environment differences, timing issues, mock state leaking.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; The test in src/features/agents/__tests__/AgentList.test.tsx is 
  failing in CI with "Unable to find role='button'". It passes locally. 
  Investigate and fix.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code read the test, read the component, identified a race condition with an async render, added the correct &lt;code&gt;waitFor&lt;/code&gt; wrapper, and ran the test suite to confirm. &lt;strong&gt;Done in 90 seconds.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3) The "Write the Whole Feature" Sprint
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Create a new feature module for "cost-management" under src/features/. 
  Follow the same pattern as the agents feature: api layer, components, 
  hooks, and route registration. Include a dashboard page with a summary 
  card grid and a data table.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It scaffolded 8 files, wired up the route, created TanStack Query hooks, and built components using our existing design tokens—because it &lt;strong&gt;read our codebase first&lt;/strong&gt;. Not a template. Not a snippet. Actual contextual code.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: Why "Terminal-Native" Is the Unlock
&lt;/h2&gt;

&lt;p&gt;Most AI coding tools follow this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IDE Plugin → Language Server → AI API → Suggestion → Developer copies it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code follows this one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer → Claude Code (terminal) → reads repo → plans → writes files → runs commands → verifies → done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key difference: &lt;strong&gt;the feedback loop is closed&lt;/strong&gt;. Claude doesn't suggest and hope. It acts, observes the result, and iterates.&lt;/p&gt;

&lt;p&gt;This is the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A GPS that &lt;em&gt;shows you the route&lt;/em&gt; (traditional AI)&lt;/li&gt;
&lt;li&gt;A self-driving car that &lt;em&gt;takes you there&lt;/em&gt; (agentic AI)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why the Terminal?
&lt;/h3&gt;

&lt;p&gt;The terminal is the most powerful interface a developer has. It's where you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run builds and tests&lt;/li&gt;
&lt;li&gt;Manage git&lt;/li&gt;
&lt;li&gt;Execute scripts&lt;/li&gt;
&lt;li&gt;Install dependencies&lt;/li&gt;
&lt;li&gt;Deploy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By living in the terminal, Claude Code has access to the same tools you do. It doesn't need a special plugin API or language server protocol. It just… uses your tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Permission Model: Trust, but Verify
&lt;/h2&gt;

&lt;p&gt;Here's the part that makes security-conscious engineers twitch: this thing can run commands.&lt;/p&gt;

&lt;p&gt;Claude Code handles this with a &lt;strong&gt;tiered permission system&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Permission&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read files&lt;/td&gt;
&lt;td&gt;✅ Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write/edit files&lt;/td&gt;
&lt;td&gt;⚠️ Asks permission (configurable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run terminal commands&lt;/td&gt;
&lt;td&gt;⚠️ Asks permission (configurable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run "safe" commands (ls, cat, grep)&lt;/td&gt;
&lt;td&gt;✅ Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run destructive commands&lt;/td&gt;
&lt;td&gt;🛑 Always asks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can configure it to auto-approve certain patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Allow all file writes in src/&lt;/span&gt;
&lt;span class="c"&gt;# Allow test runs without asking&lt;/span&gt;
&lt;span class="c"&gt;# Always ask before git push&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mental model: &lt;strong&gt;it's a junior developer with terminal access&lt;/strong&gt;. You wouldn't let them &lt;code&gt;git push --force&lt;/code&gt; without review, but you'd let them run &lt;code&gt;npm test&lt;/code&gt; freely.&lt;/p&gt;




&lt;h2&gt;
  
  
  Claude Code vs. The Field: An Honest Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;GitHub Copilot&lt;/th&gt;
&lt;th&gt;ChatGPT/GPT-4&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Line-level autocomplete&lt;/td&gt;
&lt;td&gt;✅ Excellent&lt;/td&gt;
&lt;td&gt;❌ N/A&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;td&gt;❌ Not its thing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-file edits&lt;/td&gt;
&lt;td&gt;❌ Limited&lt;/td&gt;
&lt;td&gt;❌ Copy-paste&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;td&gt;✅ Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codebase awareness&lt;/td&gt;
&lt;td&gt;⚠️ Current file&lt;/td&gt;
&lt;td&gt;❌ None&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;td&gt;✅ Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runs commands&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;⚠️ Limited&lt;/td&gt;
&lt;td&gt;✅ Full terminal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-corrects errors&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;⚠️ Sometimes&lt;/td&gt;
&lt;td&gt;✅ Yes (loop)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works without IDE&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;✅ Yes (browser)&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;✅ Yes (terminal)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agentic workflow&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;⚠️ Emerging&lt;/td&gt;
&lt;td&gt;✅ Core design&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The nuance:&lt;/strong&gt; Claude Code isn't trying to replace your autocomplete. It's a different tool for a different job. Use Copilot for line-level flow. Use Claude Code when you need an agent that &lt;em&gt;does work&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Workflow That Actually Works
&lt;/h2&gt;

&lt;p&gt;After months of daily use, here's my optimized flow:&lt;/p&gt;

&lt;h3&gt;
  
  
  Morning: Strategic Work with Claude Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Review the open PR #142. Summarize the changes and flag 
  any potential issues with our auth middleware.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Implement the API integration for the new knowledge-base 
  management feature. Follow existing patterns in src/features/agents/.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Afternoon: Tactical Fixes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Fix all TypeScript errors in src/features/tools/. 
  Run the type checker and show me the results.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Update the unit tests for UseCaseApi to cover the new 
  delete endpoint. Run them and make sure they pass.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  End of Day: Cleanup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Review all changes I've made today. Create a commit with 
  a conventional commit message.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The shift: I went from &lt;strong&gt;writing code&lt;/strong&gt; to &lt;strong&gt;directing code&lt;/strong&gt;. My job became architecture, review, and decision-making. The implementation became a conversation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotchas (The Part Everyone Discovers at 2 AM)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) It's Confident, Not Always Correct
&lt;/h3&gt;

&lt;p&gt;Claude Code will make changes with conviction. Sometimes those changes are subtly wrong. &lt;strong&gt;Always review diffs before committing.&lt;/strong&gt; Trust the agent, but verify the output.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Context Window Limits Are Real
&lt;/h3&gt;

&lt;p&gt;On massive monorepos, Claude Code can't hold your entire codebase in memory. Mitigations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a &lt;code&gt;CLAUDE.md&lt;/code&gt; file to give it project context and conventions&lt;/li&gt;
&lt;li&gt;Point it at specific directories rather than the whole repo&lt;/li&gt;
&lt;li&gt;Break large tasks into focused steps&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3) It Can Get Into Loops
&lt;/h3&gt;

&lt;p&gt;Occasionally, it'll try to fix an error, introduce a new one, fix that, introduce another. When you see this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stop it&lt;/li&gt;
&lt;li&gt;Give it clearer constraints&lt;/li&gt;
&lt;li&gt;Break the task down&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4) Cost Awareness
&lt;/h3&gt;

&lt;p&gt;Claude Code uses API credits. Complex multi-file refactors with test loops can add up. Monitor your usage, especially in the "let it run" agentic mode.&lt;/p&gt;




&lt;h2&gt;
  
  
  The CLAUDE.md File: Your Project's AI Constitution
&lt;/h2&gt;

&lt;p&gt;The secret weapon most people miss: create a &lt;code&gt;CLAUDE.md&lt;/code&gt; at your project root.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;

&lt;span class="gu"&gt;## Project Overview&lt;/span&gt;
This is a React + FastAPI monorepo for an internal platform.

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use design system tokens, never raw Tailwind colors
&lt;span class="p"&gt;-&lt;/span&gt; Follow feature-based file organization under src/features/
&lt;span class="p"&gt;-&lt;/span&gt; Use TanStack Query for server state
&lt;span class="p"&gt;-&lt;/span&gt; All API calls go through src/lib/api.ts

&lt;span class="gu"&gt;## Commands&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`pnpm frontend:dev`&lt;/span&gt; - Start frontend
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`pnpm frontend:quality`&lt;/span&gt; - Type check + lint
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`pytest`&lt;/span&gt; - Run backend tests

&lt;span class="gu"&gt;## Don'ts&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Never modify shared components without discussing
&lt;span class="p"&gt;-&lt;/span&gt; Don't install new dependencies without justification
&lt;span class="p"&gt;-&lt;/span&gt; Don't push directly to main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file acts as persistent memory. Every time Claude Code starts, it reads this file and follows the rules. It's like onboarding documentation—but for your AI pair programmer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who Should (and Shouldn't) Use Claude Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use it if:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You work on codebases with 10+ files that need coordinated changes&lt;/li&gt;
&lt;li&gt;You're tired of copy-pasting between AI chats and your editor&lt;/li&gt;
&lt;li&gt;You want to automate repetitive refactors, test writing, or migrations&lt;/li&gt;
&lt;li&gt;You're comfortable reviewing diffs and understanding the code an AI writes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Skip it if:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You mainly need line-level autocomplete (use Copilot)&lt;/li&gt;
&lt;li&gt;You're learning to code and need to understand every line you write&lt;/li&gt;
&lt;li&gt;Your org prohibits AI tools from accessing source code&lt;/li&gt;
&lt;li&gt;You prefer GUI-first workflows and rarely use the terminal&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Bigger Picture: We're Entering the "Agent" Era of Dev Tools
&lt;/h2&gt;

&lt;p&gt;Claude Code isn't an anomaly. It's the leading edge of a shift:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Era 1&lt;/strong&gt; — Stack Overflow &amp;amp; Docs (search for answers)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Era 2&lt;/strong&gt; — AI Chat (ask for answers)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Era 3&lt;/strong&gt; — AI Autocomplete (get suggestions inline)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Era 4&lt;/strong&gt; — &lt;strong&gt;Agentic AI (delegate tasks to an autonomous agent)&lt;/strong&gt;  ← &lt;em&gt;We are here&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The developers who thrive in Era 4 won't be the fastest typists. They'll be the best &lt;strong&gt;directors&lt;/strong&gt;—people who can decompose problems, set constraints, review output, and guide an agent toward the right solution.&lt;/p&gt;

&lt;p&gt;The skill isn't "can you write a React component?" anymore.&lt;/p&gt;

&lt;p&gt;It's "can you describe what the component should do, review what the agent built, and course-correct in real time?"&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Take: It's Not About Replacing Developers
&lt;/h2&gt;

&lt;p&gt;Every AI tool gets the same question: "Will this replace me?"&lt;/p&gt;

&lt;p&gt;No. But it will replace the &lt;em&gt;version of you&lt;/em&gt; that spends 60% of the day on mechanical implementation.&lt;/p&gt;

&lt;p&gt;Claude Code doesn't have taste. It doesn't know your users. It can't decide whether a feature should exist. It can't navigate a product meeting, push back on a bad spec, or mentor a junior developer.&lt;/p&gt;

&lt;p&gt;But it can turn your architectural decisions into working code faster than any tool I've used. And that's not a threat—it's a superpower.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your biggest frustration with current AI coding tools? Is it context awareness, copy-paste fatigue, or something else? Drop your take below.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code — Official Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/claude" rel="noopener noreferrer"&gt;Anthropic — Claude Model Family&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.anthropic.com/en/docs/claude-code/memory" rel="noopener noreferrer"&gt;CLAUDE.md — Project Context Files&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/research" rel="noopener noreferrer"&gt;Agentic Coding Explained — Anthropic Blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/package/@anthropic-ai/claude-code" rel="noopener noreferrer"&gt;Getting Started: npm install -g @anthropic-ai/claude-code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Suraj Khaitan&lt;/strong&gt; — Gen AI Architect | Building scalable platforms and secure cloud-native systems&lt;/p&gt;

&lt;p&gt;Connect on &lt;a href="https://www.linkedin.com/in/suraj-khaitan-501736a2/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | Follow for more engineering and architecture write-ups&lt;/p&gt;




</description>
      <category>ai</category>
      <category>antigravity</category>
      <category>agents</category>
      <category>cloud</category>
    </item>
    <item>
      <title>🚀 Stop Calling STS on Every Request: Redis Caching Patterns That Cut Login Latency by 10x</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sat, 28 Feb 2026 07:39:04 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/stop-calling-sts-on-every-request-redis-caching-patterns-that-cut-login-latency-by-10x-1pnh</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/stop-calling-sts-on-every-request-redis-caching-patterns-that-cut-login-latency-by-10x-1pnh</guid>
      <description>&lt;p&gt;&lt;em&gt;How caching sessions and temporary AWS credentials in Redis turned our auth layer from a bottleneck into a near-zero-cost lookup&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Moment We Realized Our Auth Was a DDoS on Ourselves
&lt;/h2&gt;

&lt;p&gt;Every authenticated request in our multi-tenant platform did the same dance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Validate the user's session&lt;/li&gt;
&lt;li&gt;Check their role mappings (tenant, use case, environment)&lt;/li&gt;
&lt;li&gt;Call AWS STS to assume the right IAM role&lt;/li&gt;
&lt;li&gt;Return temporary credentials so downstream services could talk to S3, DynamoDB, Bedrock, etc.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Steps 1–3 hit the network. Every. Single. Time.&lt;/p&gt;

&lt;p&gt;At modest traffic, it was fine. At scale, we were essentially DDoS-ing our own identity layer—STS throttling kicked in, latency spiked, and users saw login spinners that never stopped spinning.&lt;/p&gt;

&lt;p&gt;The fix wasn't a new auth framework. It was &lt;strong&gt;Redis&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR (If You Skim, Skim This)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Problem:&lt;/strong&gt; Per-request STS calls + stateless session validation = slow logins + rate limiting at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Move:&lt;/strong&gt; Cache session data and STS credentials in Redis with structured keys and smart TTLs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Sub-millisecond session lookups, ~90% fewer STS API calls, and a warm credential cache that makes subsequent requests feel instant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tradeoff:&lt;/strong&gt; You need a cache invalidation strategy and must handle Redis failures gracefully.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Pattern Is Having a Moment
&lt;/h2&gt;

&lt;p&gt;Three trends are colliding right now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenant platforms are everywhere.&lt;/strong&gt; Each tenant has its own IAM boundary, its own roles, its own credential scope. That's a lot of &lt;code&gt;AssumeRole&lt;/code&gt; calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;STS has hard rate limits.&lt;/strong&gt; AWS throttles &lt;code&gt;AssumeRole&lt;/code&gt; at ~500 requests/second per account. Hit that in production and you'll learn the meaning of &lt;code&gt;AccessDenied&lt;/code&gt; the hard way.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Users expect instant auth.&lt;/strong&gt; Nobody waits 2 seconds for a login to "warm up." If the first click feels slow, trust evaporates.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Redis sits at the intersection of all three: it's fast enough to feel like memory, persistent enough to survive pod restarts (in clustered mode), and simple enough that the caching logic doesn't become its own microservice.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: Two Caches, One Redis
&lt;/h2&gt;

&lt;p&gt;We use Redis for two distinct but related caching concerns:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Session Cache (Identity Layer)
&lt;/h3&gt;

&lt;p&gt;When a user logs in (via OIDC), we create a &lt;strong&gt;platform session&lt;/strong&gt; in Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;session_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jane.doe@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;roles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TenantId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acme-corp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UseCaseId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc-search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Environment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RoleName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USE_CASE_DEVELOPER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TenantId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acme-corp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UseCaseId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Environment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RoleName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USE_CASE_OWNER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;highest_role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USE_CASE_OWNER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform_roles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USE_CASE_OWNER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USE_CASE_DEVELOPER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;  &lt;span class="c1"&gt;# STS credentials are added lazily
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key format:&lt;/strong&gt; &lt;code&gt;session:&amp;lt;uuid&amp;gt;&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;TTL:&lt;/strong&gt; 1 hour (configurable via env)&lt;/p&gt;

&lt;p&gt;This replaces the classic "hit the database on every request" pattern. Once stored, every downstream service validates auth by reading from Redis—not by calling the IdP or querying a user table.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. STS Credential Cache (AWS Access Layer)
&lt;/h3&gt;

&lt;p&gt;When a user accesses a specific tenant/use-case/environment, we call &lt;code&gt;sts:AssumeRole&lt;/code&gt; to get short-lived credentials. These get cached &lt;strong&gt;inside the session object&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;session_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acme-corp|doc-search|prod|USE_CASE_DEVELOPER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AccessKeyId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ASIA...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SecretAccessKey&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wJal...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SessionToken&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FwoG...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expiration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-02-28T19:00:00+00:00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key format (composite):&lt;/strong&gt; &lt;code&gt;TenantId|UseCaseId|Environment|RoleName&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;TTL:&lt;/strong&gt; Derived from credential expiry minus a 5-minute safety buffer&lt;/p&gt;

&lt;p&gt;This means the second time a user touches the same tenant/environment, we skip STS entirely.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Code: Session Storage
&lt;/h2&gt;

&lt;p&gt;Here's the core of how we store a session after successful OIDC login:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;redis.connection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConnectionPool&lt;/span&gt;

&lt;span class="n"&gt;DEFAULT_TTL_SECONDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;  &lt;span class="c1"&gt;# 1 hour
&lt;/span&gt;
&lt;span class="c1"&gt;# Singleton connection pool — one per process
&lt;/span&gt;&lt;span class="n"&gt;_connection_pool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ConnectionPool&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_redis_pool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ConnectionPool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_connection_pool&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_connection_pool&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_connection_pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConnectionPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDIS_HOST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDIS_PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
            &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_connections&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;decode_responses&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;socket_keepalive&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;socket_connect_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;retry_on_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_connection_pool&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_redis_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection_pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;get_redis_pool&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;roles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;highest_role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;platform_roles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_TTL_SECONDS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_redis_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;session_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;roles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;roles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;highest_role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;highest_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform_roles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;platform_roles&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RedisError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;setex&lt;/code&gt; instead of &lt;code&gt;set&lt;/code&gt; + &lt;code&gt;expire&lt;/code&gt;?&lt;/strong&gt; Atomicity. If the process crashes between &lt;code&gt;set&lt;/code&gt; and &lt;code&gt;expire&lt;/code&gt;, you get a session that never dies. &lt;code&gt;setex&lt;/code&gt; is a single atomic operation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Code: STS Credential Caching
&lt;/h2&gt;

&lt;p&gt;The real performance win is here—caching the output of &lt;code&gt;sts:AssumeRole&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="n"&gt;sts_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;EXPIRATION_BUFFER_SEC&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;  &lt;span class="c1"&gt;# 5 minutes
&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_sts_credentials&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;platform_role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;use_case_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;force_refresh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Check the cache first
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;force_refresh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_credentials_from_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_case_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;platform_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;is_credential_valid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;  &lt;span class="c1"&gt;# 🎯 Cache hit — skip STS entirely
&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 2: Cache miss — call STS
&lt;/span&gt;    &lt;span class="n"&gt;role_arn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolve_role_arn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;platform_role&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sts_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assume_role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;RoleArn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;role_arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;RoleSessionName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;use_case_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;DurationSeconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;creds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Credentials&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;credential_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AccessKeyId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AccessKeyId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SecretAccessKey&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SecretAccessKey&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SessionToken&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SessionToken&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expiration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;creds&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expiration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Cache with smart TTL (expire before AWS does)
&lt;/span&gt;    &lt;span class="n"&gt;expiration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;credential_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expiration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expiration&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expiration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tzinfo&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;total_seconds&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;EXPIRATION_BUFFER_SEC&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;store_credentials_in_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_case_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;platform_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credential_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;credential_data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;EXPIRATION_BUFFER_SEC = 300&lt;/code&gt; is critical. STS credentials expire at a hard boundary. If you serve a credential that's 10 seconds from death, the downstream AWS call will fail with a confusing &lt;code&gt;ExpiredTokenException&lt;/code&gt;. The 5-minute buffer ensures we always refresh before the cliff.&lt;/p&gt;




&lt;h2&gt;
  
  
  Credential Validity Check
&lt;/h2&gt;

&lt;p&gt;A clean helper that prevents serving stale credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_credential_valid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;expiration_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expiration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;expiration_str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

    &lt;span class="n"&gt;expiration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;expiration_str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;+00:00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expiration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tzinfo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;buffer_seconds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expiration&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;total_seconds&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;buffer_seconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the credential is within 5 minutes of expiring, we treat it as expired. Simple, defensive, saves you from debugging &lt;code&gt;ExpiredTokenException&lt;/code&gt; at 3 AM.&lt;/p&gt;




&lt;h2&gt;
  
  
  Session Validation: The Hot Path
&lt;/h2&gt;

&lt;p&gt;Every authenticated API request runs through this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_session_and_role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;use_case_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Single Redis GET — sub-millisecond
&lt;/span&gt;    &lt;span class="n"&gt;session_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;session_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Session not found or expired&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;user_email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;roles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;roles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;valid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all_roles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;roles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;highest_role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;derive_highest_role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;roles&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Optional: validate specific tenant/use-case access
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;use_case_id&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;matching_role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;find_role_for_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;roles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_case_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;environment&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;matching_role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No access to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;use_case_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;matching_role&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the difference between "every request takes 200ms to validate" and "every request takes &amp;lt;1ms to validate." The session is already in Redis. The role lookup is a JSON parse + list scan. Done.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Login Flow: Putting It Together
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser
  │
  │  GET /auth/userinfo
  ▼
ALB (OIDC authenticate)
  │
  │  verified user → forwarded with OIDC headers
  ▼
Backend Login Handler
  │
  ├─ 1. Decode &amp;amp; verify OIDC token (claims extraction)
  ├─ 2. Map IdP groups → platform roles (7-role hierarchy)
  ├─ 3. Build entitlements (tenant → use_case → env → role)
  ├─ 4. Store session in Redis (session:&amp;lt;uuid&amp;gt;)
  ├─ 5. Return session_id + tenants to frontend
  │
  ▼
Frontend stores session_id
  │
  │  Subsequent API calls include X-Session-Id header
  ▼
Any Backend Service
  │
  ├─ Validate session from Redis (sub-ms)
  ├─ Check role mapping for requested resource
  └─ If STS credentials needed:
       ├─ Check Redis cache first (sub-ms)
       └─ Call STS only on cache miss (~200ms)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first login is the "expensive" one (~500ms total including STS). Every subsequent request benefits from the cache.&lt;/p&gt;




&lt;h2&gt;
  
  
  Connection Pooling: Don't Skip This
&lt;/h2&gt;

&lt;p&gt;A surprisingly common mistake: creating a new Redis connection per request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Don't do this
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6379&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# new connection!
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Do this — reuse a connection pool
&lt;/span&gt;&lt;span class="n"&gt;_pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConnectionPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6379&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_connections&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection_pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_pool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each TCP connection to Redis costs ~1ms to establish. At 1,000 req/s, that's 1 full second of CPU time per second just on handshakes. Connection pooling makes this a non-issue.&lt;/p&gt;




&lt;h2&gt;
  
  
  Observability: Know Your Hit Ratio
&lt;/h2&gt;

&lt;p&gt;We track cache operations with Prometheus counters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;prometheus_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Gauge&lt;/span&gt;

&lt;span class="n"&gt;cache_operations_total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_operations_total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total cache operations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;operation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cache_hit_ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_hit_ratio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rolling cache hit ratio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Labels like &lt;code&gt;operation=get_creds&lt;/code&gt; and &lt;code&gt;status=hit|miss|expired|error&lt;/code&gt; let you build dashboards that answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What's our STS cache hit ratio? (target: &amp;gt;85%)&lt;/li&gt;
&lt;li&gt;Which tenants have the most cache misses? (may indicate config drift)&lt;/li&gt;
&lt;li&gt;Are we seeing Redis errors? (time to check cluster health)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your hit ratio drops below 80%, something is wrong—either TTLs are too short, sessions are thrashing, or your Redis instance is under memory pressure.&lt;/p&gt;




&lt;h2&gt;
  
  
  TLS + Secrets Manager: Production Hardening
&lt;/h2&gt;

&lt;p&gt;In production, Redis connections should be encrypted and passwords should never live in env vars:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_load_password_from_secrets_manager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;secret_arn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Load Redis auth token from AWS Secrets Manager.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;sm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secretsmanager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_secret_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SecretId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;secret_arn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SecretString&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Support both plain strings and JSON secrets
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authToken&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also cache the fetched secret in-process—no need to call Secrets Manager on every pool initialization. And we configure TLS via the &lt;code&gt;SSLConnection&lt;/code&gt; class from the Redis Python client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;redis.connection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SSLConnection&lt;/span&gt;

&lt;span class="n"&gt;pool_kwargs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;connection_class&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SSLConnection&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you in-transit encryption for ElastiCache, which is a compliance checkbox you'd rather check early.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotchas (A.K.A. What Bit Us So It Doesn't Bite You)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Stale Credentials After Role Changes
&lt;/h3&gt;

&lt;p&gt;If a user's role changes (e.g., promoted from &lt;code&gt;USE_CASE_DEVELOPER&lt;/code&gt; to &lt;code&gt;USE_CASE_OWNER&lt;/code&gt;), the cached session still has the old role mappings. Our fix: invalidate the session on role change and force a re-login.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invalidate_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_redis_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Redis Goes Down — What Then?
&lt;/h3&gt;

&lt;p&gt;Redis is fast, but it's not invincible. If the Redis cluster is unreachable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session validation should fail-closed (reject the request, don't silently allow it)&lt;/li&gt;
&lt;li&gt;Log aggressively so ops teams see the outage&lt;/li&gt;
&lt;li&gt;Never fall back to "allow all" — that's a security vulnerability disguised as fault tolerance&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Session Key Collisions
&lt;/h3&gt;

&lt;p&gt;Using predictable keys (like &lt;code&gt;session:&amp;lt;user_email&amp;gt;&lt;/code&gt;) opens the door to session hijacking. Use &lt;code&gt;session:&amp;lt;uuid4&amp;gt;&lt;/code&gt; — the session ID should be unguessable.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Memory Pressure in Multi-Tenant Environments
&lt;/h3&gt;

&lt;p&gt;Each session stores role mappings for every tenant/use-case the user can access. A platform admin with access to 50 tenants has a bigger session object than a single-tenant end user. Monitor Redis memory usage and set &lt;code&gt;maxmemory-policy&lt;/code&gt; to &lt;code&gt;volatile-lru&lt;/code&gt; so expired keys get evicted first.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Binding Token Replay Attacks
&lt;/h3&gt;

&lt;p&gt;If your auth flow uses one-time binding tokens (e.g., for device code flows), mark them as consumed in Redis with a short TTL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mark_binding_token_consumed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;binding_token:consumed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;get_redis_client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_binding_token_consumed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;binding_token:consumed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;get_redis_client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  When You Should &lt;em&gt;Not&lt;/em&gt; Use This Pattern
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single-user apps&lt;/strong&gt; — if you have 10 users, the extra Redis infrastructure isn't worth it. A signed JWT with short expiry is simpler.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateless-only architectures&lt;/strong&gt; — if your design principle is "no server-side state," Redis sessions are a philosophical violation. (But also: stateless auth at scale has its own costs.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No AWS roles to assume&lt;/strong&gt; — if you're not using STS, the credential caching half of this pattern doesn't apply. The session caching half still might.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Practical Implementation Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Deploy Redis (ElastiCache Serverless or self-managed cluster with replication)&lt;/li&gt;
&lt;li&gt;[ ] Enable TLS in-transit (&lt;code&gt;SSLConnection&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] Store Redis password in Secrets Manager, not env vars&lt;/li&gt;
&lt;li&gt;[ ] Use connection pooling (&lt;code&gt;ConnectionPool&lt;/code&gt; with &lt;code&gt;max_connections&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] Set session TTL to match your security requirements (we use 1 hour)&lt;/li&gt;
&lt;li&gt;[ ] Add 5-minute expiration buffer on STS credential cache&lt;/li&gt;
&lt;li&gt;[ ] Implement &lt;code&gt;health_check()&lt;/code&gt; — ping Redis on startup and expose &lt;code&gt;/health&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Add Prometheus metrics for cache hit/miss/error rates&lt;/li&gt;
&lt;li&gt;[ ] Set &lt;code&gt;maxmemory-policy&lt;/code&gt; to &lt;code&gt;volatile-lru&lt;/code&gt; on the Redis instance&lt;/li&gt;
&lt;li&gt;[ ] Document your invalidation strategy (when do cached sessions get killed?)&lt;/li&gt;
&lt;li&gt;[ ] Test Redis-down scenarios (your app should fail-closed, not fail-open)&lt;/li&gt;
&lt;li&gt;[ ] Load SSM parameters at startup, not import time (env vars must be populated first)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Before Redis caching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Login: ~800ms (OIDC + STS + DB lookups)&lt;/li&gt;
&lt;li&gt;Subsequent API auth: ~200ms per request (session re-validation + STS)&lt;/li&gt;
&lt;li&gt;STS calls: 1 per authenticated request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After Redis caching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Login: ~500ms (OIDC + STS + Redis write — the STS is cached for next time)&lt;/li&gt;
&lt;li&gt;Subsequent API auth: &lt;strong&gt;&amp;lt;1ms&lt;/strong&gt; (Redis GET + JSON parse)&lt;/li&gt;
&lt;li&gt;STS calls: 1 per unique tenant/role/env combination per session lifetime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 10,000 authenticated requests per hour, that's the difference between 10,000 STS calls and ~50. Your AWS bill notices. Your users notice. Your on-call rotation notices.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing: The Fastest Auth Call Is the One You Don't Make
&lt;/h2&gt;

&lt;p&gt;Redis isn't just a cache layer for your database queries. It's the foundation of a fast, secure auth perimeter.&lt;/p&gt;

&lt;p&gt;The session cache eliminates per-request identity lookups. The STS credential cache eliminates per-request IAM calls. Together, they turn your auth layer from a distributed systems problem into a local memory read.&lt;/p&gt;

&lt;p&gt;And when security is fast, developers stop looking for shortcuts around it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your strategy for caching short-lived AWS credentials? Do you cache at the application layer, use credential providers, or something else entirely? Drop a comment — I'm curious what patterns are working for others.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AWS Docs: &lt;a href="https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html" rel="noopener noreferrer"&gt;STS AssumeRole&lt;/a&gt; — rate limits and best practices&lt;/li&gt;
&lt;li&gt;Redis: &lt;a href="https://redis.readthedocs.io/en/stable/connections.html#connection-pools" rel="noopener noreferrer"&gt;Connection Pooling&lt;/a&gt; in the Python client&lt;/li&gt;
&lt;li&gt;AWS ElastiCache: &lt;a href="https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/in-transit-encryption.html" rel="noopener noreferrer"&gt;In-transit encryption&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Prometheus: &lt;a href="https://prometheus.github.io/client_python/" rel="noopener noreferrer"&gt;Client instrumentation for Python&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Suraj Khaitan&lt;/strong&gt; — Gen AI Architect | Building scalable platforms and secure cloud-native systems&lt;/p&gt;

&lt;p&gt;Connect on &lt;a href="https://www.linkedin.com/in/suraj-khaitan-501736a2/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | Follow for more engineering and architecture write-ups&lt;/p&gt;




</description>
      <category>aws</category>
      <category>python</category>
      <category>redis</category>
      <category>ai</category>
    </item>
    <item>
      <title>🔥 We Deleted Our Login Code: ALB OIDC for Serverless Frontends</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sun, 08 Feb 2026 07:01:00 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/we-deleted-our-login-code-alb-oidc-for-serverless-frontends-aok</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/we-deleted-our-login-code-alb-oidc-for-serverless-frontends-aok</guid>
      <description>&lt;p&gt;&lt;em&gt;How moving auth to the load balancer with ALB’s &lt;code&gt;authenticate_oidc&lt;/code&gt; made our UI simpler, our defaults safer, and our incidents rarer&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Day “Just Store the Token” Stopped Being Funny
&lt;/h2&gt;

&lt;p&gt;At some point, every frontend team gets the same suggestion:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Just do OAuth in the browser, store the token, and attach it on API calls.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It works—until it doesn’t.&lt;/p&gt;

&lt;p&gt;Because the moment your UI becomes responsible for &lt;strong&gt;token storage, refresh logic, callback routes, and logout semantics&lt;/strong&gt;, your “frontend” quietly turns into an auth product.&lt;/p&gt;

&lt;p&gt;We fixed this by doing something that feels almost illegal:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We let the load balancer handle the login.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Specifically: &lt;strong&gt;AWS Application Load Balancer (ALB) + &lt;code&gt;authenticate_oidc&lt;/code&gt; + a serverless frontend target (Lambda)&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR (If You Only Read One Section)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Problem:&lt;/strong&gt; App-level OIDC spreads secrets + token handling across every UI route and runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Move:&lt;/strong&gt; Put OIDC at the edge using ALB &lt;code&gt;authenticate_oidc&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Less auth code in the app, fewer token footguns, and a “secure-by-default” perimeter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tradeoff:&lt;/strong&gt; Local dev + logout semantics require intentional design.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Pattern Is Trending Right Now
&lt;/h2&gt;

&lt;p&gt;Across dev communities lately, the popular themes are consistent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Stop overbuilding auth in every app.”&lt;/li&gt;
&lt;li&gt;“Move concerns up the stack.”&lt;/li&gt;
&lt;li&gt;“Make security the default, not a checklist item.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Edge-auth patterns (ALB OIDC, gateway authorizers, access proxies) are having a moment because they reduce the number of places a team can accidentally get auth wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem: Token Chaos Isn’t One Bug—It’s a Lifestyle
&lt;/h2&gt;

&lt;p&gt;If you do OIDC inside the frontend, you almost inevitably accumulate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A callback route you must never break&lt;/li&gt;
&lt;li&gt;Token storage debates (&lt;code&gt;localStorage&lt;/code&gt; vs memory vs cookies)&lt;/li&gt;
&lt;li&gt;Refresh token logic (and the day it fails in production)&lt;/li&gt;
&lt;li&gt;“Why did it log me out?” issues&lt;/li&gt;
&lt;li&gt;Security reviews that keep expanding scope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the nastiest part is: it’s not &lt;em&gt;one&lt;/em&gt; critical bug—it’s &lt;strong&gt;a hundred tiny sharp edges&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pivot: Authentication at the ALB
&lt;/h2&gt;

&lt;p&gt;When you use &lt;code&gt;authenticate_oidc&lt;/code&gt;, the ALB becomes the bouncer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unauthenticated requests get redirected to your Identity Provider (IdP)&lt;/li&gt;
&lt;li&gt;The ALB completes the OIDC flow&lt;/li&gt;
&lt;li&gt;The ALB maintains an authenticated session (cookie-based)&lt;/li&gt;
&lt;li&gt;Only authenticated requests reach your target&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your serverless frontend (often a Lambda router / SSR / fallback handler) simply… serves pages.&lt;/p&gt;

&lt;p&gt;The vibe shifts from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Did we implement OAuth correctly?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“If I got a 200, I’m logged in.”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Request Flow in 30 Seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser
  |
  | GET /anything
  v
ALB (authenticate_oidc)
  |
  | not logged in?
  | 302 -&amp;gt; IdP
  v
IdP (login)
  |
  | 302 -&amp;gt; ALB callback
  v
ALB (sets session cookies)
  |
  | forward
  v
Lambda target (serverless frontend router)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what’s missing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No client-side token parsing&lt;/li&gt;
&lt;li&gt;No callback handler in your React app&lt;/li&gt;
&lt;li&gt;No refresh logic scattered across fetch calls&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Minimal, Anonymized CDK Snippet
&lt;/h2&gt;

&lt;p&gt;This is intentionally “shape only” (no real URLs, no org names). The essence is:&lt;/p&gt;

&lt;p&gt;1) forward to a Lambda target group&lt;br&gt;
2) wrap it with &lt;code&gt;authenticate_oidc&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;aws_elasticloadbalancingv2&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;elbv2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;aws_elasticloadbalancingv2_targets&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;targets&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SecretValue&lt;/span&gt;

&lt;span class="n"&gt;frontend_tg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;elbv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ApplicationTargetGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FrontendTg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;elbv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TargetType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LAMBDA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LambdaTarget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frontend_router_lambda&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FrontendWithOidc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;conditions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;elbv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenerCondition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;path_patterns&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])],&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;elbv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenerAction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;authenticate_oidc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;issuer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://idp.example/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;authorization_endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://idp.example/oauth2/authorize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;token_endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://idp.example/oauth2/token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_info_endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://idp.example/oauth2/userinfo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;client-id&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SecretValue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;secrets_manager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/path/to/oidc-secret&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;elbv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenerAction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;frontend_tg&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Quick rules that save pain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep the OIDC secret in a secret manager, not env vars.&lt;/li&gt;
&lt;li&gt;Make sure listener priorities don’t collide.&lt;/li&gt;
&lt;li&gt;Default to protecting &lt;code&gt;/*&lt;/code&gt; unless you truly want public routes.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How This Changed Our Security Posture (In Plain English)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) “Secure by default” stops being a slogan
&lt;/h3&gt;

&lt;p&gt;With ALB OIDC, every path behind the listener rule becomes authenticated by default. You’re no longer relying on every route guard, every component, and every refactor to “remember auth.”&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Less token exposure in the browser
&lt;/h3&gt;

&lt;p&gt;The browser is a hostile environment. Reducing token handling in the UI reduces your exposure to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XSS turning into token theft&lt;/li&gt;
&lt;li&gt;accidental logging of sensitive values&lt;/li&gt;
&lt;li&gt;copy-paste auth bugs across micro-frontends&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3) Fewer app secrets
&lt;/h3&gt;

&lt;p&gt;If your frontend app doesn’t need to “be an OAuth client,” it also needs fewer secrets and fewer complicated deployment rules.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Subtle but Important Split: Auth vs Authorization
&lt;/h2&gt;

&lt;p&gt;ALB OIDC is excellent at &lt;strong&gt;authentication&lt;/strong&gt; (“who are you?”).&lt;/p&gt;

&lt;p&gt;But you still need strong &lt;strong&gt;authorization&lt;/strong&gt; (“what can you do?”):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RBAC: role-based permissions&lt;/li&gt;
&lt;li&gt;ABAC: tenant/env/resource scoping&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The clean division:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ALB:&lt;/strong&gt; verify the user is logged in&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; enforce permissions and data scope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you try to do all authorization at the load balancer, you’ll end up with something brittle and hard to evolve.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotchas (A.K.A. The Part Everyone Learns in Production)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Callback path behavior
&lt;/h3&gt;

&lt;p&gt;ALB uses a callback endpoint (often something like &lt;code&gt;/oauth2/idpresponse&lt;/code&gt;). Make sure your routing rules don’t accidentally break it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Claims can get huge
&lt;/h3&gt;

&lt;p&gt;Too many groups/roles/claims can hit header/cookie limits. Mitigations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep tokens/claims lean&lt;/li&gt;
&lt;li&gt;fetch richer profile data server-side&lt;/li&gt;
&lt;li&gt;store heavy identity in your own session store&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3) Logout is three separate things
&lt;/h3&gt;

&lt;p&gt;There’s:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;app logout&lt;/li&gt;
&lt;li&gt;ALB session cookie&lt;/li&gt;
&lt;li&gt;IdP session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Define what “Logout” means for your UX and compliance requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  4) Local dev can feel weird
&lt;/h3&gt;

&lt;p&gt;Production has ALB OIDC; your laptop doesn’t.&lt;/p&gt;

&lt;p&gt;Good local-dev patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inject mocked identity headers in dev&lt;/li&gt;
&lt;li&gt;run a lightweight local gateway that simulates “auth at the edge”&lt;/li&gt;
&lt;li&gt;keep backend authorization testable without a real IdP&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Practical Rollout Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Verify OIDC endpoints: issuer + authorize + token + userinfo&lt;/li&gt;
&lt;li&gt;Store the client secret in a secret manager&lt;/li&gt;
&lt;li&gt;Confirm listener rule priority ordering&lt;/li&gt;
&lt;li&gt;Ensure callback path is reachable through routing rules&lt;/li&gt;
&lt;li&gt;Enforce HTTPS everywhere&lt;/li&gt;
&lt;li&gt;Enable ALB access logs&lt;/li&gt;
&lt;li&gt;Document logout behavior (what it clears)&lt;/li&gt;
&lt;li&gt;Write down the local-dev story (seriously)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When You Should &lt;em&gt;Not&lt;/em&gt; Use ALB OIDC
&lt;/h2&gt;

&lt;p&gt;Avoid / reconsider if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you need complex per-request authorization decisions before forwarding&lt;/li&gt;
&lt;li&gt;you don’t have an ALB in the request path (pure CDN with no origin auth)&lt;/li&gt;
&lt;li&gt;your org mandates a different gateway or zero-trust access layer&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Closing: Make the Safe Path the Easy Path
&lt;/h2&gt;

&lt;p&gt;The benefit of this pattern isn’t novelty.&lt;/p&gt;

&lt;p&gt;It’s that you can remove an entire category of mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;less auth code in the UI&lt;/li&gt;
&lt;li&gt;fewer ways to leak tokens&lt;/li&gt;
&lt;li&gt;consistent enforcement across routes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And when security is the default, teams move faster—because fewer changes require “special auth handling.”&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you’ve done edge auth (ALB OIDC, gateway authorizers, access proxies), what hurt most for you: local dev, logout, or claim size?&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AWS Docs: Application Load Balancer authentication actions (OIDC)&lt;/li&gt;
&lt;li&gt;AWS CDK: &lt;code&gt;ListenerAction.authenticate_oidc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;OAuth 2.0 / OIDC basics (for understanding redirects, authorization code flow)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Suraj Khaitan&lt;/strong&gt; — Gen AI Architect | Building scalable platforms and secure cloud-native systems&lt;/p&gt;

&lt;p&gt;Connect on &lt;a href="https://www.linkedin.com/in/suraj-khaitan-501736a2/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | Follow for more engineering and architecture write-ups&lt;/p&gt;




</description>
      <category>ai</category>
      <category>aws</category>
      <category>oauth</category>
      <category>agents</category>
    </item>
    <item>
      <title>🧠 RAG in 2026: A Practical Blueprint for Retrieval-Augmented Generation</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sun, 25 Jan 2026 06:20:17 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/-rag-in-2026-a-practical-blueprint-for-retrieval-augmented-generation-16pp</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/-rag-in-2026-a-practical-blueprint-for-retrieval-augmented-generation-16pp</guid>
      <description>&lt;p&gt;&lt;em&gt;How to make LLMs feel “grounded” in your data—without turning your app into a prompt-factory.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Large Language Models are incredible at &lt;em&gt;language&lt;/em&gt;, but they still have two awkward traits in production:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;They don’t know your private data by default&lt;/strong&gt; (docs, tickets, code, policies).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;They can sound confident even when they’re guessing.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is the most reliable pattern I’ve used to fix both—by giving the model &lt;em&gt;just-in-time&lt;/em&gt; access to relevant context at the moment it answers.&lt;/p&gt;

&lt;p&gt;This post is a practical, medium-depth tour of RAG: the core architecture, the failure modes, and the “advanced knobs” that actually move quality (reranking, routing, query strategies, and better indexing). I’ll also point you to a great open-source reference implementation that I’ve been using as a sanity check.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔎 The Core Idea: Don’t Train, Retrieve
&lt;/h2&gt;

&lt;p&gt;Think of RAG as two systems working together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retriever:&lt;/strong&gt; finds the best supporting context for a question.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generator (LLM):&lt;/strong&gt; writes the final answer using the retrieved context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of trying to cram your entire knowledge base into model weights, you keep your knowledge in stores that are good at search (vector DBs, relational DBs, graph DBs), retrieve the best bits, and then let the LLM do what it does best: compose a response.&lt;/p&gt;

&lt;p&gt;A good mental model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;RAG = Search + Reasoning&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Search brings &lt;em&gt;facts&lt;/em&gt;. Reasoning provides &lt;em&gt;coherence&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ A Clean RAG Architecture (What Actually Matters)
&lt;/h2&gt;

&lt;p&gt;Most RAG diagrams look complex because they include every optional component. Here’s a simple backbone that scales:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ingest&lt;/strong&gt; documents (PDFs, web pages, internal wikis, tickets)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk&lt;/strong&gt; them into retrievable units&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embed&lt;/strong&gt; chunks into vectors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index&lt;/strong&gt; vectors in a vector store&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve&lt;/strong&gt; top-$k$ chunks for a question&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate&lt;/strong&gt; an answer with citations / grounded context&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In code, the minimal version feels like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;question -&amp;gt; embed(question) -&amp;gt; similarity_search -&amp;gt; context -&amp;gt; LLM(prompt + context)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you only build that, you’ll get something working quickly—but you’ll also quickly hit the real-world issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval returns “nearby” chunks that don’t actually answer the question&lt;/li&gt;
&lt;li&gt;The best chunk is buried at rank 17&lt;/li&gt;
&lt;li&gt;A single query phrasing misses the right terminology&lt;/li&gt;
&lt;li&gt;Some questions should query SQL or a graph, not embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where the next layers matter.&lt;/p&gt;




&lt;h2&gt;
  
  
  📦 Retrieval Isn’t Only Vectors: Pick the Right Store
&lt;/h2&gt;

&lt;p&gt;A mature RAG system doesn’t have to be “vector-only”. Depending on the question, retrieval can come from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector stores:&lt;/strong&gt; semantic search over unstructured text (docs, emails, transcripts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relational DBs:&lt;/strong&gt; exact structured facts (orders, users, pricing, logs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph DBs:&lt;/strong&gt; relationships and traversals (org charts, dependency graphs, knowledge graphs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, you often end up with a hybrid:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data type&lt;/th&gt;
&lt;th&gt;Best retrieval style&lt;/th&gt;
&lt;th&gt;Example question&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Policies / long docs&lt;/td&gt;
&lt;td&gt;Vector search&lt;/td&gt;
&lt;td&gt;“What’s our parental leave policy?”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metrics / records&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;“What was churn last quarter in EMEA?”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Relationships&lt;/td&gt;
&lt;td&gt;Cypher / graph queries&lt;/td&gt;
&lt;td&gt;“Who owns service X and what depends on it?”&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is why modern RAG stacks include things like &lt;strong&gt;Text-to-SQL&lt;/strong&gt;, &lt;strong&gt;Text-to-Cypher&lt;/strong&gt;, and &lt;strong&gt;self-query retrievers&lt;/strong&gt; (where the model generates a structured search query and metadata filters).&lt;/p&gt;




&lt;h2&gt;
  
  
  🧭 Routing: The “Secret Sauce” for Multi-Source RAG
&lt;/h2&gt;

&lt;p&gt;If you only have one data source, retrieval is straightforward. But the moment you add a relational database, a vector store, and maybe a graph—your first big design decision becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do I route a user’s question to the right retriever?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Two patterns show up repeatedly:&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Logical routing
&lt;/h3&gt;

&lt;p&gt;Simple rules or a lightweight classifier.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“If the question mentions revenue, query SQL.”&lt;/li&gt;
&lt;li&gt;“If the question mentions ‘policy’, use the handbook index.”&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2) Semantic routing
&lt;/h3&gt;

&lt;p&gt;Use embeddings (or a small LLM prompt) to decide which tool to call.&lt;/p&gt;

&lt;p&gt;This reduces “tool spam” and usually improves relevance because you retrieve from the &lt;em&gt;right&lt;/em&gt; store first.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Query Strategies That Increase Recall (Without Overfetching)
&lt;/h2&gt;

&lt;p&gt;Most weak RAG answers are not generation problems—they’re retrieval problems.&lt;/p&gt;

&lt;p&gt;A single user question is often ambiguous. Strong pipelines expand the query space &lt;em&gt;before&lt;/em&gt; retrieving.&lt;/p&gt;

&lt;p&gt;Here are query strategies I’ve seen consistently help:&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-query
&lt;/h3&gt;

&lt;p&gt;Generate multiple paraphrases of the question and retrieve for each.&lt;/p&gt;

&lt;p&gt;Why it works: different phrasing hits different vocabulary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step-back questions
&lt;/h3&gt;

&lt;p&gt;Ask a higher-level sub-question first (“What concept is this about?”), then use that to retrieve.&lt;/p&gt;

&lt;p&gt;Why it works: reduces lexical mismatch and anchors retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  HyDE (Hypothetical Document Embeddings)
&lt;/h3&gt;

&lt;p&gt;Generate a &lt;em&gt;hypothetical&lt;/em&gt; answer document, embed that, and retrieve based on it.&lt;/p&gt;

&lt;p&gt;Why it works: the hypothetical answer contains domain language the user may not use.&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG-Fusion
&lt;/h3&gt;

&lt;p&gt;Retrieve multiple lists (from multi-query, HyDE, etc.) and then &lt;strong&gt;fuse&lt;/strong&gt; rankings (often using Reciprocal Rank Fusion).&lt;/p&gt;

&lt;p&gt;Why it works: you get strong recall without blindly increasing $k$.&lt;/p&gt;




&lt;h2&gt;
  
  
  🥇 Reranking: Fix “The Answer Was in the Context, But…”
&lt;/h2&gt;

&lt;p&gt;If you’ve built a basic RAG system, you’ve likely seen this failure mode:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The right chunk is retrieved&lt;/li&gt;
&lt;li&gt;But it’s ranked too low&lt;/li&gt;
&lt;li&gt;The LLM focuses on the wrong chunk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reranking is the clean fix.&lt;/p&gt;

&lt;p&gt;A common pipeline looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve top 20–50 chunks cheaply (vector similarity)&lt;/li&gt;
&lt;li&gt;Rerank top candidates with a stronger model (cross-encoder, LLM-based ranker, or a reranker API)&lt;/li&gt;
&lt;li&gt;Feed the top 3–8 chunks to the generator&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You’ll see reranking approaches referenced as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cross-encoder rerankers&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM ranking&lt;/strong&gt; (sometimes called RankGPT-style ranking)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RRF&lt;/strong&gt; (Reciprocal Rank Fusion) when merging multiple retrieval lists&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is one of the highest ROI upgrades in RAG.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧹 Filter &amp;amp; Compress: The Missing Piece for Long Context
&lt;/h2&gt;

&lt;p&gt;Even if retrieval is good, the final prompt can still be noisy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated information&lt;/li&gt;
&lt;li&gt;irrelevant paragraphs&lt;/li&gt;
&lt;li&gt;chunks that overlap heavily&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where &lt;strong&gt;contextual compression&lt;/strong&gt; comes in: after retrieval, you summarize, extract, or filter down to only what matters.&lt;/p&gt;

&lt;p&gt;This is especially important as your data grows and you start using larger $k$ values.&lt;/p&gt;




&lt;h2&gt;
  
  
  🗂️ Indexing: Where Most Teams Underinvest
&lt;/h2&gt;

&lt;p&gt;Indexing decisions quietly determine your ceiling.&lt;/p&gt;

&lt;p&gt;Here are indexing techniques worth knowing (and testing):&lt;/p&gt;

&lt;h3&gt;
  
  
  Chunk optimization
&lt;/h3&gt;

&lt;p&gt;Chunk size is not a constant. Different document types want different chunking.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Too small → context fragments&lt;/li&gt;
&lt;li&gt;Too large → retrieval becomes “blurry”&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Semantic splitting
&lt;/h3&gt;

&lt;p&gt;Split on meaning (headings, sections), not arbitrary character counts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parent-document retrieval
&lt;/h3&gt;

&lt;p&gt;Store embeddings for child chunks but return a larger “parent” span when answering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-representation indexing
&lt;/h3&gt;

&lt;p&gt;Index both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fine-grained chunks for precision&lt;/li&gt;
&lt;li&gt;summaries for recall&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Specialized embeddings / fine-tuning
&lt;/h3&gt;

&lt;p&gt;If your domain has unique language (legal, medicine, internal code), embeddings matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hierarchical indexing (RAPTOR-like)
&lt;/h3&gt;

&lt;p&gt;Build a tree of summaries from leaves → root so retrieval can happen at multiple abstraction levels.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token-level retrieval (ColBERT-style)
&lt;/h3&gt;

&lt;p&gt;A stronger retrieval approach when semantics are subtle and bag-of-vector similarity struggles.&lt;/p&gt;

&lt;p&gt;You don’t need all of these. But the point is: &lt;strong&gt;RAG quality is frequently an indexing problem disguised as an LLM problem.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🔁 Active Retrieval (and Why It’s the Future)
&lt;/h2&gt;

&lt;p&gt;Some questions require the system to &lt;em&gt;work&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ask clarifying questions&lt;/li&gt;
&lt;li&gt;reformulate queries mid-flight&lt;/li&gt;
&lt;li&gt;retry retrieval when evidence is weak&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’ll sometimes see this category described as &lt;strong&gt;active retrieval&lt;/strong&gt; (including approaches like CRAG / self-correcting retrieval patterns).&lt;/p&gt;

&lt;p&gt;The takeaway: the best RAG systems aren’t one-shot. They behave more like a careful researcher.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧪 A Hands-On Reference: bRAG-langchain
&lt;/h2&gt;

&lt;p&gt;If you want something concrete to learn from (and compare against your own implementation), I recommend checking out the open-source project here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/bRAGAI/bRAG-langchain/" rel="noopener noreferrer"&gt;https://github.com/bRAGAI/bRAG-langchain/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I like about it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It walks from baseline RAG → multi-query → routing → advanced indexing → reranking&lt;/li&gt;
&lt;li&gt;It’s notebook-driven, so you can test ideas quickly&lt;/li&gt;
&lt;li&gt;It keeps the focus on practical patterns (not just theory)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A suggested learning path mirrors the notebook sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Baseline RAG setup&lt;/li&gt;
&lt;li&gt;Multi-query improvements&lt;/li&gt;
&lt;li&gt;Routing + query construction&lt;/li&gt;
&lt;li&gt;Advanced indexing&lt;/li&gt;
&lt;li&gt;Retrieval + reranking + fusion&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Use it like a “cookbook”: borrow the &lt;em&gt;ideas&lt;/em&gt;, not the exact words.&lt;/p&gt;




&lt;h2&gt;
  
  
  👨‍💻 Code Walkthrough (Inspired by bRAG-langchain)
&lt;/h2&gt;

&lt;p&gt;Below are two &lt;em&gt;rewritten&lt;/em&gt; snippets inspired by the project’s notebooks (especially &lt;code&gt;full_basic_rag.ipynb&lt;/code&gt;). The goal is to show the shape of a clean RAG pipeline—without dumping an entire notebook into a blog post.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Attribution: the reference implementation that inspired these patterns is &lt;strong&gt;bRAG AI&lt;/strong&gt;: &lt;a href="https://github.com/bRAGAI/bRAG-langchain/" rel="noopener noreferrer"&gt;https://github.com/bRAGAI/bRAG-langchain/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  1) A minimal LangChain RAG chain (loader → chunks → vectors → retriever → chain)
&lt;/h3&gt;

&lt;p&gt;This is the “boring baseline” that should work before you touch reranking, routing, or fancy indexing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PyPDFLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_pinecone&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PineconeVectorStore&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.output_parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StrOutputParser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.runnables&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RunnablePassthrough&lt;/span&gt;


&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# expects OPENAI_API_KEY, PINECONE_INDEX_NAME, etc.
&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;join_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# 1) Load
&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PyPDFLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/your.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 2) Chunk
&lt;/span&gt;&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3) Embed + index
&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PineconeVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-large&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PINECONE_INDEX_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4) Retrieve
&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# 5) Generate
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a grounded assistant. Use ONLY the context to answer.

Context:
{context}

Question: {question}

If the answer is not in the context, say you don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;rag_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;join_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rag_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is this document about?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this pattern is nice: retrieval is a pure function of the question, and prompt+LLM are pure functions of &lt;code&gt;{context, question}&lt;/code&gt;. That separation makes it easy to add routing, reranking, eval, caching, etc.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Multi-query + fusion (high recall without blindly increasing k)
&lt;/h3&gt;

&lt;p&gt;The repo’s later notebooks explore multi-query / fusion and reranking. The key mental model is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generate multiple query variants&lt;/li&gt;
&lt;li&gt;retrieve for each&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;fuse&lt;/em&gt; the ranked lists (so strong hits bubble up)&lt;/li&gt;
&lt;li&gt;optionally rerank the merged set&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s a compact sketch using &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rrf_fuse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ranked_lists&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fuse multiple ranked lists using Reciprocal Rank Fusion.

    ranked_lists: list[list[Document]]
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;by_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ranked_lists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# Prefer a stable ID if you have one; fallback to content hash
&lt;/span&gt;            &lt;span class="n"&gt;doc_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;by_id&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;
            &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;fused&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;by_id&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fused&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="c1"&gt;# In practice: use an LLM prompt to produce 3–8 diverse rewrites.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; with concrete examples&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the key concepts behind: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How does RAG reduce hallucinations?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ranked_lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_relevant_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;fused_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;rrf_fuse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ranked_lists&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# or rebuild chain to use fused_docs
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production you’d typically rebuild the chain so the “context” comes from &lt;code&gt;fused_docs&lt;/code&gt; (and then optionally apply a learned reranker like Cohere Rerank on that smaller candidate set).&lt;/p&gt;




&lt;h2&gt;
  
  
  ✅ A Production Checklist (Short, but Useful)
&lt;/h2&gt;

&lt;p&gt;Before you ship RAG to real users, make sure you can answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation:&lt;/strong&gt; How will you measure grounded correctness (not just fluency)?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citations:&lt;/strong&gt; Can you show &lt;em&gt;which sources&lt;/em&gt; supported the answer?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallbacks:&lt;/strong&gt; What happens when retrieval confidence is low?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security:&lt;/strong&gt; Are you filtering sensitive docs by user permissions before retrieval?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Freshness:&lt;/strong&gt; How often is the index updated? (and can you delete data reliably?)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; Can you keep response time acceptable with reranking and multi-query?&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;RAG isn’t a single technique—it’s a toolbox:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieval across the right stores&lt;/li&gt;
&lt;li&gt;routing to the right tool&lt;/li&gt;
&lt;li&gt;smarter query generation (multi-query, step-back, HyDE)&lt;/li&gt;
&lt;li&gt;reranking and fusion&lt;/li&gt;
&lt;li&gt;compression for long context&lt;/li&gt;
&lt;li&gt;indexing strategies that scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you get retrieval right, generation becomes the easy part.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;bRAG LangChain project (hands-on notebooks): &lt;a href="https://github.com/bRAGAI/bRAG-langchain/" rel="noopener noreferrer"&gt;https://github.com/bRAGAI/bRAG-langchain/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;RAG architecture diagram source material: see &lt;a href="//RAG_Consolidated.jpg"&gt;RAG_Consolidated.jpg&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Suraj Khaitan&lt;/strong&gt; — Gen AI Architect | Building the next generation of AI-powered development tools&lt;/p&gt;

&lt;p&gt;Connect on &lt;a href="https://www.linkedin.com/in/suraj-khaitan-501736a2/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | Follow for more AI and software engineering insights&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; #AI #RAG #LLM #LangChain #VectorDatabases #InformationRetrieval #GenerativeAI&lt;/p&gt;

</description>
      <category>rag</category>
      <category>agents</category>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Inside Google Antigravity: How AI Pair Programming Actually Works</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sun, 25 Jan 2026 02:18:37 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/inside-google-antigravity-how-ai-pair-programming-actually-works-16nc</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/inside-google-antigravity-how-ai-pair-programming-actually-works-16nc</guid>
      <description>&lt;p&gt;&lt;em&gt;A deep dive into the architecture, capabilities, and real-world coding experience with Google's revolutionary AI coding assistant&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Promise of AI-Powered Development
&lt;/h2&gt;

&lt;p&gt;Imagine having a senior engineer who knows every programming language, can refactor your entire codebase in seconds, understands design patterns across frameworks, and never gets tired. That's the promise of Google Antigravity—but how does it actually work in practice?&lt;/p&gt;

&lt;p&gt;I recently spent over 10 hours building a complete MacOS desktop simulation in React, including a functional Safari browser, Twitter clone, Spotify player, and even a working Flappy Bird game—all with Antigravity as my pair programmer. This article breaks down the architecture, capabilities, and lessons learned from pushing this AI assistant to its limits.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 The Architecture: Three Core Layers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;The Reasoning Engine: Claude 4.5 Sonnet&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At its core, Antigravity uses Anthropic's Claude 4.5 Sonnet with extended "thinking" capabilities. Unlike traditional code completion tools, Antigravity doesn't just autocomplete—it &lt;em&gt;reasons&lt;/em&gt; about your entire codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What makes it special:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;200K token context window&lt;/strong&gt;: Can understand entire codebases, not just snippets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic behavior&lt;/strong&gt;: Plans, executes, verifies, and iterates autonomously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool use&lt;/strong&gt;: Direct filesystem access, browser control, terminal commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn conversations&lt;/strong&gt;: Maintains context across hours of development&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;The Tool Layer: Direct System Access&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is where Antigravity diverges from chat-based AI. It can:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Read and modify files&lt;/span&gt;
&lt;span class="nf"&gt;view_file&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;AbsolutePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/path/to/component.jsx&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;replace_file_content&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
    &lt;span class="na"&gt;TargetFile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/path/to/component.jsx&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;TargetContent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;old code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;ReplacementContent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;new code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Execute terminal commands&lt;/span&gt;
&lt;span class="nf"&gt;run_command&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
    &lt;span class="na"&gt;CommandLine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;npm run build&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;SafeToAutoRun&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; 
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Control the browser&lt;/span&gt;
&lt;span class="nf"&gt;browser_subagent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
    &lt;span class="na"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Navigate to localhost and take screenshot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Search the web for real-time data&lt;/span&gt;
&lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
    &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Virat Kohli recent statistics 2026&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real example from my session:&lt;/strong&gt; When I asked for "real data" about cricket players, Antigravity:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Searched the web for current statistics&lt;/li&gt;
&lt;li&gt;Found Virat Kohli's actual tweets from January 2026&lt;/li&gt;
&lt;li&gt;Downloaded real images from news sources&lt;/li&gt;
&lt;li&gt;Updated the entire app with live data—all autonomously&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;The Task Management System&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Antigravity doesn't just execute commands—it &lt;em&gt;manages projects&lt;/em&gt;. It uses a sophisticated task boundary system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# task.md (Auto-generated)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [x] Desktop Environment
&lt;span class="p"&gt;    -&lt;/span&gt; [x] Create &lt;span class="sb"&gt;`Window`&lt;/span&gt; component (Draggable, Controls)
&lt;span class="p"&gt;    -&lt;/span&gt; [x] Create &lt;span class="sb"&gt;`MenuBar`&lt;/span&gt; component
&lt;span class="p"&gt;    -&lt;/span&gt; [x] Create &lt;span class="sb"&gt;`Dock`&lt;/span&gt; component
&lt;span class="p"&gt;-&lt;/span&gt; [/] App Adaptation
&lt;span class="p"&gt;    -&lt;/span&gt; [x] Update Spotify for Desktop
&lt;span class="p"&gt;    -&lt;/span&gt; [ ] Update Twitter for Desktop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;[/]&lt;/code&gt; notation indicates "in progress"—the AI tracks its own state across the conversation.&lt;/p&gt;




&lt;h2&gt;
  
  
  📄 Real-World Example: The MacOS Desktop Build
&lt;/h2&gt;

&lt;p&gt;Let me walk through how Antigravity handled a complex, evolving requirement.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Initial Request&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Build an iPhone UI simulator with Spotify, Flappy Bird, and Twitter apps"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 1: Planning Mode&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Antigravity created an &lt;code&gt;implementation_plan.md&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Proposed Changes&lt;/span&gt;

&lt;span class="gu"&gt;### OS Components&lt;/span&gt;
&lt;span class="gu"&gt;#### [NEW] IPhoneFrame.jsx&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Notch, rounded corners, status bar
&lt;span class="p"&gt;-&lt;/span&gt; Home bar for navigation
&lt;span class="p"&gt;-&lt;/span&gt; App switching logic

&lt;span class="gu"&gt;#### [NEW] HomeScreen.jsx  &lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; App grid with icons
&lt;span class="p"&gt;-&lt;/span&gt; Glassmorphism dock

&lt;span class="gu"&gt;### Apps&lt;/span&gt;
&lt;span class="gu"&gt;#### [NEW] SpotifyApp.jsx&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Player UI with progress bar
&lt;span class="p"&gt;-&lt;/span&gt; Playlist view
&lt;span class="p"&gt;-&lt;/span&gt; Mock playback logic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It then &lt;strong&gt;requested approval&lt;/strong&gt; before writing any code. This human-in-the-loop design prevents wasted effort.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 2: Execution Mode&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Once approved, Antigravity wrote 2,000+ lines of React code across 15 files in minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Example: Auto-generated Window Component&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;onClose&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;zIndex&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;motion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;
      &lt;span class="na"&gt;drag&lt;/span&gt;
      &lt;span class="na"&gt;dragMomentum&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;zIndex&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"absolute bg-[#1e1e1e] rounded-xl shadow-2xl"&lt;/span&gt;
    &lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"h-[38px] bg-[#2a2a2a] flex items-center px-4"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="cm"&gt;/* Traffic light controls */&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"flex gap-2 group"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;onClose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; 
               &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"w-3 h-3 rounded-full bg-[#ff5f57]"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;X&lt;/span&gt; &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"opacity-0 group-hover:opacity-100"&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"flex-1 text-center text-xs"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"flex-1 overflow-hidden"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;motion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 3: The Plot Twist&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Midway through, I changed my mind:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Actually, make it MacOS instead of iPhone"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What happened next was remarkable:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Antigravity didn't just rename variables. It:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identified architectural incompatibilities (mobile touch → desktop mouse)&lt;/li&gt;
&lt;li&gt;Updated the entire windowing system to support dragging&lt;/li&gt;
&lt;li&gt;Replaced the home screen with a desktop + dock&lt;/li&gt;
&lt;li&gt;Converted mobile apps to desktop windows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preserved all the existing app logic&lt;/strong&gt;—zero rework&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The transition took ~5 minutes and 30 file modifications.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Phase 4: The "It Looks Dull" Crisis&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;User feedback:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This is how stupid it looks currently in my laptop, how dumb you are"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;🔴 &lt;strong&gt;Critical error discovered:&lt;/strong&gt; Tailwind CSS wasn't installed!&lt;/p&gt;

&lt;p&gt;All the styling classes (&lt;code&gt;bg-black&lt;/code&gt;, &lt;code&gt;flex&lt;/code&gt;, &lt;code&gt;rounded-xl&lt;/code&gt;) were being rendered as plain HTML. Antigravity:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Diagnosed the issue from the screenshot&lt;/li&gt;
&lt;li&gt;Installed &lt;code&gt;tailwindcss&lt;/code&gt;, &lt;code&gt;postcss&lt;/code&gt;, &lt;code&gt;autoprefixer&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Created config files (&lt;code&gt;tailwind.config.js&lt;/code&gt;, &lt;code&gt;postcss.config.js&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Switched to the newer &lt;code&gt;@tailwindcss/postcss&lt;/code&gt; plugin when the first attempt failed&lt;/li&gt;
&lt;li&gt;Rebuilt and verified the fix&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; Even AI makes mistakes, but it course-corrects &lt;em&gt;fast&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚡ The Self-Healing Workflow
&lt;/h2&gt;

&lt;p&gt;Here's what surprised me most: &lt;strong&gt;Antigravity debugs itself&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: The Photo App Failure
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;User complaint:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"No photos of Virat Kohli, videos not working, X not working"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Antigravity's response:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Search for real data:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Virat Kohli recent photos images 2026&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Virat Kohli Twitter tweets recent 2026&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extract URLs from search results:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://pbs.twimg.com/media/GiC7zqMWsAA0RJL?format=jpg&amp;amp;name=large&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://akm-img-a-in.tosshub.com/indiatoday/images/story/202401/...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// Real sources from news articles&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Update the component with error handling:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;img&lt;/span&gt; 
  &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; 
  &lt;span class="na"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://via.placeholder.com/400?text=Virat+Kohli&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Switch video player to YouTube embed:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;iframe&lt;/span&gt;
  &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"100%"&lt;/span&gt;
  &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"100%"&lt;/span&gt;
  &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"https://www.youtube.com/embed/[videoId]"&lt;/span&gt;
  &lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"autoplay; encrypted-media"&lt;/span&gt;
  &lt;span class="na"&gt;allowFullScreen&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of this happened &lt;strong&gt;without me specifying how to fix it&lt;/strong&gt;. The AI inferred the problems, researched solutions, and implemented them.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎨 The "Real World Data" Challenge
&lt;/h2&gt;

&lt;p&gt;One unique aspect of this session: I demanded &lt;strong&gt;real data&lt;/strong&gt;, not lorem ipsum placeholders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ask:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Get the data from real world on Lauren Bell or Virat Kohli"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What Antigravity did:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Live web search&lt;/strong&gt; for current cricket statistics (January 2026)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extracted structured data:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;odi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;311&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;14797&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;average&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;58.72&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;centuries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;54&lt;/span&gt;  &lt;span class="c1"&gt;// Real number from ICC&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Found actual tweets:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Congratulations @NSaina on a legendary career...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Jan 23&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;likes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;156K&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;// Actual engagement numbers&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sourced real images&lt;/strong&gt; from Wikipedia, news outlets, and social media&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result? A Twitter clone showing Virat Kohli's &lt;em&gt;actual&lt;/em&gt; January 2026 timeline, not synthetic mock data.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔧 The Technical Challenges It Solved
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Challenge 1: Tailwind CSS Configuration Hell
&lt;/h3&gt;

&lt;p&gt;Most developers spend hours debugging Tailwind setup. Antigravity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installed dependencies via npm&lt;/li&gt;
&lt;li&gt;Generated config files with proper ES module syntax&lt;/li&gt;
&lt;li&gt;Switched to &lt;code&gt;@tailwindcss/postcss&lt;/code&gt; when v3 syntax failed&lt;/li&gt;
&lt;li&gt;Verified the build (CSS jumped from 1.17KB → 18.21KB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Time saved:&lt;/strong&gt; ~2 hours of Stack Overflow research&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge 2: State Management Across Windows
&lt;/h3&gt;

&lt;p&gt;The MacOS desktop needed to track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Window positions (x, y coordinates)&lt;/li&gt;
&lt;li&gt;Z-index stacking order&lt;/li&gt;
&lt;li&gt;Focus state&lt;/li&gt;
&lt;li&gt;Open/closed status&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Antigravity's solution was elegant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;windows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setWindows&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;({});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;maxZIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setMaxZIndex&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;focusWindow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;setWindows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nextZ&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;maxZIndex&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;setMaxZIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;nextZ&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="nx"&gt;newState&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="na"&gt;zIndex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;nextZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;isActive&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="c1"&gt;// Deactivate others&lt;/span&gt;
    &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newState&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;newState&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;isActive&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;newState&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No memory leaks, proper immutability, clean logic—first attempt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge 3: Framer Motion Animations
&lt;/h3&gt;

&lt;p&gt;Creating smooth iOS-style animations requires understanding physics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;motion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;
  &lt;span class="na"&gt;initial&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;100%&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;animate&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;100%&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;transition&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;spring&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;damping&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;stiffness&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="cm"&gt;/* App content */&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;motion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Antigravity nailed the spring physics parameters on the first try. That's hundreds of hours of animation experience distilled into working code.&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 Key Insights: How to Work with Antigravity
&lt;/h2&gt;

&lt;p&gt;After 10+ hours, here's what I learned:&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ &lt;strong&gt;Do:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start broad, iterate narrow:&lt;/strong&gt; "Build a MacOS desktop" → "Add real Virat Kohli data"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Give honest feedback:&lt;/strong&gt; When I said "this looks stupid," it fixed the actual problem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let it research:&lt;/strong&gt; It found better data sources than I would have Googled&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust the task breakdown:&lt;/strong&gt; The auto-generated task.md was better organized than my mental model&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ❌ &lt;strong&gt;Don't:&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Micromanage implementation:&lt;/strong&gt; It knows Tailwind/Framer Motion better than most humans&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assume it knows your aesthetic:&lt;/strong&gt; "Dull" was subjective—I had to specify &lt;em&gt;what&lt;/em&gt; was missing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip the planning phase:&lt;/strong&gt; Approving &lt;code&gt;implementation_plan.md&lt;/code&gt; saves rework&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🚀 The Broader Implications
&lt;/h2&gt;

&lt;p&gt;This isn't just a better autocomplete. Antigravity represents a fundamental shift:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traditional Coding&lt;/th&gt;
&lt;th&gt;Antigravity Coding&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Write line-by-line&lt;/td&gt;
&lt;td&gt;Describe outcomes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debug syntax errors&lt;/td&gt;
&lt;td&gt;Solve logic problems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Think about &lt;em&gt;how&lt;/em&gt; to implement&lt;/td&gt;
&lt;td&gt;Think about &lt;em&gt;what&lt;/em&gt; to build&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spend hours on config&lt;/td&gt;
&lt;td&gt;Spend hours on design&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The bottleneck shifts from syntax to ideas.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What This Means for Developers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Junior developers:&lt;/strong&gt; Can build production-quality apps without years of framework expertise&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior developers:&lt;/strong&gt; Can prototype 10x faster and focus on architecture&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Teams:&lt;/strong&gt; Can iterate on features in hours instead of sprints&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 Real-World Performance Metrics
&lt;/h2&gt;

&lt;p&gt;From my session:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total files created&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines of code written&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~2,500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Build errors fixed autonomously&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Web searches performed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User "it's broken" complaints&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Times AI fixed itself&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Final build status&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Successful&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Development time:&lt;/strong&gt; 10 hours (with learning curve)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Estimated manual time:&lt;/strong&gt; 40-60 hours for equivalent quality&lt;/p&gt;




&lt;h2&gt;
  
  
  💬 The Human Element
&lt;/h2&gt;

&lt;p&gt;Despite all this automation, I was still essential. I provided:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vision&lt;/strong&gt; ("I want a MacOS desktop")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Taste&lt;/strong&gt; ("This looks dull, add real data")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain knowledge&lt;/strong&gt; ("Use Virat Kohli cricket stats")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA feedback&lt;/strong&gt; ("Photos not working")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Antigravity is a &lt;em&gt;multiplier&lt;/em&gt;, not a replacement.&lt;/strong&gt; It executes at AI speed on human-defined goals.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔮 What's Next?
&lt;/h2&gt;

&lt;p&gt;Imagine this technology in 12 months:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal understanding:&lt;/strong&gt; Show it a design mockup, get working code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codebase memory:&lt;/strong&gt; Remember every project you've built&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous testing:&lt;/strong&gt; Self-write test suites and fix failures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform deployment:&lt;/strong&gt; "Make this work on iOS" → done&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're at iPhone 1 levels of maturity. The next decade will be wild.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Google Antigravity isn't magic—it's the first truly &lt;em&gt;agentic&lt;/em&gt; coding assistant. It plans, executes, debugs, and learns from feedback in a continuous loop.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;Good AI assistance isn't about eliminating coding; it's about eliminating the tedious 80% so you can focus on the creative 20%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After this experience, I can't imagine going back to traditional development. Not because I've forgotten how to code, but because I've remembered why I started coding in the first place: &lt;strong&gt;to build things&lt;/strong&gt;, not to fiddle with configs and syntax.&lt;/p&gt;

&lt;p&gt;The future of programming isn't code-free. It's &lt;em&gt;friction-free&lt;/em&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you used AI coding assistants? What's been your experience with autonomous agents vs. autocomplete? Share your thoughts in the comments!&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Claude&lt;/strong&gt;: The reasoning model powering Antigravity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framer Motion&lt;/strong&gt;: React animation library used in examples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS&lt;/strong&gt;: Utility-first CSS framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vite&lt;/strong&gt;: Build tool for the demo project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time web search&lt;/strong&gt;: Powered by Google Search API&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Official Google Antigravity Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.analyticsvidhya.com/blog/2025/11/google-antigravity/" rel="noopener noreferrer"&gt;Google Antigravity Overview&lt;/a&gt; - Comprehensive introduction and use cases&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://antigravity.google/download" rel="noopener noreferrer"&gt;Download Google Antigravity&lt;/a&gt; - Official download page&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://codelabs.developers.google.com/getting-started-google-antigravity#8" rel="noopener noreferrer"&gt;Getting Started with Google Antigravity&lt;/a&gt; - Official Google Codelabs tutorial&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Suraj Khaitan&lt;/strong&gt; — Gen AI Architect | Building the next generation of AI-powered development tools&lt;/p&gt;

&lt;p&gt;Connect on &lt;a href="https://www.linkedin.com/in/suraj-khaitan-501736a2/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | Follow for more AI and software engineering insights&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; #AI #GoogleAntigravity #SoftwareEngineering #React #WebDevelopment #AIAssistant #ClaudeAI #DeveloperTools #Programming #MachineLearning&lt;/p&gt;

</description>
      <category>antigravity</category>
      <category>python</category>
      <category>google</category>
      <category>agents</category>
    </item>
    <item>
      <title>Retrieval-Augmented Generation (RAG) Agents: How to Build Grounded, Tool‑Using GenAI Systems</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sun, 28 Dec 2025 09:41:41 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/retrieval-augmented-generation-rag-agents-how-to-build-grounded-tool-using-genai-systems-fhf</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/retrieval-augmented-generation-rag-agents-how-to-build-grounded-tool-using-genai-systems-fhf</guid>
      <description>&lt;p&gt;If you’ve built a demo where an LLM answers questions over your docs, you’ve built &lt;strong&gt;RAG&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you’ve tried to ship it—and suddenly you’re dealing with missing citations, prompt injection, inconsistent tool calls, and “why did it say that?”—you’re building a &lt;strong&gt;RAG agent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This post is a practical blueprint for designing a &lt;strong&gt;GenAI RAG agent&lt;/strong&gt; that is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;grounded in evidence (with citations),&lt;/li&gt;
&lt;li&gt;capable of multi-step work (tools + loops),&lt;/li&gt;
&lt;li&gt;safe (guardrails + authorization),&lt;/li&gt;
&lt;li&gt;observable (traces + evals),&lt;/li&gt;
&lt;li&gt;and maintainable (clear contracts, not prompt spaghetti).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything here is generic and vendor-agnostic. The code snippets are intentionally simplified patterns inspired by production agent wrappers (tool calling, memory summaries, guardrail checks), without any client/project identifiers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Table of contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;RAG vs. RAG agents&lt;/li&gt;
&lt;li&gt;A reference architecture you can ship&lt;/li&gt;
&lt;li&gt;Retrieval that actually works&lt;/li&gt;
&lt;li&gt;Context engineering (the underrated part)&lt;/li&gt;
&lt;li&gt;Tool use: the difference between “agent” and “chatbot”&lt;/li&gt;
&lt;li&gt;Memory: short-term chat vs. long-term summaries&lt;/li&gt;
&lt;li&gt;Guardrails: prompt injection, data leaks, and safe tool calls&lt;/li&gt;
&lt;li&gt;Verification: how you earn user trust&lt;/li&gt;
&lt;li&gt;Evaluation + observability&lt;/li&gt;
&lt;li&gt;A shipping checklist&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  RAG vs. RAG agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;RAG (single-shot)&lt;/strong&gt; is typically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;take a question,&lt;/li&gt;
&lt;li&gt;retrieve relevant passages,&lt;/li&gt;
&lt;li&gt;generate an answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;A RAG agent&lt;/strong&gt; is a system that can iterate:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;understand the goal,&lt;/li&gt;
&lt;li&gt;decide next steps,&lt;/li&gt;
&lt;li&gt;retrieve evidence (possibly multiple times),&lt;/li&gt;
&lt;li&gt;call tools (search, ticketing, DB lookups, workflows),&lt;/li&gt;
&lt;li&gt;verify results,&lt;/li&gt;
&lt;li&gt;respond with citations.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A simple example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User: “Summarize the refund policy and open a support ticket if I’m eligible.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A RAG agent might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve the policy pages,&lt;/li&gt;
&lt;li&gt;determine eligibility criteria,&lt;/li&gt;
&lt;li&gt;ask a follow-up (purchase date),&lt;/li&gt;
&lt;li&gt;call a tool to create a ticket,&lt;/li&gt;
&lt;li&gt;and return a final answer with citations + the ticket ID.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where architecture matters: once your system can &lt;strong&gt;act&lt;/strong&gt;, you need stronger controls than a prompt.&lt;/p&gt;




&lt;h2&gt;
  
  
  A reference architecture you can ship
&lt;/h2&gt;

&lt;p&gt;Here’s a diagram-friendly mental model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User
  ↓
Orchestrator (routing + policy)
  ├─ Retriever (vector / keyword / hybrid)
  ├─ Reranker (optional)
  ├─ Context Builder (dedupe, trim, cite)
  ├─ LLM Reasoner (constrained)
  ├─ Tool Runner (allowlist + authz)
  ├─ Memory (session + long-term summary)
  ├─ Guardrails (input/output moderation + injection defenses)
  └─ Observability (traces, logs, evals)
  ↓
Answer + Citations + Actions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key move is to treat RAG agents as &lt;strong&gt;systems&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval is a component (not magic).&lt;/li&gt;
&lt;li&gt;Tool execution is a component (not “LLM will behave”).&lt;/li&gt;
&lt;li&gt;Memory is a component (not just “add the chat history”).&lt;/li&gt;
&lt;li&gt;Verification is a component (not “hope the model is careful”).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Putting it together: an end-to-end request handler
&lt;/h3&gt;

&lt;p&gt;Below is a simplified “agent wrapper” flow you can adapt. It mirrors how production systems typically work: apply guardrails, hydrate memory, initialize a tool session, run the agent loop, persist summaries, and return a structured response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;


&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentRequest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AgentResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# 1) Establish session
&lt;/span&gt;    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;new_session_id&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# 2) Apply INPUT guardrails (block early if needed)
&lt;/span&gt;    &lt;span class="n"&gt;filtered_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gr_in&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;apply_guardrails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;guardrails_client&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INPUT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gr_in&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intervened&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;AgentResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your request can’t be processed due to safety policies.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
            &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guardrails&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gr_in&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3) Initialize tool session (for tool servers that require it)
&lt;/span&gt;    &lt;span class="n"&gt;tool_session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;initialize_tool_session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# 4) Hydrate long-term memory summary (keep it compact)
&lt;/span&gt;    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_agent_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;filtered_message&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Agent memory (summary): &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;

    &lt;span class="c1"&gt;# 5) Retrieve evidence and run the agent loop (tight budgets)
&lt;/span&gt;    &lt;span class="n"&gt;loop_budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loop_budget&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;rewrite_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filtered_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;retrieved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;reasoner_llm&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;next_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filtered_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;allowed_tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;tool_allowlist&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt;
            &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;validate_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;tool_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_session_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="n"&gt;filtered_message&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Tool result: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;safe_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 6) Persist updated memory summary (async is fine)
&lt;/span&gt;    &lt;span class="n"&gt;new_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize_for_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filtered_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;write_agent_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;iso_now&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="c1"&gt;# 7) Apply OUTPUT guardrails (don’t leak sensitive data)
&lt;/span&gt;    &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gr_out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;apply_guardrails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;guardrails_client&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OUTPUT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;AgentResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guardrails&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gr_in&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gr_out&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The big takeaway: &lt;strong&gt;agent behavior should be constrained by system code&lt;/strong&gt; (budgets, allowlists, authz), not by “hoping the prompt is strong enough.”&lt;/p&gt;




&lt;h2&gt;
  
  
  Retrieval that actually works
&lt;/h2&gt;

&lt;p&gt;Most RAG failures are retrieval failures wearing an LLM costume.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Prefer hybrid retrieval
&lt;/h3&gt;

&lt;p&gt;Vector search is great for semantic similarity, but it misses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exact identifiers,&lt;/li&gt;
&lt;li&gt;error codes,&lt;/li&gt;
&lt;li&gt;product/version strings,&lt;/li&gt;
&lt;li&gt;proper nouns,&lt;/li&gt;
&lt;li&gt;and “must match” phrases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A reliable baseline is &lt;strong&gt;hybrid retrieval&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keyword/BM25 for exactness,&lt;/li&gt;
&lt;li&gt;vectors for semantics,&lt;/li&gt;
&lt;li&gt;metadata filters for correctness.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2) Use metadata filters early
&lt;/h3&gt;

&lt;p&gt;Even perfect embeddings won’t save you if you retrieve the wrong edition.&lt;/p&gt;

&lt;p&gt;Filter by things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;product/version,&lt;/li&gt;
&lt;li&gt;region/locale,&lt;/li&gt;
&lt;li&gt;document type,&lt;/li&gt;
&lt;li&gt;effective date,&lt;/li&gt;
&lt;li&gt;access control labels.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3) Query rewriting is not optional
&lt;/h3&gt;

&lt;p&gt;A user question is not always a good search query.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user: “Can I expense travel?”&lt;/li&gt;
&lt;li&gt;better retrieval query: “travel expense policy eligible expenses exceptions receipts approval limit”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In production, you typically want the agent to create a &lt;strong&gt;search query&lt;/strong&gt; (or several) and then retrieve.&lt;/p&gt;

&lt;h3&gt;
  
  
  4) Rerank if top‑k is noisy
&lt;/h3&gt;

&lt;p&gt;If you retrieve 20 passages and 12 are “kinda related,” you’ll see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;diluted context,&lt;/li&gt;
&lt;li&gt;token blowups,&lt;/li&gt;
&lt;li&gt;worse answers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A small reranker step can dramatically improve precision.&lt;/p&gt;




&lt;h2&gt;
  
  
  Context engineering (the underrated part)
&lt;/h2&gt;

&lt;p&gt;The retrieval step isn’t finished when you get a list of chunks.&lt;/p&gt;

&lt;p&gt;Your context builder should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deduplicate near-identical chunks,&lt;/li&gt;
&lt;li&gt;keep section titles + timestamps,&lt;/li&gt;
&lt;li&gt;extract only the relevant span (not the entire page),&lt;/li&gt;
&lt;li&gt;preserve stable source IDs for citations,&lt;/li&gt;
&lt;li&gt;and respect a strict token budget.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A practical recipe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve &lt;code&gt;k=20&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;rerank to &lt;code&gt;top=6–8&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;extract salient spans (quotes)&lt;/li&gt;
&lt;li&gt;build context with citations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A citation-friendly context format
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Source: doc-17 | “Refund Policy” | Section: Eligibility | Updated: 2025-01-10]
"Refunds are available within 30 days if …"

[Source: doc-23 | “Exceptions” | Section: Digital goods | Updated: 2024-11-02]
"Digital purchases are non-refundable unless …"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes it easy to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cite sources in the final answer,&lt;/li&gt;
&lt;li&gt;enforce “no citation → no claim,”&lt;/li&gt;
&lt;li&gt;and debug retrieval issues.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Tool use: the difference between “agent” and “chatbot”
&lt;/h2&gt;

&lt;p&gt;Tool use is where a lot of “agents” go sideways in production.&lt;/p&gt;

&lt;p&gt;The safe pattern is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the model proposes a tool call,&lt;/li&gt;
&lt;li&gt;your system validates it (allowlist + schema + authz),&lt;/li&gt;
&lt;li&gt;your system executes it,&lt;/li&gt;
&lt;li&gt;the model receives the result,&lt;/li&gt;
&lt;li&gt;the agent decides next steps.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  A generic tool-call client (JSON‑RPC style)
&lt;/h3&gt;

&lt;p&gt;This snippet shows a minimal pattern for a tool server with session headers and timeouts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;

&lt;span class="n"&gt;TOOL_SERVER_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL_SERVER_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_tool_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool-Protocol-Version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-01-01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool-Session-Id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;

    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jsonrpc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TOOL_SERVER_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize_tool_session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_tool_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;initialize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool-Session-Id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_tool_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools/call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not “agent logic”—it’s infrastructure. Keep it boring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tool allowlists and schemas
&lt;/h3&gt;

&lt;p&gt;Before executing a tool call, validate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tool name is in an allowlist,&lt;/li&gt;
&lt;li&gt;arguments conform to a schema,&lt;/li&gt;
&lt;li&gt;the user is authorized for the action,&lt;/li&gt;
&lt;li&gt;budgets (max calls / max latency) aren’t exceeded.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That validation should happen &lt;strong&gt;outside&lt;/strong&gt; the model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Memory: short-term chat vs. long-term summaries
&lt;/h2&gt;

&lt;p&gt;A common mistake is to keep appending the full conversation forever.&lt;/p&gt;

&lt;p&gt;That creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;token bloat,&lt;/li&gt;
&lt;li&gt;privacy risk,&lt;/li&gt;
&lt;li&gt;and “the model latched onto something from 40 turns ago.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A more robust approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;short-term memory&lt;/strong&gt;: last N turns (recent, high-fidelity)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;long-term memory&lt;/strong&gt;: a periodically updated &lt;em&gt;summary&lt;/em&gt; (compact, durable)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A generic long-term summary write/read pattern
&lt;/h3&gt;

&lt;p&gt;The snippet below demonstrates a safe pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;store a summary keyed by &lt;code&gt;user_id&lt;/code&gt; + &lt;code&gt;session_id&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;update it after each response,&lt;/li&gt;
&lt;li&gt;read it at session start to prime the agent.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asdict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;


&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentSummaryRecord&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;KeyValueStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_agent_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;KeyValueStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSummaryRecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nf"&gt;asdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_agent_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;KeyValueStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What belongs in the summary?
&lt;/h3&gt;

&lt;p&gt;A good long-term summary is &lt;em&gt;not&lt;/em&gt; a transcript. It’s:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user preferences (explicit),&lt;/li&gt;
&lt;li&gt;stable facts the user confirmed,&lt;/li&gt;
&lt;li&gt;open tasks,&lt;/li&gt;
&lt;li&gt;and important constraints.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Avoid storing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;secrets,&lt;/li&gt;
&lt;li&gt;raw documents,&lt;/li&gt;
&lt;li&gt;PII that doesn’t need to persist.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Guardrails: prompt injection, data leaks, and safe tool calls
&lt;/h2&gt;

&lt;p&gt;If your agent reads documents from the outside world (PDFs, web pages, tickets), assume those documents can contain &lt;strong&gt;hostile instructions&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Treat retrieved content as untrusted input
&lt;/h3&gt;

&lt;p&gt;A simple, effective policy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieved text may contain facts,&lt;/li&gt;
&lt;li&gt;but it may not issue instructions,&lt;/li&gt;
&lt;li&gt;and it may not override system rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Apply input/output guardrails as a service
&lt;/h3&gt;

&lt;p&gt;Many orgs implement “guardrails” as a separate layer that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;screens user inputs,&lt;/li&gt;
&lt;li&gt;screens model outputs,&lt;/li&gt;
&lt;li&gt;optionally redacts/blocks content,&lt;/li&gt;
&lt;li&gt;returns structured metadata (“intervened”, category, severity).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a generic wrapper pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GuardrailsClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;source is typically &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;INPUT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OUTPUT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nb"&gt;NotImplementedError&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply_guardrails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;guardrails&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GuardrailsClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="n"&gt;is_structured&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_structured&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;

    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;guardrails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generic interpretation of a guardrails response
&lt;/span&gt;    &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NONE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;filtered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filtered_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;intervened&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BLOCK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INTERVENED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;intervened&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intervened&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_structured&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filtered&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filtered&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;filtered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two practical tips:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If guardrails intervene, return a &lt;strong&gt;safe, deterministic&lt;/strong&gt; response (don’t ask the LLM to “explain the policy violation”).&lt;/li&gt;
&lt;li&gt;Run guardrails on &lt;strong&gt;tool outputs&lt;/strong&gt; too if they can contain sensitive data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tool safety is guardrails + authorization
&lt;/h3&gt;

&lt;p&gt;Guardrails can help with content risk, but tool safety requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;server-side authorization,&lt;/li&gt;
&lt;li&gt;immutable audit logs,&lt;/li&gt;
&lt;li&gt;strict budgets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Never rely on the model to “do the right thing.”&lt;/p&gt;




&lt;h2&gt;
  
  
  Verification: how you earn user trust
&lt;/h2&gt;

&lt;p&gt;RAG agents gain adoption when users can &lt;em&gt;verify&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enforce “no citation → no claim”
&lt;/h3&gt;

&lt;p&gt;A strong system rule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the agent can’t cite a source for a statement, it must label it as uncertainty or ask a follow-up.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quote-first answering
&lt;/h3&gt;

&lt;p&gt;A practical approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;extract supporting quotes from retrieved sources,&lt;/li&gt;
&lt;li&gt;write the answer in your own words,&lt;/li&gt;
&lt;li&gt;attach citations.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This reduces hallucinations because the model is anchored to evidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured outputs for actions
&lt;/h3&gt;

&lt;p&gt;When tools are involved, do not bury results inside prose.&lt;/p&gt;

&lt;p&gt;Use an explicit response contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"citations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"source_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"doc-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Refund Policy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"section"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Eligibility"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"quote"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"create_ticket"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"success"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ticket_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INC-456"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"follow_ups"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"What was the purchase date?"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That contract makes downstream UX and testing much easier.&lt;/p&gt;




&lt;h2&gt;
  
  
  Evaluation + observability
&lt;/h2&gt;

&lt;p&gt;If you can’t measure it, you’ll end up debating prompts.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to log (minimum viable traces)
&lt;/h3&gt;

&lt;p&gt;For each request, capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rewritten search query (or queries),&lt;/li&gt;
&lt;li&gt;retrieval results (source IDs + scores),&lt;/li&gt;
&lt;li&gt;reranking results,&lt;/li&gt;
&lt;li&gt;final context length,&lt;/li&gt;
&lt;li&gt;tool calls (name + args hash + latency + status),&lt;/li&gt;
&lt;li&gt;guardrails action metadata,&lt;/li&gt;
&lt;li&gt;citations returned.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is how you answer: “Why did it say that?”&lt;/p&gt;

&lt;h3&gt;
  
  
  What to measure (starter metrics)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Citation coverage&lt;/strong&gt;: % of answers with ≥1 citation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Groundedness&lt;/strong&gt;: evaluator score or “supported claims ratio”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval precision&lt;/strong&gt;: are top citations actually relevant?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Escalation rate&lt;/strong&gt;: how often the agent says “I don’t know” or hands off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool failure rate&lt;/strong&gt;: how often tool calls fail/time out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency&lt;/strong&gt;: p50/p95 end-to-end and retrieval/tool breakdown&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Offline evaluation set
&lt;/h3&gt;

&lt;p&gt;Build a small eval dataset (even 50–200 questions) with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;expected source documents,&lt;/li&gt;
&lt;li&gt;disallowed sources,&lt;/li&gt;
&lt;li&gt;expected follow-up questions,&lt;/li&gt;
&lt;li&gt;red-team prompts for injection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Iterate retrieval first, then prompting.&lt;/p&gt;




&lt;h2&gt;
  
  
  A shipping checklist
&lt;/h2&gt;

&lt;p&gt;If you want a pragmatic sequence that reduces risk:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ship RAG with citations&lt;/strong&gt; (even if answers are short)&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;hybrid retrieval + metadata filtering&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;reranking&lt;/strong&gt; if top‑k is noisy&lt;/li&gt;
&lt;li&gt;Add a &lt;strong&gt;context builder&lt;/strong&gt; (dedupe + span extraction)&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;guardrails&lt;/strong&gt; (input + output)&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;tool runner&lt;/strong&gt; (allowlist + schema + authz)&lt;/li&gt;
&lt;li&gt;Add a &lt;strong&gt;tight agent loop&lt;/strong&gt; (max 2–3 iterations)&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;verification&lt;/strong&gt; (no citation → no claim)&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;tracing + offline evals&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This order helps you avoid “agent chaos” before your foundations are stable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;A RAG agent is best thought of as a &lt;em&gt;retrieval system with an LLM interface&lt;/em&gt;—not the other way around.&lt;/p&gt;

&lt;p&gt;If you invest in retrieval quality, context building, tool safety, and verification, you get a system users trust.&lt;/p&gt;

&lt;p&gt;If you skip those and jump straight to “agent prompts,” you get a system that demos well and pages you at 2am.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;About the Author&lt;/strong&gt;: &lt;/p&gt;

&lt;p&gt;Written by Suraj Khaitan&lt;/p&gt;

&lt;h2&gt;
  
  
  — Gen AI Architect | Working on serverless AI &amp;amp; cloud platforms.
&lt;/h2&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>rag</category>
      <category>azure</category>
    </item>
    <item>
      <title>I Built 100+ Gen AI Agents: Architecture, Patterns, and Code You Can Reuse</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sat, 06 Dec 2025 14:47:37 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/i-built-100-gen-ai-agents-architecture-patterns-and-code-you-can-reuse-1o8c</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/i-built-100-gen-ai-agents-architecture-patterns-and-code-you-can-reuse-1o8c</guid>
      <description>&lt;p&gt;If you’ve tinkered with Gen AI agents, you know the gap between cool demos and dependable systems is big. This article distills what actually works when going from a single-script agent to production-ready, multi-agent pipelines. It’s based on reusable patterns from typical agent service modules and use-case templates, adapted into generic snippets you can copy into your own stack.&lt;/p&gt;

&lt;p&gt;Note: This is a vendor-agnostic, client-agnostic write-up. No company-specific details. All code is illustrative and production-friendly.&lt;/p&gt;

&lt;p&gt;Why Agentic Architectures Matter&lt;br&gt;
Autonomy without chaos: Agents plan, act, and reflect, but need guardrails.&lt;br&gt;
Tool use is essential: Real utility comes from reliable integration with data, APIs, storage, and retrieval.&lt;br&gt;
Memory and context: Short-term scratchpads plus durable episode/task memory improve success rates.&lt;br&gt;
Orchestration beats monoliths: Separate concerns (planning, execution, observation, correction).&lt;br&gt;
A Minimal Agent: Plan–Act–Observe–Reflect&lt;br&gt;
This skeleton shows a single agent loop that plans, executes tools, observes results, and reflects to update its strategy.&lt;/p&gt;

&lt;p&gt;from typing import Callable, Dict, Any, List&lt;/p&gt;

&lt;p&gt;class Tool:&lt;br&gt;
    def &lt;strong&gt;init&lt;/strong&gt;(self, name: str, runner: Callable[[Dict[str, Any]], Dict[str, Any]]):&lt;br&gt;
        self.name = name&lt;br&gt;
        self.run = runner&lt;/p&gt;

&lt;p&gt;class Memory:&lt;br&gt;
    def &lt;strong&gt;init&lt;/strong&gt;(self):&lt;br&gt;
        self.events: List[Dict[str, Any]] = []&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def add(self, event: Dict[str, Any]):
    self.events.append(event)

def last(self, n: int = 5) -&amp;gt; List[Dict[str, Any]]:
    return self.events[-n:]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;class Agent:&lt;br&gt;
    def &lt;strong&gt;init&lt;/strong&gt;(self, planner: Callable[[str, List[Dict[str, Any]]], Dict[str, Any]],&lt;br&gt;
                 reflector: Callable[[List[Dict[str, Any]]], str],&lt;br&gt;
                 tools: Dict[str, Tool], memory: Memory):&lt;br&gt;
        self.planner = planner&lt;br&gt;
        self.reflector = reflector&lt;br&gt;
        self.tools = tools&lt;br&gt;
        self.memory = memory&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def step(self, goal: str) -&amp;gt; Dict[str, Any]:
    plan = self.planner(goal, self.memory.last())
    tool_name = plan.get("tool")
    args = plan.get("args", {})
    result = self.tools.get(tool_name, Tool("noop", lambda _: {"error": "Unknown tool", "done": False})).run(args)
    event = {"goal": goal, "plan": plan, "result": result}
    self.memory.add(event)
    feedback = self.reflector(self.memory.last())
    return {"event": event, "feedback": feedback}

def run(self, goal: str, max_steps: int = 5) -&amp;gt; List[Dict[str, Any]]:
    trace = []
    for _ in range(max_steps):
        trace.append(self.step(goal))
        if trace[-1]["event"]["result"].get("done"):
            break
    return trace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Key idea: keep the loop simple and pure. Inject model/planner/reflector functions rather than hard-coding vendor calls.&lt;/p&gt;

&lt;p&gt;Tools: Keep Interfaces Consistent&lt;br&gt;
from typing import Dict, Any&lt;/p&gt;

&lt;p&gt;def search_tool(params: Dict[str, Any]) -&amp;gt; Dict[str, Any]:&lt;br&gt;
    query = params.get("query", "")&lt;br&gt;
    # Replace with your search implementation (API, vector DB, etc.)&lt;br&gt;
    return {"items": [f"Result for: {query}"], "done": False}&lt;/p&gt;

&lt;p&gt;def write_file_tool(params: Dict[str, Any]) -&amp;gt; Dict[str, Any]:&lt;br&gt;
    path = params.get("path")&lt;br&gt;
    content = params.get("content", "")&lt;br&gt;
    if not path:&lt;br&gt;
        return {"error": "Missing path", "done": False}&lt;br&gt;
    try:&lt;br&gt;
        with open(path, "w", encoding="utf-8") as f:&lt;br&gt;
            f.write(content)&lt;br&gt;
        return {"ok": True, "done": True}&lt;br&gt;
    except Exception as e:&lt;br&gt;
        return {"error": str(e), "done": False}&lt;br&gt;
Plugging in LLMs for Planning and Reflection&lt;br&gt;
Use any LLM provider. The important part is contract shape.&lt;/p&gt;

&lt;p&gt;from typing import List, Dict, Any&lt;/p&gt;

&lt;p&gt;def planner_llm(goal: str, recent_events: List[Dict[str, Any]]) -&amp;gt; Dict[str, Any]:&lt;br&gt;
    # Prompt craft is omitted; produce tool + args plan&lt;br&gt;
    # Simple heuristic plan (replace with LLM call)&lt;br&gt;
    if "write" in goal.lower():&lt;br&gt;
        return {"tool": "write_file", "args": {"path": "output.txt", "content": goal}}&lt;br&gt;
    return {"tool": "search", "args": {"query": goal}}&lt;/p&gt;

&lt;p&gt;def reflector_llm(recent_events: List[Dict[str, Any]]) -&amp;gt; str:&lt;br&gt;
    # Summarize last results and propose improvements&lt;br&gt;
    return f"Reflect: {len(recent_events)} events processed. Consider narrowing the query or validating outputs."&lt;br&gt;
Wire It Up&lt;br&gt;
from agent_loop import Agent, Memory, Tool&lt;br&gt;
from tools import search_tool, write_file_tool&lt;br&gt;
from llm_adapters import planner_llm, reflector_llm&lt;/p&gt;

&lt;p&gt;def build_agent() -&amp;gt; Agent:&lt;br&gt;
    tools = {&lt;br&gt;
        "search": Tool("search", search_tool),&lt;br&gt;
        "write_file": Tool("write_file", write_file_tool),&lt;br&gt;
    }&lt;br&gt;
    memory = Memory()&lt;br&gt;
    return Agent(planner_llm, reflector_llm, tools, memory)&lt;/p&gt;

&lt;p&gt;if &lt;strong&gt;name&lt;/strong&gt; == "&lt;strong&gt;main&lt;/strong&gt;":&lt;br&gt;
    agent = build_agent()&lt;br&gt;
    trace = agent.run("Write a short note about agent patterns", max_steps=3)&lt;br&gt;
    for t in trace:&lt;br&gt;
        print(t)&lt;br&gt;
Multi-Agent Pattern: Coordinator + Specialists&lt;br&gt;
When tasks are complex, split into roles: Planner, Researcher, Implementer, Reviewer. The coordinator decomposes, routes, and reconciles.&lt;/p&gt;

&lt;p&gt;from typing import Dict, Any, List&lt;br&gt;
from agent_loop import Agent, Memory&lt;/p&gt;

&lt;p&gt;class Coordinator:&lt;br&gt;
    def &lt;strong&gt;init&lt;/strong&gt;(self, agents: Dict[str, Agent]):&lt;br&gt;
        self.agents = agents&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def run(self, goal: str) -&amp;gt; List[Dict[str, Any]]:
    # naive decomposition; replace with LLM planner
    subtasks = [
        {"role": "researcher", "goal": f"Find info: {goal}"},
        {"role": "implementer", "goal": f"Draft output for: {goal}"},
        {"role": "reviewer", "goal": f"Check draft for: {goal}"},
    ]
    trace = []
    for st in subtasks:
        agent = self.agents.get(st["role"]) or self.agents.get("implementer")
        trace.append(agent.run(st["goal"], max_steps=2))
    return trace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;def make_specialist(planner, reflector, tools) -&amp;gt; Agent:&lt;br&gt;
    return Agent(planner, reflector, tools, Memory())&lt;br&gt;
Use Case Templates&lt;br&gt;
Your codebase’s templates often include:&lt;/p&gt;

&lt;p&gt;RAG agents: Retrieval-Augmented Generation using chunking, embeddings, and retrievers.&lt;br&gt;
ReAct agents: Emphasizing step-by-step reasoning and tool use.&lt;br&gt;
Text extraction agents: Focused on parsing documents and transforming unstructured data.&lt;br&gt;
Example: a generic RAG tool for an agent.&lt;/p&gt;

&lt;p&gt;from typing import Dict, Any, List&lt;/p&gt;

&lt;p&gt;def rag_query(params: Dict[str, Any]) -&amp;gt; Dict[str, Any]:&lt;br&gt;
    question = params.get("question", "")&lt;br&gt;
    # Plug in your embedder, vector store, and reader components&lt;br&gt;
    # docs = retriever.search(question)&lt;br&gt;
    # answer = reader.synthesize(question, docs)&lt;br&gt;
    docs: List[str] = ["Doc A", "Doc B"]&lt;br&gt;
    answer = f"Answer synthesized for: {question} using {len(docs)} docs"&lt;br&gt;
    return {"answer": answer, "sources": docs, "done": True}&lt;br&gt;
Then mount it as a tool:&lt;/p&gt;

&lt;p&gt;from agent_loop import Agent, Memory, Tool&lt;br&gt;
from llm_adapters import planner_llm, reflector_llm&lt;br&gt;
from simple_rag_tool import rag_query&lt;/p&gt;

&lt;p&gt;tools = {"rag": Tool("rag", rag_query)}&lt;br&gt;
agent = Agent(planner_llm, reflector_llm, tools, Memory())&lt;br&gt;
trace = agent.run("What is agentic RAG?", max_steps=1)&lt;br&gt;
Guardrails and Safety&lt;br&gt;
Input validation on tools (types, ranges, allowlists).&lt;br&gt;
Sandboxed execution for file/network operations.&lt;br&gt;
Rate limiting and circuit breakers for external APIs.&lt;br&gt;
Observability: structured logs and traces per agent step.&lt;br&gt;
Testing Strategy&lt;br&gt;
Test agents like workflows:&lt;/p&gt;

&lt;p&gt;Unit-test tools with deterministic inputs/outputs.&lt;br&gt;
Mock LLM planners/reflectors to stabilize tests.&lt;br&gt;
Scenario tests for end-to-end goals (success criteria + timeouts).&lt;br&gt;
from agent_loop import Agent, Memory, Tool&lt;/p&gt;

&lt;p&gt;def planner_stub(goal, _):&lt;br&gt;
    return {"tool": "echo", "args": {"text": goal}}&lt;/p&gt;

&lt;p&gt;def reflector_stub(_):&lt;br&gt;
    return "reflect"&lt;/p&gt;

&lt;p&gt;def echo_tool(params):&lt;br&gt;
    return {"echo": params.get("text", ""), "done": True}&lt;/p&gt;

&lt;p&gt;def test_agent_runs_one_step():&lt;br&gt;
    tools = {"echo": Tool("echo", echo_tool)}&lt;br&gt;
    agent = Agent(planner_stub, reflector_stub, tools, Memory())&lt;br&gt;
    trace = agent.run("hello", max_steps=3)&lt;br&gt;
    assert trace[-1]["event"]["result"].get("done") is True&lt;br&gt;
Deployment Tips&lt;br&gt;
Package agents as stateless workers with externalized memory (DB/object store).&lt;br&gt;
Use queues for long-running tasks; record step traces for resumability.&lt;br&gt;
Keep prompts modular and versioned; migrate gradually.&lt;br&gt;
Wrap-Up&lt;br&gt;
Agentic systems shine when you architect for reliability, testability, and observability. Start with a clean loop, consistent tool interfaces, memory separation, and optional multi-agent coordination. Then plug in your LLM vendor and domain-specific tools. Template folders for agents (e.g., foundation, RAG, text extraction) are a solid foundation you can adapt.&lt;/p&gt;

&lt;p&gt;10 Open-Source Agent Projects to Explore&lt;br&gt;
Here are widely used, open-source agent frameworks and projects you can learn from and adapt. Each highlights different patterns: planning, tool use, multi-agent collaboration, memory, and orchestration.&lt;/p&gt;

&lt;p&gt;Auto-GPT — Autonomous task-driven agent built on GPT models; showcases long-horizon planning and tool use. Link: &lt;a href="https://github.com/Significant-Gravitas/AutoGPT" rel="noopener noreferrer"&gt;https://github.com/Significant-Gravitas/AutoGPT&lt;/a&gt;&lt;br&gt;
BabyAGI — Lightweight task management loop (create, prioritize, execute) with vector memory; great for understanding minimal agent cycles. Link: &lt;a href="https://github.com/yoheinakajima/babyagi" rel="noopener noreferrer"&gt;https://github.com/yoheinakajima/babyagi&lt;/a&gt;&lt;br&gt;
Microsoft AutoGen — Framework for multi-agent conversations and collaboration with tooling and customization; strong for role-based agent teams. Link: &lt;a href="https://github.com/microsoft/autogen" rel="noopener noreferrer"&gt;https://github.com/microsoft/autogen&lt;/a&gt;&lt;br&gt;
CrewAI — Python framework for multi-agent workflows with roles, tools, and processes; emphasizes structured collaboration. Link: &lt;a href="https://github.com/joaomdmoura/crewai" rel="noopener noreferrer"&gt;https://github.com/joaomdmoura/crewai&lt;/a&gt;&lt;br&gt;
LangGraph — Graph-based orchestration for agent loops, memory, and control; ideal for building reliable, inspectable agent pipelines. Link: &lt;a href="https://github.com/langchain-ai/langgraph" rel="noopener noreferrer"&gt;https://github.com/langchain-ai/langgraph&lt;/a&gt;&lt;br&gt;
LangChain Agents — Tool-using agents with planners, executors, and memory; integrates with a vast ecosystem of tools and vector DBs. Link: &lt;a href="https://python.langchain.com/docs/modules/agents" rel="noopener noreferrer"&gt;https://python.langchain.com/docs/modules/agents&lt;/a&gt;&lt;br&gt;
OpenAI Agents SDK — Defines agents with tools and resources and handles orchestration; useful for standardized tool schemas and governance. Link: &lt;a href="https://github.com/openai/openai-agents-python" rel="noopener noreferrer"&gt;https://github.com/openai/openai-agents-python&lt;/a&gt;&lt;br&gt;
CAMEL — Role-playing multi-agent framework with task decomposition and negotiation; useful for research on collaboration dynamics. Link: &lt;a href="https://github.com/camel-ai/camel" rel="noopener noreferrer"&gt;https://github.com/camel-ai/camel&lt;/a&gt;&lt;br&gt;
AgentGPT (Web) — Browser-based autonomous agent setup for quick experiments; helpful to visualize prompts and iterative action loops. Link: &lt;a href="https://github.com/reworkd/AgentGPT" rel="noopener noreferrer"&gt;https://github.com/reworkd/AgentGPT&lt;/a&gt;&lt;br&gt;
ReAct Pattern Implementations — Combines reasoning traces with tool actions; many open implementations to learn prompt design and action validation. Link: &lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2210.03629&lt;/a&gt;&lt;br&gt;
Use these as references to pressure-test your design choices: planning reliability, tool APIs, memory schema, observability, and recovery strategies.&lt;/p&gt;

&lt;p&gt;About the Author&lt;br&gt;
Written by Suraj Khaitan — Gen AI Architect | Working on serverless AI &amp;amp; cloud platforms.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>mcp</category>
      <category>agentaichallenge</category>
    </item>
    <item>
      <title>From Static Docs to Living Knowledge: Building an STS‑Aware Retrieval‑Augmented Agent Backend</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sun, 30 Nov 2025 14:14:19 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/from-static-docs-to-living-knowledge-building-an-sts-aware-retrieval-augmented-agent-backend-dng</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/from-static-docs-to-living-knowledge-building-an-sts-aware-retrieval-augmented-agent-backend-dng</guid>
      <description>&lt;p&gt;We’ve all seen impressive GenAI demos. Yet, in day‑to‑day engineering, the questions are softer but more real: How do we keep answers trustworthy? How do we respect access boundaries without slowing teams down? This article offers a practical, human‑centered path—from raw documents and images to a secure, explainable knowledge layer—powered by session‑aware authorization (STS) and a simple agent + tools pattern.&lt;/p&gt;

&lt;p&gt;Tone and structure are inspired by thoughtful architecture writing like “Architecture of AI‑Driven Systems” on Python Plain English, focusing on clarity, trade‑offs, and gentle guidance rather than hype.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;RAG without authorization is a liability. Enterprise data needs session‑scoped controls, revocation, and auditability.&lt;/li&gt;
&lt;li&gt;Accuracy is not enough; answers must be explainable and reproducible across versions.&lt;/li&gt;
&lt;li&gt;Multimodal inputs (PDFs, images) require consistent ingestion and normalization before indexing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture Snapshot
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Knowledge Base Services: ingestion, chunking, embedding, indexing (vector + graph), retrieval, and an STS manager for authorization.&lt;/li&gt;
&lt;li&gt;Agent Services: an agent wrapper orchestrates LLMs, tools, and guardrails; file upload and history modules support UX continuity.&lt;/li&gt;
&lt;li&gt;Tool Services: domain tools (retriever, SQL, custom) invoked by agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Flow: Upload → Initialize → Read/Image2Text → Chunk → Embed → Index (Vector + Graph) → Retrieve → STS Filter → Agent Compose → Response with citations.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG Architecture :
&lt;/h2&gt;

&lt;p&gt;![RAG Architecture]&lt;br&gt;
![ ](&lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/skbtam21zt7piw76ey6f.png" rel="noopener noreferrer"&gt;https://dev-to-uploads.s3.amazonaws.com/uploads/articles/skbtam21zt7piw76ey6f.png&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Multimodal Ingestion
&lt;/h2&gt;

&lt;p&gt;Key components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reading: parse PDFs/text, normalize content, attach metadata (doc_id, page).&lt;/li&gt;
&lt;li&gt;Image‑to‑Text: extract text from images, unify format.&lt;/li&gt;
&lt;li&gt;Initialization: bootstraps pipelines, configs, and version stamps.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Design tips:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normalize MIME and metadata early; downstream pipelines assume clean structure.&lt;/li&gt;
&lt;li&gt;Batch I/O with retries; track ingestion version to reproduce embeddings.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Smart Chunking for Better Retrieval
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Chunking: semantic and rule‑based chunkers.&lt;/li&gt;
&lt;li&gt;Keep chunks small enough for LLM context but rich in metadata (section, page, hierarchy).&lt;/li&gt;
&lt;li&gt;Add relationship edges to support graph queries (e.g., section→subsection).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Embeddings + Dual Indexes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Embeddings: choose model, normalize vectors, stamp versions.&lt;/li&gt;
&lt;li&gt;Vector Indexing: push to a vector store for semantic search.&lt;/li&gt;
&lt;li&gt;Graph Indexing: persist relationships and provenance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why two indexes?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vector search finds semantically related content.&lt;/li&gt;
&lt;li&gt;Graph retrieves lineage and context (citations, related sections), improving explainability.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Retrieval Orchestration
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Vector and Graph Retrievers: specialized retrievers.&lt;/li&gt;
&lt;li&gt;Hybrid Retrieval Orchestrator: fuses results from both stores.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Try semantic (vector) for recall.&lt;/li&gt;
&lt;li&gt;Expand via graph for context and provenance.&lt;/li&gt;
&lt;li&gt;Fuse, rank, and return with metadata for STS filtering.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  STS‑Aware Authorization
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;STS Manager: resolves session→permissions, applies policies, and filters retrieval candidates.&lt;/li&gt;
&lt;li&gt;Enforce authorization before the agent composes answers; never let tools see disallowed content.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session‑scoped access, policy revocation, and audit trails.&lt;/li&gt;
&lt;li&gt;Prevents prompt injection using forbidden context.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Agents + Tools: The Execution Layer
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Agent Wrapper: wires LLM prompts, tools, and guardrails; manages tool selection.&lt;/li&gt;
&lt;li&gt;Tools: retriever and SQL tools for controlled data access.&lt;/li&gt;
&lt;li&gt;Compose answers with citations sourced from graph metadata.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Execution pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent decides → Tool executes → STS filters → Agent composes → Return answer + sources.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Observability, Versioning, and Deletion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;knowledge_base_services/deletion/&lt;/code&gt;: right‑to‑be‑forgotten and data lifecycle.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agent_services/history_services/&lt;/code&gt;: conversational trace for monitoring and explainability.&lt;/li&gt;
&lt;li&gt;Index/embedding version stamps to reproduce runs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Example Flow (Generic Pseudo‑Code)
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1) Ingest + Normalize
&lt;/span&gt;&lt;span class="n"&gt;content_items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;image_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_to_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;initializer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content_items&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;image_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2) Chunk + Embed
&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;semantic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vecs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;batch_embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3) Index (Vector + Graph)
&lt;/span&gt;&lt;span class="n"&gt;vector_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vecs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;link&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;...)&lt;/span&gt;

&lt;span class="c1"&gt;# 4) Retrieve with STS filter
&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hybrid_retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;authorized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 5) Agent + Tools compose
&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;retriever_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sql_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;authorized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;with_citations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Gentle guidance: keep module names and interfaces simple. Start with clear, testable boundaries—ingest, chunk, embed, index, retrieve, filter, compose—and iterate. Good names reduce cognitive load and make onboarding kinder.&lt;/p&gt;
&lt;h2&gt;
  
  
  What to Showcase in the Post
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ingestion dispatch by MIME and metadata (reading, image‑to‑text, initialization).&lt;/li&gt;
&lt;li&gt;Semantic chunker attaching rich metadata (chunking).&lt;/li&gt;
&lt;li&gt;Batched embeddings + vector indexing with versioned names (embeddings, vector index).&lt;/li&gt;
&lt;li&gt;Hybrid retrieval orchestrator with fusion and fallbacks (vector retriever, graph retriever).&lt;/li&gt;
&lt;li&gt;STS filter gating results before agent sees them (STS manager).&lt;/li&gt;
&lt;li&gt;Agent tool wiring and citation composition (agent wrapper, tools).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Benchmarks and Learnings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Track latency across stages: ingestion, embedding, indexing, retrieval, STS filtering, agent composition.&lt;/li&gt;
&lt;li&gt;Measure precision@k and citation correctness.&lt;/li&gt;
&lt;li&gt;Common pitfalls: over‑aggressive chunking, stale embeddings after content updates, authorization drift.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Quick Demo Hooks
&lt;/h2&gt;

&lt;p&gt;Consider adding a minimal script that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads a sample doc + image.&lt;/li&gt;
&lt;li&gt;Runs ingestion→chunk→embed→index.&lt;/li&gt;
&lt;li&gt;Executes a hybrid retrieval for a test query.&lt;/li&gt;
&lt;li&gt;Applies STS filter for two different sessions.&lt;/li&gt;
&lt;li&gt;Prints answer with citations and filtered item counts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optional starting point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a tiny virtual environment and run the demo&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv&lt;span class="p"&gt;;&lt;/span&gt; .&lt;span class="se"&gt;\.&lt;/span&gt;venv&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\A&lt;/span&gt;ctivate.ps1
python demo/sts_rag_demo.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closing Checklist for Enterprise‑Grade RAG
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ingestion discipline with consistent metadata.&lt;/li&gt;
&lt;li&gt;Chunking strategy matched to content structure.&lt;/li&gt;
&lt;li&gt;Dual index (vector + graph) for recall + explainability.&lt;/li&gt;
&lt;li&gt;Retrieval orchestration with fusion and fallbacks.&lt;/li&gt;
&lt;li&gt;STS enforcement before agent composition.&lt;/li&gt;
&lt;li&gt;Observability: versions, histories, and deletion paths.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;Written by Suraj Khaitan&lt;br&gt;
 — Gen AI Architect | Working on serverless AI &amp;amp; cloud platforms.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>rag</category>
      <category>aws</category>
      <category>python</category>
    </item>
    <item>
      <title>🚀 The Ultimate Guide to Intelligent Document Parsing: Building a Universal File Reader System</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sat, 22 Nov 2025 03:49:49 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/the-ultimate-guide-to-intelligent-document-parsing-building-a-universal-file-reader-system-4ojo</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/the-ultimate-guide-to-intelligent-document-parsing-building-a-universal-file-reader-system-4ojo</guid>
      <description>&lt;p&gt;&lt;em&gt;How to extract meaningful data from PDFs, Excel sheets, images, and 10+ file formats using smart parsing strategies&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Document Processing Dilemma
&lt;/h2&gt;

&lt;p&gt;Picture this: Your application needs to process documents. Sounds simple, right? But here's the catch—documents come in countless formats: PDFs, Word files, Excel spreadsheets, PowerPoint presentations, images, HTML, JSON, XML, and more. Each format has its own quirks, structure, and extraction challenges.&lt;/p&gt;

&lt;p&gt;Traditional solutions often involve cobbling together different libraries, writing repetitive code, and dealing with edge cases for each file type. What if there was a better way?&lt;/p&gt;

&lt;p&gt;In this article, I'll show you how to build an extensible, production-ready document parsing system that handles 10+ file formats with a unified interface, supports both native extraction and OCR processing, and scales to handle millions of documents.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 The Architecture: Three Key Principles
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Abstraction Through Base Classes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The foundation of any great parsing system is a well-designed abstraction layer. Instead of writing separate logic for each file type, we create a &lt;code&gt;BaseReader&lt;/code&gt; class that defines the common interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BaseReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ABC&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Abstract base class for all document readers&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# File type categories
&lt;/span&gt;    &lt;span class="n"&gt;OCR_ONLY_EXTENSIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;NATIVE_ONLY_EXTENSIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.xls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;HYBRID_EXTENSIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.docx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.doc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pptx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.ppt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@abstractmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;IntermediateModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract content from the document&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;should_use_ocr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Determine whether to use OCR based on file type&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OCR_ONLY_EXTENSIONS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NATIVE_ONLY_EXTENSIONS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# HYBRID_EXTENSIONS
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;processing_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optical_recognition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; This design pattern allows each file type to implement its own extraction logic while maintaining a consistent interface. Adding support for a new file format? Just create a new reader class that inherits from &lt;code&gt;BaseReader&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Dual Processing Modes: Native vs OCR&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Different documents require different extraction approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native Processing&lt;/strong&gt;: Direct content extraction using format-specific libraries (fast, preserves structure)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OCR Processing&lt;/strong&gt;: Optical Character Recognition for scanned documents or images (slower, handles visual content)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our system intelligently chooses the right approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_reader_for_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processing_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;BaseReader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return the appropriate reader instance for the file type&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;PDFReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processing_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.xml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TextReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processing_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.xls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ExcelReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processing_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ImageReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processing_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.ppt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pptx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;PresentationReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processing_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.doc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.docx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;WordReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;processing_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unsupported file type: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. &lt;strong&gt;Standardized Output Format&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Regardless of input format, all readers output a consistent &lt;code&gt;IntermediateModel&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;intermediate_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;IntermediateModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;file_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;file_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;kb_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kb_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;  &lt;span class="c1"&gt;# Structured content dictionary
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This standardization makes downstream processing (chunking, indexing, searching) incredibly simple.&lt;/p&gt;




&lt;h2&gt;
  
  
  📄 Deep Dive: Format-Specific Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;PDF Processing: The Hybrid Approach&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;PDFs are tricky because they can contain both machine-readable text and scanned images. Our &lt;code&gt;PDFReader&lt;/code&gt; handles both scenarios:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PDFReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseReader&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;IntermediateModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;use_ocr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;should_use_ocr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;extract_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;processing_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_2_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;use_ocr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_with_ocr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# AWS Textract
&lt;/span&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_native&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;     &lt;span class="c1"&gt;# PyPDF2/pdfplumber
&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;extract_images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_and_save_images&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# PyMuPDF
&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_intermediate_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Native extraction&lt;/strong&gt; using PyPDF2 is lightning-fast for text-based PDFs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_native&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;local_file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;pdf_reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PdfReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;page_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page_text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pdf_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; For PDFs with embedded images or complex layouts, OCR processing with AWS Textract provides superior results, detecting tables, forms, and relationships between elements.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Excel &amp;amp; Spreadsheets: Preserving Structure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Excel files contain structured data that must be preserved. Our &lt;code&gt;ExcelReader&lt;/code&gt; uses &lt;code&gt;openpyxl&lt;/code&gt; to maintain the table structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExcelReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseReader&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_process_worksheet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;worksheet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sheet_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Extract with table structure
&lt;/span&gt;        &lt;span class="n"&gt;rows_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;worksheet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_rows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values_only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;row_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cell&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cell&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row_values&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# Skip empty rows
&lt;/span&gt;                &lt;span class="n"&gt;rows_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row_values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Format as structured text
&lt;/span&gt;        &lt;span class="n"&gt;text_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_format_table_as_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;excel_sheet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sheet_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sheet_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;row_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Converting tabular data to readable text format makes it searchable while preserving the relationship between cells.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Images: OCR is Your Best Friend&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Images require OCR processing since there's no native text to extract. Our &lt;code&gt;ImageReader&lt;/code&gt; leverages AWS Textract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ImageReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseReader&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_with_textract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;textract&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;textract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;local_file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Call Textract to detect text
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;textract&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detect_document_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bytes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_process_textract_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_process_textract_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Extract lines and words from Textract response
&lt;/span&gt;        &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Blocks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BlockType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LINE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ocr_status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best practice:&lt;/strong&gt; Always implement file size checks—Textract has a 10MB limit for synchronous calls. For larger files, use asynchronous processing.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Word Documents: Handle Both .docx and Legacy .doc&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Modern &lt;code&gt;.docx&lt;/code&gt; files use &lt;code&gt;python-docx&lt;/code&gt; for native extraction, while legacy &lt;code&gt;.doc&lt;/code&gt; files require OCR:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WordReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseReader&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;IntermediateModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;use_ocr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;should_use_ocr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;use_ocr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_with_ocr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_native&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.doc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c1"&gt;# Legacy format not supported natively
&lt;/span&gt;                    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_return_unsupported_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enable OCR to process .doc files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The native extraction preserves document structure including paragraphs, tables, and images:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_native&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;local_file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;page_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;element&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CT_P&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# Paragraph
&lt;/span&gt;            &lt;span class="n"&gt;para&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Paragraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;page_index&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;para&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paragraph&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CT_Tbl&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# Table
&lt;/span&gt;            &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;table_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_table_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;page_index&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;table_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;page_index&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;PowerPoint: Extract Slides, Text, and Images&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Presentations combine text, images, and structured layouts. Our &lt;code&gt;PresentationReader&lt;/code&gt; handles all of it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PresentationReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseReader&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_native&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;prs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Presentation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;local_file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;slide_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slide&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;slides&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;slide_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slide_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;slide_num&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="c1"&gt;# Extract text from all shapes
&lt;/span&gt;            &lt;span class="n"&gt;text_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;slide&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shapes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;text_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="c1"&gt;# Handle tables
&lt;/span&gt;                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;MSO_SHAPE_TYPE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;table_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_table_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;text_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;slide_content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;slide_num&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;slide_content&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  &lt;strong&gt;Text-Based Formats: HTML, XML, JSON, CSV&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For text-based formats, our &lt;code&gt;TextReader&lt;/code&gt; uses built-in Python libraries to minimize dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TextReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseReader&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_native&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;local_file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_detect_encoding&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;raw_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;processed_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_content&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.xml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;processed_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_process_xml&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;processed_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_process_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;processed_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_process_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;processed_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_process_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;processed_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;HTML extraction&lt;/strong&gt; uses a custom parser to strip tags while preserving content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HTMLTextExtractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HTMLParser&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_tag&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;script&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;style&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🎨 Image Extraction: Going Beyond Text
&lt;/h2&gt;

&lt;p&gt;Many documents contain valuable information in images. Our system can extract and save images separately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_and_save_images&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract images from PDF and save to S3&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;pdf_document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fitz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;local_file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page_num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_document&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pdf_document&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;image_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_images&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;img_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;xref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;base_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pdf_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xref&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="c1"&gt;# Generate unique image identifier
&lt;/span&gt;            &lt;span class="n"&gt;image_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="n"&gt;image_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kb_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;base_image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ext&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

            &lt;span class="c1"&gt;# Upload to S3
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Track image URL in content
&lt;/span&gt;            &lt;span class="n"&gt;image_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;page_num&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This enables downstream image-to-text processing or multimodal AI applications.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚡ Performance Optimization Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Graceful Error Handling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Never let one problematic file crash the entire batch. Keep processing resilient and report partial successes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_single_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_reader_for_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Narrow in real code
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. &lt;strong&gt;Numeric Type Normalization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Different storage layers (NoSQL, relational ORMs, JSON parsers) may return numeric wrappers. Normalize before serialization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize_numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Integers remain integers; decimals become float for JSON
&lt;/span&gt;        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;decimal&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ImportError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;deep_normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;deep_normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;deep_normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;normalize_numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🔧 Putting It All Together: Generic Batch Pipeline
&lt;/h2&gt;

&lt;p&gt;A framework-agnostic example that you can call from a CLI, a web job, or a worker queue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process a list of file metadata dictionaries.

    files: each dict contains at least file_id, file_name, file_type, file_url
    config: processing flags (e.g., {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optical_recognition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: True})
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deep_normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;process_single_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;errors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage:
# documents = load_pending_documents_from_store()
# outcome = process_documents(documents, {"optical_recognition": False})
# print(json.dumps(outcome, indent=2))
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🚀 Real-World Benefits
&lt;/h2&gt;

&lt;p&gt;This architecture delivers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extensibility&lt;/strong&gt;: Add new file formats by creating a new reader class&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: Switch between native and OCR processing per document&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Works across threads, workers, or serverless functions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability&lt;/strong&gt;: Graceful error handling and status tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintainability&lt;/strong&gt;: Clean abstraction layer and consistent interfaces&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  💡 Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Building a production-ready document parsing system requires:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Strong abstraction layer&lt;/strong&gt; with base classes defining common interfaces&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Format-specific strategies&lt;/strong&gt; using the best library for each file type&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Dual processing modes&lt;/strong&gt; (native + OCR) for maximum coverage&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Standardized output&lt;/strong&gt; making downstream processing trivial&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Robust error handling&lt;/strong&gt; to prevent pipeline failures&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Performance optimization&lt;/strong&gt; through smart dependency management  &lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Consider these enhancements for your parsing system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal processing&lt;/strong&gt;: Combine text with image embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Table extraction&lt;/strong&gt;: Preserve table structure for structured data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layout analysis&lt;/strong&gt;: Maintain document formatting and hierarchy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental processing&lt;/strong&gt;: Handle document updates efficiently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality metrics&lt;/strong&gt;: Track extraction confidence and completeness&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Document parsing doesn't have to be painful. With the right architecture—abstraction, format-specific strategies, and intelligent processing modes—you can build a system that handles any file format thrown at it.&lt;/p&gt;

&lt;p&gt;The key is to think in terms of &lt;strong&gt;interfaces, not implementations&lt;/strong&gt;. By defining a clear contract through the &lt;code&gt;BaseReader&lt;/code&gt; class, you create a system that's both powerful and maintainable.&lt;/p&gt;

&lt;p&gt;Whether you're building a knowledge base, search engine, or AI application, these patterns will serve you well. Start with the formats you need today, and confidently add new ones tomorrow.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you built a document parsing system? What challenges did you face? Share your experiences in the comments below!&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PyPDF2&lt;/strong&gt;: PDF text extraction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;openpyxl&lt;/strong&gt;: Excel file processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;python-docx&lt;/strong&gt;: Word document handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;python-pptx&lt;/strong&gt;: PowerPoint processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Textract&lt;/strong&gt;: OCR service for images and scanned documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyMuPDF (fitz)&lt;/strong&gt;: PDF image extraction&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;Written by Suraj Khaitan&lt;br&gt;
 — Gen AI Architect | Working on serverless AI &amp;amp; cloud platforms.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>python</category>
      <category>datascience</category>
      <category>programming</category>
    </item>
    <item>
      <title>Building a Production-Ready LangGraph-Style Agent: From Raw Documents to Structured Intelligence</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sat, 15 Nov 2025 09:05:11 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/building-a-production-ready-langgraph-style-agent-from-raw-documents-to-structured-intelligence-562m</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/building-a-production-ready-langgraph-style-agent-from-raw-documents-to-structured-intelligence-562m</guid>
      <description>&lt;p&gt;A pragmatic walkthrough of orchestrating extraction, summarization, memory, and routing using graph patterns and modular agent components—fully generic.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Why Another “Agent” Article?&lt;br&gt;
Most write‑ups about agents stop at toy examples. This guide focuses on practical layering: ingest unstructured content (like PDFs), extract what matters, summarize with guardrails, persist long‑term memory, and route requests through specialized workflows—while keeping everything cloud‑friendly and composable. All patterns are standalone; you do not need any specific repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conceptual Architecture (LangGraph Pattern)&lt;br&gt;
We adopt a graph mindset (inspired by LangGraph) where each node encapsulates a responsibility:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Ingestion Node: Accepts user query + optional file references.&lt;br&gt;
Extraction Node: Pulls raw text from uploaded documents (e.g., PDFs in object storage).&lt;br&gt;
Summarization Node: Produces structured or free‑form summaries (LLM with JSON schema enforcement).&lt;br&gt;
Memory Node: Persists distilled knowledge for subsequent sessions.&lt;br&gt;
Routing Node: Selects workflow type (foundation / RAG / extractor) based on config.&lt;br&gt;
Output Node: Returns assistant response + structural content for UI or downstream processes.&lt;br&gt;
Represented abstractly:&lt;/p&gt;

&lt;p&gt;User Input --&amp;gt; [Router] --&amp;gt; (Foundation | RAG | Extractor)&lt;br&gt;
                                 |        |         |&lt;br&gt;
                          [Summarizer]  [Retriever] [Text Parser]&lt;br&gt;
                                   \        |        /&lt;br&gt;
                                [Memory &amp;amp; Persist]&lt;br&gt;
                                       |&lt;br&gt;
                                    [Response]&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Core Building Blocks
3.1 Structured Summarization
The summarizer turns extracted content plus an existing structural scaffold into a validated JSON object. Generic pattern:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;def summarize_json(state: Message, llm: ChatBedrock, schema: dict) -&amp;gt; dict:&lt;br&gt;
    DynamicModel = json_schema_to_pydantic(schema)&lt;br&gt;
    structured_llm = llm.with_structured_output(DynamicModel)&lt;br&gt;
    system_context = f"CONTENT: {state.content}\nSTRUCTURED_OUTPUT: {state.structural_content}"&lt;br&gt;
    messages = [&lt;br&gt;
        SystemMessage(content=system_context),&lt;br&gt;
        HumanMessage(content="Summarize and fill the schema accurately."),&lt;br&gt;
    ]&lt;br&gt;
    result = structured_llm.invoke(messages)&lt;br&gt;
    return result.model_dump()&lt;br&gt;
Key takeaways:&lt;/p&gt;

&lt;p&gt;Use JSON Schema → dynamic Pydantic model for strong typing.&lt;br&gt;
Keep prompt minimal; the schema drives completeness.&lt;br&gt;
Separate raw content (unstructured text) from structural_content (prior structured context or template fields).&lt;br&gt;
3.2 Generic Generation (Content vs Structure)&lt;br&gt;
Switch temperature / max tokens depending on output goal:&lt;/p&gt;

&lt;p&gt;def generate(context: dict | str, prompt: str, llm: ChatBedrock, structured: bool = False):&lt;br&gt;
    ctx = json.dumps(context, indent=2) if isinstance(context, dict) else context&lt;br&gt;
    messages = [&lt;br&gt;
        SystemMessage(content=prompt),&lt;br&gt;
        HumanMessage(content=f"Structured data:\n{ctx}\n"),&lt;br&gt;
    ]&lt;br&gt;
    resp = llm.invoke(messages)&lt;br&gt;
    raw = resp.model_dump().get("content", "").strip()&lt;br&gt;
    # Try JSON first, fall back to text&lt;br&gt;
    try:&lt;br&gt;
        parsed = json.loads(raw)&lt;br&gt;
        return parsed if isinstance(parsed, dict) else raw&lt;br&gt;
    except Exception:&lt;br&gt;
        return raw&lt;br&gt;
3.3 Workflow Routing (Generic)&lt;br&gt;
The router inspects configuration (e.g., workflow_type) and dynamically dispatches:&lt;/p&gt;

&lt;p&gt;def run_agent(request: AgentMessageRequest) -&amp;gt; Message:&lt;br&gt;
    cfg = load_config()&lt;br&gt;
    workflow = cfg.get("workflow_type", "foundation")&lt;br&gt;
    if workflow == "rag":&lt;br&gt;
        return run_rag(request)  # retrieval + synthesis path&lt;br&gt;
    if workflow == "extractor":&lt;br&gt;
        return run_extractor(request)  # text parsing path&lt;br&gt;
    # Foundation path&lt;br&gt;
    crew = FoundationCrew()  # sets up Agent + Task&lt;br&gt;
    result = crew.crew.kickoff(inputs={"query": request.message})&lt;br&gt;
    return Message(&lt;br&gt;
        role="assistant",&lt;br&gt;
        structural_content={"response": str(result.raw)},&lt;br&gt;
        content=str(result.raw),&lt;br&gt;
        metadata={"workflow_type": workflow, "status": "success"},&lt;br&gt;
    )&lt;br&gt;
Routing Principles:&lt;/p&gt;

&lt;p&gt;Keep each specialized agent self‑contained.&lt;br&gt;
Avoid heavy if/else trees by mapping workflow keys to callables.&lt;br&gt;
Return a unified Message model to downstream consumers.&lt;br&gt;
3.4 Memory &amp;amp; Persistence&lt;br&gt;
Persist long‑term summaries (e.g., DynamoDB, PostgreSQL, Redis) via a memory node. Simplified pattern:&lt;/p&gt;

&lt;p&gt;def persist_summary(state: Message, user_id: str, session_id: str, table) -&amp;gt; None:&lt;br&gt;
    summary_blob = summarize_json(state, llm, schema)&lt;br&gt;
    item = {&lt;br&gt;
        "user_id": user_id,&lt;br&gt;
        "session_id": session_id,&lt;br&gt;
        "session_time": iso_now_utc(),&lt;br&gt;
        "agent_summary": json.dumps(summary_blob),&lt;br&gt;
    }&lt;br&gt;
    table.put_item(Item=item)&lt;/p&gt;

&lt;p&gt;def load_memory(user_id: str, session_id: str, table) -&amp;gt; dict | None:&lt;br&gt;
    resp = table.get_item(Key={"user_id": user_id, "session_id": session_id})&lt;br&gt;
    return resp.get("Item")&lt;br&gt;
Add caching for reads; keep writes idempotent when possible.&lt;/p&gt;

&lt;p&gt;3.5 Asynchronous Fan‑Out (Optional)&lt;br&gt;
Queue raw payloads for UI updates / analytics:&lt;/p&gt;

&lt;p&gt;def enqueue_payload(message: AgentMessageResponse, queue_url: str, sqs_client) -&amp;gt; None:&lt;br&gt;
    body = json.dumps(message.model_dump())&lt;br&gt;
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Text Extraction Use Case (Generic Flow)
An end‑to‑end invocation typically looks like this:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Deploy supporting infra (locally or remote) – object storage, agent endpoint.&lt;br&gt;
Upload PDFs to a bucket.&lt;br&gt;
POST an agent payload referencing uploaded file keys.&lt;br&gt;
Receive structured summary / overview.&lt;br&gt;
Condensed generic flow:&lt;/p&gt;

&lt;p&gt;pdf_files = discover_local_pdfs("./samples")&lt;br&gt;
for path in pdf_files:&lt;br&gt;
    s3_key = upload_pdf(path, bucket)&lt;br&gt;
    payload = {&lt;br&gt;
        "message": "Give me a concise overview.",&lt;br&gt;
        "sessionId": uuid.uuid4().hex,&lt;br&gt;
        "metadata": {"files": [s3_key]},&lt;br&gt;
    }&lt;br&gt;
    resp = requests.post(agent_url, headers=auth_headers(), json=payload)&lt;br&gt;
    print(parse_summary(resp.json()))&lt;br&gt;
Guidelines:&lt;/p&gt;

&lt;p&gt;Keep uploads batched to reduce auth overhead.&lt;br&gt;
Return both human‑readable content and machine‑friendly structural_content.&lt;br&gt;
Enforce timeouts; PDFs can be large.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Optional: A Minimal LangGraph-Style Graph Definition
If you formalize nodes with LangGraph, a simple graph assembly could look like:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;from langgraph.graph import Graph&lt;/p&gt;

&lt;p&gt;graph = Graph()&lt;br&gt;
graph.add_node("route", route_node)&lt;br&gt;
graph.add_node("extract", extract_node)&lt;br&gt;
graph.add_node("summarize", summarize_node)&lt;br&gt;
graph.add_node("memory", memory_node)&lt;br&gt;
graph.add_node("respond", respond_node)&lt;/p&gt;

&lt;p&gt;graph.add_edge("route", "extract")&lt;br&gt;
graph.add_edge("extract", "summarize")&lt;br&gt;
graph.add_edge("summarize", "memory")&lt;br&gt;
graph.add_edge("memory", "respond")&lt;/p&gt;

&lt;p&gt;app = graph.compile()&lt;br&gt;
result = app.invoke({"message": "Summarize the uploaded docs", "files": file_keys})&lt;br&gt;
print(result["content"])&lt;br&gt;
Design Notes:&lt;/p&gt;

&lt;p&gt;Each node keeps a single responsibility.&lt;br&gt;
The compiled graph enforces explicit data flow—easier to test.&lt;br&gt;
Inject observability (timers, counters) at node boundaries.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Production Hardening Checklist
Input Validation: Reject oversized or malformed files early.
Structured Output Enforcement: Fail fast if schema fields are missing.
Idempotency: Re‑summarize only when source or config changes.
Observability: Log node start/end + latency; emit metrics per workflow.
Cost Controls: Cache summaries; adjust temperature and max_tokens conservatively.
Security: Signed URLs for document retrieval; strict auth on agent endpoint.
Drift Handling: Maintain versioned JSON schemas for structured outputs.&lt;/li&gt;
&lt;li&gt;Common Pitfalls
Overstuffing the system prompt—better to pass clean content + minimal instruction.
Mixing storage concerns (session state) with transformation logic—keep memory node isolated.
Ignoring error surfaces: always wrap LLM calls and return structured error objects.&lt;/li&gt;
&lt;li&gt;Extending the Graph
Add an Evaluation Node: Auto‑grade summaries against reference heuristics.
Add a Retrieval Node: Hybrid semantic + metadata filtering before summarization.
Add a Redaction Node: Strip PII before persistence.&lt;/li&gt;
&lt;li&gt;Try It Yourself (Generic Mini Script)
def quick_demo(file_paths: list[str]):
# Pretend upload + extraction
extracted_chunks = [open(p, encoding="utf-8").read()[:4000] for p in file_paths]
merged = "\n".join(extracted_chunks)
state = Message(role="user", content=merged, structural_content={"sections": []})
llm = return_llm()  # any provider instance (Bedrock, OpenAI, local, etc.)
schema = {"type": "object", "properties": {"summary": {"type": "string"}}}
summary = summarize_json(state, llm, schema)
print("Summary:\n", summary["summary"])&lt;/li&gt;
&lt;li&gt;Conclusion
A production‑ready agent is not magic—it is a disciplined composition of small, testable nodes: routing, extraction, summarization, memory, and output shaping. By expressing the system as a graph, you gain clarity, resilience, and extensibility. Start simple (foundation workflow), then layer retrieval, structured output, and long‑term memory as concrete value drivers.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Questions or improvements you want to explore next (e.g., evaluation, caching, or multi‑modal inputs)? Turn each responsibility into a node, experiment, and iterate.&lt;/p&gt;

&lt;p&gt;Happy building.&lt;/p&gt;

&lt;p&gt;About the Author&lt;br&gt;
Written by Suraj Khaitan — Gen AI Architect | Working on serverless AI &amp;amp; cloud platforms.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>aws</category>
      <category>python</category>
    </item>
    <item>
      <title>Building a Scalable Agent-to-Agent (A2A) Communication Protocol on AWS</title>
      <dc:creator>Suraj Khaitan</dc:creator>
      <pubDate>Sun, 09 Nov 2025 04:59:40 +0000</pubDate>
      <link>https://dev.to/suraj_khaitan_f893c243958/building-a-scalable-agent-to-agent-a2a-communication-protocol-on-aws-1g0i</link>
      <guid>https://dev.to/suraj_khaitan_f893c243958/building-a-scalable-agent-to-agent-a2a-communication-protocol-on-aws-1g0i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D1200" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D1200" alt="Agent-to-Agent Communication" width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the rapidly evolving landscape of AI agents and autonomous systems, enabling seamless communication between different agents has become crucial. The Agent-to-Agent (A2A) protocol provides a standardized way for AI agents to interact, exchange messages, and coordinate tasks. In this article, I'll walk you through our implementation of an A2A gateway built on AWS serverless architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Agent-to-Agent (A2A) Communication?
&lt;/h2&gt;

&lt;p&gt;Agent-to-Agent communication is a protocol that allows different AI agents to interact with each other in a standardized way. Think of it as an API contract specifically designed for agent interactions. Rather than having agents communicate through proprietary interfaces, A2A provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standardized message formats&lt;/strong&gt; using JSON-RPC&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task lifecycle management&lt;/strong&gt; (submit, track, cancel)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context preservation&lt;/strong&gt; across multi-turn conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous processing&lt;/strong&gt; with polling-based status checks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure authentication&lt;/strong&gt; between agents&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Our A2A implementation leverages AWS serverless services to create a scalable, cost-effective solution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐
│   Client    │
│   Agent     │
└──────┬──────┘
       │ HTTPS + JWT
       ▼
┌─────────────────────┐
│  API Gateway        │
│  Custom Authorizer  │
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│  Lambda (FastAPI)   │
│  A2A Gateway        │
└──────┬──────────────┘
       │
       ├──────────┬─────────┐
       ▼          ▼         ▼
┌──────────┐ ┌──────┐ ┌──────────┐
│ DynamoDB │ │ SQS  │ │ Secrets  │
│  Tasks   │ │Queue │ │ Manager  │
└──────────┘ └──────┘ └──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Components
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway with Custom Authorizer&lt;/strong&gt;: Validates JWT tokens and enforces scope-based access control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI Lambda Function&lt;/strong&gt;: Handles JSON-RPC requests and implements the A2A protocol&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB&lt;/strong&gt;: Stores task state and history with efficient querying via GSI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQS&lt;/strong&gt;: Decouples message submission from agent processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Secrets Manager&lt;/strong&gt;: Securely manages signing keys for inter-service communication&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Core Implementation Details
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. JSON-RPC Handler
&lt;/h3&gt;

&lt;p&gt;The gateway implements the A2A protocol using JSON-RPC 2.0, supporting three primary operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/a2a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A2A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;a2a_rpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;a2a_req&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;A2ARequest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a2a_req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SendMessageRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message.send&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;req&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_inject_user_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on_message_send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GetTaskRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task.get&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on_get_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancelTaskRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task.cancel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;rpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on_cancel_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Authentication and Authorization
&lt;/h3&gt;

&lt;p&gt;Security is implemented through a multi-layered approach:&lt;/p&gt;

&lt;h4&gt;
  
  
  Custom JWT Authorizer
&lt;/h4&gt;

&lt;p&gt;The authorizer validates access tokens from an OAuth provider, checking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token signature using JWKS (JSON Web Key Set)&lt;/li&gt;
&lt;li&gt;Token expiration and validity&lt;/li&gt;
&lt;li&gt;Issuer and audience claims&lt;/li&gt;
&lt;li&gt;Required scopes for each operation
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_verify_access_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;header&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_unverified_header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;kid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;alg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;jwks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_jwks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Cached JWKS
&lt;/span&gt;    &lt;span class="n"&gt;jwk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;jwks&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;kid&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;claims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jwk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;algorithms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;alg&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;audience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_AUDIENCE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;issuer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ISSUER&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;claims&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Scope-Based Access Control
&lt;/h4&gt;

&lt;p&gt;Different operations require specific scopes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Required Scope&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Send Message&lt;/td&gt;
&lt;td&gt;&lt;code&gt;tasks:submit&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Get Task Status&lt;/td&gt;
&lt;td&gt;&lt;code&gt;tasks:read&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cancel Task&lt;/td&gt;
&lt;td&gt;&lt;code&gt;tasks:cancel&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Inter-Service Authentication
&lt;/h4&gt;

&lt;p&gt;For communication between the A2A gateway and backend agents, we mint short-lived JWTs (60 seconds TTL):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mint_agent_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iss&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a2a-gateway&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aud&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sub&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;algorithm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HS256&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Task Management and State Machine
&lt;/h3&gt;

&lt;p&gt;Tasks follow a clear lifecycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;submitted → working → completed
                   ↘ failed
                   ↘ canceled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  DynamoDB Schema
&lt;/h4&gt;

&lt;p&gt;We use a single-table design with a Global Secondary Index for efficient querying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;A2ADatabase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Primary Key: Task-centric access
&lt;/span&gt;    &lt;span class="n"&gt;pk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# Task#{task_id}
&lt;/span&gt;    &lt;span class="n"&gt;sk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# Event#{timestamp}#{uuid}
&lt;/span&gt;
    &lt;span class="c1"&gt;# GSI: Session-centric access
&lt;/span&gt;    &lt;span class="n"&gt;gsi1_pk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# session_id
&lt;/span&gt;    &lt;span class="n"&gt;gsi1_sk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;  &lt;span class="c1"&gt;# created_at_ms
&lt;/span&gt;
    &lt;span class="c1"&gt;# Domain fields
&lt;/span&gt;    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TaskState&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This schema enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast task status lookups using the primary key&lt;/li&gt;
&lt;li&gt;Chronological event history per task&lt;/li&gt;
&lt;li&gt;Session-based queries via GSI&lt;/li&gt;
&lt;li&gt;Optimistic concurrency through monotonic timestamps&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Asynchronous Processing with Polling
&lt;/h3&gt;

&lt;p&gt;Rather than implementing real-time streaming (complex in API Gateway + Lambda), we use a polling model:&lt;/p&gt;

&lt;h4&gt;
  
  
  Message Send Flow
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Client sends message&lt;/strong&gt;: Initial request creates a task in &lt;code&gt;submitted&lt;/code&gt; state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immediate response&lt;/strong&gt;: Returns task ID with &lt;code&gt;submitted&lt;/code&gt; status&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background processing&lt;/strong&gt;: Message queued to SQS for agent processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client polls&lt;/strong&gt;: Periodically calls &lt;code&gt;tasks/get&lt;/code&gt; to check status (recommended: 15-second intervals)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task completion&lt;/strong&gt;: Agent updates task to &lt;code&gt;completed&lt;/code&gt; with response artifact
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_message_send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageSendParams&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Check if polling existing task
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on_get_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;TaskQueryParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create new task
&lt;/span&gt;    &lt;span class="n"&gt;task_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_session_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Store in DynamoDB
&lt;/span&gt;    &lt;span class="n"&gt;_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;submitted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;get_message_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;# ... other fields
&lt;/span&gt;    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# Queue for processing
&lt;/span&gt;    &lt;span class="n"&gt;sqs_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;QueueUrl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SQS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;MessageBody&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message_body&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TaskState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;submitted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Task Cancellation
&lt;/h3&gt;

&lt;p&gt;Clients can cancel tasks that haven't completed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_cancel_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TaskIdParams&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on_get_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;TaskQueryParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TaskState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;  &lt;span class="c1"&gt;# Cannot cancel completed tasks
&lt;/span&gt;
    &lt;span class="c1"&gt;# Update state to canceled
&lt;/span&gt;    &lt;span class="n"&gt;_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;canceled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task cancelled by user.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;# ... other fields
&lt;/span&gt;    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TaskState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;canceled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Best Practices and Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Monotonic Timestamps for Event Ordering
&lt;/h3&gt;

&lt;p&gt;We encountered a subtle bug where rapid successive events could have the same millisecond timestamp, causing sort key collisions. Solution: maintain a module-level counter to ensure monotonically increasing timestamps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_LAST_EVENT_MS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_next_event_ms&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_LAST_EVENT_MS&lt;/span&gt;
    &lt;span class="n"&gt;now_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;now_ms&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;_LAST_EVENT_MS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;now_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_LAST_EVENT_MS&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;_LAST_EVENT_MS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now_ms&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;now_ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. JWKS Caching with Fallback
&lt;/h3&gt;

&lt;p&gt;Fetching JWKS on every request is inefficient. We implemented a cache with graceful degradation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_jwks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_JWKS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keys&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;_JWKS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;CACHE_TTL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_JWKS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keys&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_fetch_jwks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;_JWKS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keys&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_JWKS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keys&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;  &lt;span class="c1"&gt;# Use stale cache
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_JWKS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keys&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Comprehensive Logging
&lt;/h3&gt;

&lt;p&gt;Structured logging is crucial for debugging distributed systems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;on_message_send created | task_id=%s sessionId=%s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sessionId&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Consistent Read for Status Checks
&lt;/h3&gt;

&lt;p&gt;DynamoDB's eventual consistency can cause race conditions. Always use &lt;code&gt;ConsistentRead=True&lt;/code&gt; when checking task status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;KeyConditionExpression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;...,&lt;/span&gt;
    &lt;span class="n"&gt;ConsistentRead&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Critical!
&lt;/span&gt;    &lt;span class="n"&gt;Limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Proper Error Handling
&lt;/h3&gt;

&lt;p&gt;Implement JSON-RPC compliant error responses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JSONResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;JSONRPCErrorResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;InternalError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;  &lt;span class="c1"&gt;# JSON-RPC errors still return 200
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scalability
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lambda concurrency&lt;/strong&gt;: Automatically scales to handle traffic spikes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB on-demand&lt;/strong&gt;: Scales with request volume without capacity planning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQS buffering&lt;/strong&gt;: Smooths traffic bursts to backend agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateless design&lt;/strong&gt;: No sticky sessions or state management overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost Optimization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lambda cold starts&lt;/strong&gt;: Mitigated by keeping functions warm during business hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB query efficiency&lt;/strong&gt;: Single-table design with targeted queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secret caching&lt;/strong&gt;: Reduces Secrets Manager API calls by 99%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JWKS caching&lt;/strong&gt;: Avoids repeated HTTPS calls to OAuth provider&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Security Highlights
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Defense in depth&lt;/strong&gt;: Multiple authentication layers (OAuth, scopes, inter-service JWTs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Least privilege&lt;/strong&gt;: Scope-based access control ensures clients can only perform authorized operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short-lived tokens&lt;/strong&gt;: Agent tokens expire in 60 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTPS only&lt;/strong&gt;: All communication encrypted in transit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No sensitive data in logs&lt;/strong&gt;: User IDs and task IDs only, no message content&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Future Enhancements
&lt;/h2&gt;

&lt;p&gt;While our current implementation is production-ready, several enhancements could be valuable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;WebSocket support&lt;/strong&gt;: For real-time updates instead of polling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event-driven notifications&lt;/strong&gt;: SNS/EventBridge for push notifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GraphQL interface&lt;/strong&gt;: Alternative to JSON-RPC for more flexible queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-region deployment&lt;/strong&gt;: For global low-latency access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced observability&lt;/strong&gt;: X-Ray tracing and CloudWatch Insights dashboards&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building a robust A2A gateway on AWS requires careful consideration of security, scalability, and reliability. By leveraging serverless services and implementing best practices like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proper authentication and authorization&lt;/li&gt;
&lt;li&gt;Asynchronous processing with polling&lt;/li&gt;
&lt;li&gt;Monotonic event ordering&lt;/li&gt;
&lt;li&gt;Comprehensive error handling&lt;/li&gt;
&lt;li&gt;Strategic caching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We've created a system that can handle enterprise-scale agent interactions while maintaining security and performance.&lt;/p&gt;

&lt;p&gt;The A2A protocol represents an important step toward interoperable AI agents. As the ecosystem matures, standardized communication protocols will enable rich agent ecosystems where specialized agents can collaborate to solve complex problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;✅ Use JSON-RPC for standardized agent communication&lt;br&gt;&lt;br&gt;
✅ Implement multi-layered security with OAuth and scope-based access control&lt;br&gt;&lt;br&gt;
✅ Design for asynchronous processing with polling-based status checks&lt;br&gt;&lt;br&gt;
✅ Leverage DynamoDB single-table design for efficient task management&lt;br&gt;&lt;br&gt;
✅ Cache aggressively but with graceful fallbacks&lt;br&gt;&lt;br&gt;
✅ Use monotonic timestamps to prevent race conditions&lt;br&gt;&lt;br&gt;
✅ Log structured data for observability  &lt;/p&gt;




&lt;p&gt;&lt;strong&gt;About the Implementation&lt;/strong&gt;: This A2A gateway is built with Python, FastAPI, AWS Lambda, DynamoDB, SQS, and integrates with OAuth 2.0 providers. It follows the A2A specification and provides a scalable foundation for agent-to-agent communication.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have you implemented agent communication protocols? What challenges did you face? Share your experiences in the comments below!&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;Written by Suraj Khaitan&lt;br&gt;
— Gen AI Architect | Working on serverless AI &amp;amp; cloud platforms.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>python</category>
      <category>agents</category>
      <category>a2a</category>
    </item>
  </channel>
</rss>
