<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sanjay Patoliya</title>
    <description>The latest articles on DEV Community by Sanjay Patoliya (@sanjaypatoliya).</description>
    <link>https://dev.to/sanjaypatoliya</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3766795%2F6c37dd72-a20a-4354-8cf0-f0146a5d7809.jpeg</url>
      <title>DEV Community: Sanjay Patoliya</title>
      <link>https://dev.to/sanjaypatoliya</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sanjaypatoliya"/>
    <language>en</language>
    <item>
      <title>Building an AI Document Processor with RAG, pgvector, and AWS</title>
      <dc:creator>Sanjay Patoliya</dc:creator>
      <pubDate>Wed, 24 Jun 2026 14:03:35 +0000</pubDate>
      <link>https://dev.to/sanjaypatoliya/building-an-ai-document-processor-with-rag-pgvector-and-aws-bik</link>
      <guid>https://dev.to/sanjaypatoliya/building-an-ai-document-processor-with-rag-pgvector-and-aws-bik</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;A full-stack AI platform where you upload any document — PDF, invoice, contract, image — and the system automatically extracts text, classifies the document type, generates a summary, and lets you chat with it in natural language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload a PDF or image — it goes directly to S3 via presigned URL&lt;/li&gt;
&lt;li&gt;Watch real-time processing status as Textract, Claude, and the embedding pipeline run — via WebSocket&lt;/li&gt;
&lt;li&gt;Read the AI-generated classification and summary&lt;/li&gt;
&lt;li&gt;Ask questions in natural language — get answers with cited page numbers&lt;/li&gt;
&lt;li&gt;Search across your entire document library in one query&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key difference from a simple chatbot: &lt;strong&gt;the answers are grounded in your documents&lt;/strong&gt;. Claude can only use the retrieved chunks, and every answer includes source citations with page numbers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjyb2zfa3hnpcr02356ul.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjyb2zfa3hnpcr02356ul.jpg" alt="AI Document Processor — login screen" width="800" height="744"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser
  │
  ▼
CloudFront (HTTPS)
  ├── /* ──────────────► S3 (React static files)
  └── /api/v1/* ───────► ALB
                           │
                           ▼
                      ECS Fargate (FastAPI)
                       Cognito JWT auth
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         S3 Upload      RAG Chat    WebSocket
        (presigned)     pgvector      status
              │            │            │
              ▼            ▼            ▼
          S3 trigger   Anthropic   API Gateway
              │          Claude     WebSocket
              ▼
       Lambda Orchestrator
              │
       Amazon Textract
         (OCR + tables)
              │
       Lambda Processor
              │
      ┌───────┴────────┐
      ▼                ▼
   Claude           pgvector
(classify +        (chunks +
 summarize)       embeddings)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This project is the most complex of my three — eight CDK stacks covering networking, storage, auth, database, async processing, WebSocket, compute, and monitoring. The async processing pipeline (S3 → Lambda → Textract → Lambda → Claude → pgvector) is the architectural core, separate from the FastAPI backend that handles the user-facing API.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React 18 + TypeScript + Vite + TailwindCSS + React Query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI (Python 3.12) + SQLAlchemy async + Alembic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;Anthropic Claude (&lt;code&gt;claude-sonnet-4-6&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embeddings&lt;/td&gt;
&lt;td&gt;Voyage AI (&lt;code&gt;voyage-3&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OCR&lt;/td&gt;
&lt;td&gt;Amazon Textract&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector Search&lt;/td&gt;
&lt;td&gt;pgvector (Aurora PostgreSQL Serverless v2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth&lt;/td&gt;
&lt;td&gt;AWS Cognito + JWT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;td&gt;API Gateway WebSocket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting&lt;/td&gt;
&lt;td&gt;ECS Fargate + ALB + CloudFront&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IaC&lt;/td&gt;
&lt;td&gt;AWS CDK (Python) — 8 stacks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local Dev&lt;/td&gt;
&lt;td&gt;Docker Compose (PostgreSQL + Redis + LocalStack)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  AWS Services Used
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ECS Fargate&lt;/td&gt;
&lt;td&gt;Serverless container hosting for FastAPI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Application Load Balancer&lt;/td&gt;
&lt;td&gt;Routes traffic to ECS tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudFront&lt;/td&gt;
&lt;td&gt;CDN + HTTPS termination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;Document storage with presigned URLs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Textract&lt;/td&gt;
&lt;td&gt;OCR, table extraction, form parsing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS Lambda&lt;/td&gt;
&lt;td&gt;Event-driven async processing pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon SQS + SNS&lt;/td&gt;
&lt;td&gt;Decoupled processing with dead letter queues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Gateway WebSocket&lt;/td&gt;
&lt;td&gt;Real-time processing status notifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aurora Serverless v2 (pgvector)&lt;/td&gt;
&lt;td&gt;Document metadata and vector embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ElastiCache Redis&lt;/td&gt;
&lt;td&gt;Rate limiting and response caching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS Cognito&lt;/td&gt;
&lt;td&gt;JWT authentication and user management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSM Parameter Store&lt;/td&gt;
&lt;td&gt;Secure API key storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ECR&lt;/td&gt;
&lt;td&gt;Docker image registry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS X-Ray&lt;/td&gt;
&lt;td&gt;Distributed tracing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch&lt;/td&gt;
&lt;td&gt;Metrics, dashboards, cost alarms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS CDK (Python)&lt;/td&gt;
&lt;td&gt;Full infrastructure as code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Processing Pipeline
&lt;/h2&gt;

&lt;p&gt;When a user uploads a document, it triggers an async pipeline that runs entirely outside the FastAPI request cycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1 — Direct S3 Upload via Presigned URL
&lt;/h3&gt;

&lt;p&gt;The frontend requests a presigned POST URL from the backend, then uploads the file directly to S3 — the binary never touches the FastAPI server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# api/v1/upload.py
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_presigned_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PresignedUrlRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_current_user&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
    &lt;span class="n"&gt;s3_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_presigned_post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;s3_raw_bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;s3_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ExpiresIn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;upload_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;upload_fields&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fields&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the upload completes, the frontend calls &lt;code&gt;/api/v1/upload/complete/{doc_id}&lt;/code&gt; — this triggers the async pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4tjyvayzjt0ub0viy4zh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4tjyvayzjt0ub0viy4zh.jpg" alt="AI Document Processor — upload documents UI" width="800" height="773"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 — Lambda Orchestrator: Textract
&lt;/h3&gt;

&lt;p&gt;The S3 &lt;code&gt;ObjectCreated&lt;/code&gt; event triggers the Orchestrator Lambda, which starts an async Textract job and updates the document status to &lt;code&gt;processing&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Textract runs asynchronously — for a 20-page PDF it can take 30–60 seconds. When Textract finishes, it publishes a completion event to an SNS topic, which triggers the Processor Lambda.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 — Lambda Processor: Claude + Voyage AI + pgvector
&lt;/h3&gt;

&lt;p&gt;This is the most complex Lambda. It runs 6 steps in sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lambdas/processor/handler.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sns_record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sns_record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;doc_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JobTag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 1: fetch Textract result (paginated API)
&lt;/span&gt;        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_fetch_textract_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JobId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 2: save raw Textract output to S3 processed bucket
&lt;/span&gt;        &lt;span class="nf"&gt;_save_textract_result_to_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 3: classify document type + generate summary with Claude
&lt;/span&gt;        &lt;span class="n"&gt;claude_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_classify_with_claude&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 4: update DB → status=COMPLETED, doc_type, summary, page_count
&lt;/span&gt;        &lt;span class="nf"&gt;_update_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;claude_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;claude_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 5: chunk text → embed with Voyage AI → store in pgvector
&lt;/span&gt;        &lt;span class="nf"&gt;_store_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 6: push status update to browser via WebSocket
&lt;/span&gt;        &lt;span class="nf"&gt;_notify_websocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Claude classification prompt&lt;/strong&gt; returns structured JSON — document type (invoice, contract, report, etc.) and a 2–3 sentence summary. Pydantic validates the JSON before saving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chunking strategy&lt;/strong&gt;: 800 characters per chunk, 100-character overlap between chunks. Each chunk stores its page number — this is what enables page-level citations in the final answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Voyage AI embeddings&lt;/strong&gt;: Each chunk is embedded using &lt;code&gt;voyage-3&lt;/code&gt; in a single batched API call — one call for all chunks in the document, not one call per chunk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4 — RAG Chat: Embed → Retrieve → Stream
&lt;/h3&gt;

&lt;p&gt;When the user asks a question, the RAG pipeline runs inside FastAPI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# services/rag_service.py
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed_question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;voyageai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;voyage_api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voyage-3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_relevant_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SET LOCAL ivfflat.probes = 10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        SELECT chunk_index, page_number, content,
               1 - (embedding_vec &amp;lt;=&amp;gt; CAST(:embedding AS vector)) AS similarity
        FROM document_chunks
        WHERE document_id = :document_id
          AND embedding_vec IS NOT NULL
        ORDER BY embedding_vec &amp;lt;=&amp;gt; CAST(:embedding AS vector)
        LIMIT 5
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The top-5 most similar chunks are assembled into a context block and sent to Claude. Claude is constrained to only use the provided excerpts — if the answer isn't there, it says so. The response streams back to the browser via SSE, word by word.&lt;/p&gt;




&lt;h2&gt;
  
  
  WebSocket — Real-time Processing Status
&lt;/h2&gt;

&lt;p&gt;Processing a document takes 30–90 seconds. Instead of polling, the browser opens a WebSocket connection via API Gateway on page load. When the Processor Lambda finishes, it invokes the WebSocket Lambda which pushes a status update directly to the browser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser ──WebSocket──► API Gateway ──► Lambda WS handler
                                           │
                          DynamoDB ◄───────┘
                       (connection store)
                                           ▲
                          Lambda Processor ─┘
                        (invokes WS handler
                         on job completion)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The WebSocket connection store (DynamoDB) maps &lt;code&gt;connection_id&lt;/code&gt; → &lt;code&gt;user_id&lt;/code&gt;. When the Processor Lambda calls &lt;code&gt;_notify_websocket(doc_id, status)&lt;/code&gt;, the WS Lambda looks up the user's active connections and pushes the update to all of them. If the user has multiple browser tabs open, all get updated simultaneously.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cognito Auth — Per-User Document Isolation
&lt;/h2&gt;

&lt;p&gt;Every API request requires a valid Cognito JWT. The FastAPI dependency &lt;code&gt;get_current_user&lt;/code&gt; verifies the token against Cognito's JWKS endpoint and extracts the &lt;code&gt;sub&lt;/code&gt; claim as the user ID.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# core/security.py
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_cognito_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;jwks_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://cognito-idp.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cognito_region&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.amazonaws.com/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cognito_user_pool_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/.well-known/jwks.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;jwks_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PyJWKClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jwks_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;signing_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jwks_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_signing_key_from_jwt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;signing_key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;algorithms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RS256&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                      &lt;span class="n"&gt;audience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cognito_client_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All database queries filter by &lt;code&gt;owner_id = current_user.id&lt;/code&gt; — users can only see their own documents. This is enforced at the repository layer, not the route layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  API Endpoints
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/health&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Health check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/upload/presigned-url&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get presigned S3 POST URL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/upload/complete/{doc_id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Confirm upload, trigger processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/documents&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List documents (paginated)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/documents/{doc_id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get document metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DELETE&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/documents/{doc_id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete document&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/documents/{doc_id}/chat&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;RAG chat — streams SSE tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/search&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Global search across all documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WebSocket&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/ws&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Real-time processing status&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fglsvgkne65f6g6vcytyz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fglsvgkne65f6g6vcytyz.jpg" alt="AI Document Processor — search documents" width="800" height="773"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Infrastructure with AWS CDK — 8 Stacks
&lt;/h2&gt;

&lt;p&gt;Each stack has a single concern and is deployed independently.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stack&lt;/th&gt;
&lt;th&gt;What it creates&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DocuAI-Network&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;VPC, 2 AZs, NAT Gateway, S3 + SQS VPC endpoints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DocuAI-Storage&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Raw + processed S3 buckets, SQS queue + DLQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DocuAI-Auth&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cognito User Pool + App Client&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DocuAI-Database&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Aurora Serverless v2 (PostgreSQL + pgvector)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DocuAI-Processing&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lambda Orchestrator (S3 trigger) + Lambda Processor (SNS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DocuAI-WebSocket&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API Gateway WebSocket + Lambda connect/disconnect/send&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DocuAI-Compute&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ECS Fargate service + ALB + ECR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DocuAI-Monitoring&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CloudWatch dashboards + alarms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# infrastructure/stacks/processing_stack.py (simplified)
&lt;/span&gt;&lt;span class="n"&gt;orchestrator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lambda_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Orchestrator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lambda_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PYTHON_3_12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;handler.handler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SQS_QUEUE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqs_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;queue_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USE_TEXTRACT_STUB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# S3 trigger → Orchestrator Lambda
&lt;/span&gt;&lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_event_notification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EventType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OBJECT_CREATED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;s3n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LambdaDestination&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orchestrator&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lambda_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lambda_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PYTHON_3_12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;handler.handler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_url_ssm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;string_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WEBSOCKET_LAMBDA_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ws_lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Textract SNS completion → Processor Lambda
&lt;/span&gt;&lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_event_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;lambda_event_sources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SnsEventSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;textract_sns_topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Database stack uses &lt;code&gt;removal_policy=RETAIN&lt;/code&gt; — Aurora is never deleted on &lt;code&gt;cdk destroy&lt;/code&gt;. This prevents accidental data loss during infrastructure teardown.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjy2s9ocl4rwn8zpiwe78.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjy2s9ocl4rwn8zpiwe78.jpg" alt="AI Document Processor — CloudWatch dashboard" width="800" height="668"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Local Development with Docker Compose
&lt;/h2&gt;

&lt;p&gt;All AWS services are emulated locally using LocalStack, so you can develop without AWS credentials or costs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start Postgres + Redis + LocalStack&lt;/span&gt;
docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;

&lt;span class="c"&gt;# Run migrations&lt;/span&gt;
alembic upgrade &lt;span class="nb"&gt;head&lt;/span&gt;

&lt;span class="c"&gt;# Start API&lt;/span&gt;
uvicorn app.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt; &lt;span class="nt"&gt;--port&lt;/span&gt; 8000

&lt;span class="c"&gt;# Start local Lambda worker (simulates S3 trigger + Textract + Processor)&lt;/span&gt;
make worker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;USE_TEXTRACT_STUB=true&lt;/code&gt; environment variable skips real Textract and uses a stub document — so the full pipeline runs locally without any AWS calls.&lt;/p&gt;




&lt;h2&gt;
  
  
  Testing Strategy
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;backend
pytest tests/ &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="nt"&gt;--cov&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;app          &lt;span class="c"&gt;# all tests with coverage&lt;/span&gt;
pytest tests/api/test_upload.py     &lt;span class="c"&gt;# upload endpoints&lt;/span&gt;
pytest tests/api/test_chat.py       &lt;span class="c"&gt;# RAG chat + SSE streaming&lt;/span&gt;
pytest tests/services/              &lt;span class="c"&gt;# service layer (S3, Textract, Claude, RAG)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All AWS services are mocked with &lt;code&gt;moto&lt;/code&gt;. Claude and Voyage AI are mocked at the service level — no real API calls in tests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkip1vnhdvyc0oynm8727.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkip1vnhdvyc0oynm8727.jpg" alt="AI Document Processor — CI/CD pipeline" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Async processing is the right pattern for heavy AI workloads&lt;/strong&gt;&lt;br&gt;
Running Textract + Claude + embedding in a Lambda decoupled from the API request means the HTTP call returns immediately. The user sees progress via WebSocket instead of waiting on a 60-second HTTP request that might time out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. WebSocket via API Gateway + Lambda is simpler than it looks&lt;/strong&gt;&lt;br&gt;
The connection store (DynamoDB mapping connection IDs to user IDs) is the key insight. Once that's in place, pushing updates from any Lambda becomes a one-line API call to the WebSocket management endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Batch embeddings — one API call per document, not per chunk&lt;/strong&gt;&lt;br&gt;
Generating embeddings in a single batched call (all chunks at once) vs. one call per chunk is 10x faster and significantly cheaper. Voyage AI's &lt;code&gt;client.embed([text1, text2, ...])&lt;/code&gt; handles batches natively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. pgvector on Aurora Serverless v2 is a production-ready RAG store&lt;/strong&gt;&lt;br&gt;
Aurora Serverless v2 scales to zero when idle and handles the pgvector cosine similarity queries with &lt;code&gt;ivfflat&lt;/code&gt; indexing. For a document Q&amp;amp;A workload with moderate traffic, it's cheaper and simpler than a dedicated vector database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Eight CDK stacks kept the infrastructure manageable&lt;/strong&gt;&lt;br&gt;
One concern per stack means &lt;code&gt;cdk diff DocuAI-Processing-dev&lt;/code&gt; shows only Lambda changes, not the entire 200-resource deployment. Stacks also have clear dependency chains — Network → Storage/Auth → Database → Processing/WebSocket → Compute → Monitoring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Cognito JWKS caching is essential&lt;/strong&gt;&lt;br&gt;
Without caching, every API request would fetch the JWKS endpoint — adding 200ms+ latency and hitting Cognito rate limits under load. &lt;code&gt;PyJWKClient(jwks_url, cache_keys=True)&lt;/code&gt; fixes both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. &lt;code&gt;USE_TEXTRACT_STUB&lt;/code&gt; made local development possible&lt;/strong&gt;&lt;br&gt;
Real Textract requires S3 objects in a real AWS bucket and takes 10–30 seconds per document. A stub that returns a fixed text string lets the full pipeline run in under a second locally — including Claude classification, chunking, embedding, and pgvector storage.&lt;/p&gt;




&lt;h2&gt;
  
  
  GitHub
&lt;/h2&gt;

&lt;p&gt;Key code samples (RAG service, Lambda processor, CDK stacks) are available on GitHub:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/sanjaypatoliya/ai-document-processor-sample" rel="noopener noreferrer"&gt;github.com/sanjaypatoliya/ai-document-processor-sample&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;I'm Sanjay Patoliya — AWS Certified Solutions Architect &amp;amp; DevOps Engineer building production-ready AI systems on AWS. If you're looking to build something similar or need a remote AWS + AI engineer, feel free to reach out.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Portfolio: &lt;a href="https://sanjaypatoliya.com" rel="noopener noreferrer"&gt;sanjaypatoliya.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://linkedin.com/in/sanjaykumar-patoliya-b234a287/" rel="noopener noreferrer"&gt;linkedin.com/in/sanjaykumar-patoliya-b234a287&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/sanjaypatoliya" rel="noopener noreferrer"&gt;github.com/sanjaypatoliya&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>cloud</category>
      <category>fastapi</category>
    </item>
    <item>
      <title>Building an AI Research Agent with LangGraph, Claude, and AWS</title>
      <dc:creator>Sanjay Patoliya</dc:creator>
      <pubDate>Thu, 11 Jun 2026 15:00:12 +0000</pubDate>
      <link>https://dev.to/sanjaypatoliya/building-an-ai-research-agent-with-langgraph-claude-and-aws-1lbn</link>
      <guid>https://dev.to/sanjaypatoliya/building-an-ai-research-agent-with-langgraph-claude-and-aws-1lbn</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;An AI research agent that answers any question by searching the web, Wikipedia, and academic papers — then streams its reasoning back to the browser in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ask any research question in natural language&lt;/li&gt;
&lt;li&gt;Watch the agent decide which tools to call — live, as it happens&lt;/li&gt;
&lt;li&gt;Read a structured report: Summary, Key Findings, Academic Research, Sources, Conclusion&lt;/li&gt;
&lt;li&gt;Ask follow-up questions — the agent remembers the full conversation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key difference from a simple chatbot: this agent is &lt;strong&gt;autonomous&lt;/strong&gt;. It decides when to search, what to search for, and which source to use. Claude does the reasoning; LangGraph controls the loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbu6555uyywjxv8isc8lv.png" alt="AI Research Agent — home screen with example questions and Agent Trace panel" width="800" height="591"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser
  │
  ▼
CloudFront (HTTPS)
  ├── /* ──────────────► S3 (React static files)
  └── /api/v1/* ───────► ALB
                           │
                           ▼
                      ECS Fargate (FastAPI)
                           │
                           ▼
                      LangGraph ReAct Agent
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
        Tavily API    Wikipedia API  arXiv API
      (web search)   (background    (academic
                       facts)        papers)
              └────────────┼────────────┘
                           ▼
                     Anthropic Claude
                   (reasoning + answer)
                           │
                           ▼
                   SSE Stream → Browser
               (tokens + tool trace events)
                           │
                           ▼
                       DynamoDB
                  (conversation history)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same CloudFront pattern as my &lt;a href="https://dev.to/sanjaypatoliya/building-an-ai-resume-scorer-with-aws-textract-and-claude-efl"&gt;previous post on the AI Resume Analyzer&lt;/a&gt; — &lt;code&gt;/api/v1/*&lt;/code&gt; proxies to the ALB so the frontend has a single HTTPS endpoint with no CORS issues.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React 18 + TypeScript + Vite + TailwindCSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI (Python 3.12)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Agent&lt;/td&gt;
&lt;td&gt;LangGraph ReAct + Anthropic Claude (claude-sonnet-4-6)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web Search&lt;/td&gt;
&lt;td&gt;Tavily API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge&lt;/td&gt;
&lt;td&gt;Wikipedia API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Academic Research&lt;/td&gt;
&lt;td&gt;arXiv API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistence&lt;/td&gt;
&lt;td&gt;Amazon DynamoDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming&lt;/td&gt;
&lt;td&gt;Server-Sent Events (SSE)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting&lt;/td&gt;
&lt;td&gt;ECS Fargate + ALB + CloudFront&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IaC&lt;/td&gt;
&lt;td&gt;AWS CDK (Python)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  AWS Services Used
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ECS Fargate&lt;/td&gt;
&lt;td&gt;Serverless container hosting for FastAPI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Application Load Balancer&lt;/td&gt;
&lt;td&gt;Routes traffic to ECS tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudFront&lt;/td&gt;
&lt;td&gt;CDN + HTTPS termination + SSE proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;React static file hosting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB&lt;/td&gt;
&lt;td&gt;Conversation history with GSI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSM Parameter Store&lt;/td&gt;
&lt;td&gt;Secure API key storage (Anthropic + Tavily)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ECR&lt;/td&gt;
&lt;td&gt;Docker image registry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VPC + NAT Gateway&lt;/td&gt;
&lt;td&gt;Private network with outbound internet for API calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS CDK (Python)&lt;/td&gt;
&lt;td&gt;Infrastructure as Code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Core: LangGraph ReAct Agent
&lt;/h2&gt;

&lt;p&gt;The most interesting part of this project is the agent itself. Instead of a fixed pipeline (step 1 → step 2 → step 3), a &lt;strong&gt;ReAct agent&lt;/strong&gt; reasons its way to an answer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reason&lt;/strong&gt; — think about what information is needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Act&lt;/strong&gt; — call a tool (web search, Wikipedia, arXiv)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observe&lt;/strong&gt; — read the result&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeat&lt;/strong&gt; — until it has enough to answer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;LangGraph models this as a &lt;code&gt;StateGraph&lt;/code&gt; — a directed graph where each node is either the Claude model or a tool executor, and edges define when to call tools vs. when to return the final answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agent/graph.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;CompiledGraph&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;bind_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;should_continue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;last&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;should_continue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;should_continue&lt;/code&gt; function is the key decision point: if Claude's response includes tool calls, route to the &lt;code&gt;tools&lt;/code&gt; node; otherwise, the answer is ready and the graph ends.&lt;/p&gt;

&lt;h3&gt;
  
  
  AgentState and Conversation Memory
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;AgentState&lt;/code&gt; holds the full message history using LangGraph's &lt;code&gt;add_messages&lt;/code&gt; reducer — new messages are appended to the list rather than replacing it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agent/state.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseMessage&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single field is all that's needed for multi-turn memory. Each conversation starts from DynamoDB history, so follow-up questions have full context of everything said before.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three Research Tools
&lt;/h2&gt;

&lt;p&gt;The agent has three tools. Claude autonomously decides which ones to call based on the question.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tavily — Web Search
&lt;/h3&gt;

&lt;p&gt;Tavily is purpose-built for LLM agents. It returns clean, structured results without the noise of a raw search engine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agent/tools.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.tools.tavily_search&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TavilySearchResults&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;web_search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TavilySearchResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search the web for current information, news, and general knowledge.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Wikipedia — Background Facts
&lt;/h3&gt;

&lt;p&gt;For well-established topics, Wikipedia gives reliable background without burning Tavily quota.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wikipedia_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search Wikipedia for background information on a topic.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;wiki&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wikipediaapi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Wikipedia&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ai-research-agent/1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wiki&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No Wikipedia article found for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# Truncate to 3000 chars to stay within Claude's context budget
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  arXiv — Academic Papers
&lt;/h3&gt;

&lt;p&gt;For scientific or technical questions, the agent searches arXiv for peer-reviewed research. This differentiates the output from a standard web search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;arxiv_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search arXiv for academic papers and research on a topic.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;arxiv&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arxiv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                           &lt;span class="n"&gt;sort_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arxiv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SortCriterion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Relevance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;papers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;arxiv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Title: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Authors: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;authors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                      &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summary: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;URL: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entry_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;---&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;papers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;papers&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No papers found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  SSE Streaming — Tokens + Live Agent Trace
&lt;/h2&gt;

&lt;p&gt;This is the part that makes the UI feel alive. Instead of waiting 20 seconds for a complete answer, the browser receives two types of events simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trace events&lt;/strong&gt; — every tool call the agent makes, shown in a live panel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token events&lt;/strong&gt; — the answer streaming word by word&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The SSE event protocol:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data: {"type": "conversation_id", "conversation_id": "abc-123"}

data: {"type": "trace", "step": "tool_start", "tool": "web_search", "input": "LangGraph tutorial"}

data: {"type": "trace", "step": "tool_end", "tool": "web_search", "output": "LangGraph is..."}

data: {"type": "token", "content": "LangGraph is a "}

data: {"type": "token", "content": "framework for building..."}

data: {"type": "done"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The streaming endpoint uses LangGraph's &lt;code&gt;astream_events&lt;/code&gt; — an async generator that yields every event in the agent's execution graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# routers/chat.py
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_agent_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conversation_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;dynamodb_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conversation_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)]}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astream_events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;on_tool_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data: {{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;on_tool_end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data: {{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;on_chat_model_stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data: {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The FastAPI endpoint sets &lt;code&gt;media_type="text/event-stream"&lt;/code&gt; and returns a &lt;code&gt;StreamingResponse&lt;/code&gt; — no WebSockets needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqchnpxgyf99v846bacmz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqchnpxgyf99v846bacmz.png" alt="Live research in progress — Agent Trace panel showing tool calls alongside the streaming response" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgxtj2utfxpbn6612tab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgxtj2utfxpbn6612tab.png" alt="Full structured response — Summary, Key Findings, Academic Research, Sources, Conclusion" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-turn Conversations
&lt;/h2&gt;

&lt;p&gt;Every conversation is stored in DynamoDB. When the user sends a follow-up question, the full history is loaded and passed back into the agent — so it has full context of everything discussed.&lt;/p&gt;

&lt;h3&gt;
  
  
  DynamoDB Schema
&lt;/h3&gt;

&lt;p&gt;The table uses a single-table design with a GSI for listing conversations by recency:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pk&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CONV#{conversation_id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Partition key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sk&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;META&lt;/code&gt; or &lt;code&gt;MSG#{timestamp}&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Sort key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;entity_type&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;conversation&lt;/code&gt; or &lt;code&gt;message&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;GSI partition key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;updated_at&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ISO timestamp&lt;/td&gt;
&lt;td&gt;GSI sort key&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The GSI (&lt;code&gt;entity-type-index&lt;/code&gt;) makes it efficient to list all conversations sorted by most recently updated — without scanning the full table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# services/dynamodb_service.py
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conversation_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;batch_writer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONV#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;conversation_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MSG#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;06&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;updated_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkr0i3wrokc3if1js05a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkr0i3wrokc3if1js05a.png" alt="AI Research Agent — home screen with example questions and Agent Trace panel" width="800" height="596"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  API Endpoints
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/health&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Health check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/chat/stream&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stream agent response + tool trace via SSE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/conversations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List all conversations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/conversations/{id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get conversation with full message history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DELETE&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/conversations/{id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete a conversation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Infrastructure with AWS CDK
&lt;/h2&gt;

&lt;p&gt;Two stacks — simpler than the resume analyzer because the frontend is a pure static site with no server-side rendering.&lt;/p&gt;

&lt;h3&gt;
  
  
  BackendStack
&lt;/h3&gt;

&lt;p&gt;VPC with 2 AZs, 1 NAT Gateway, ECS Fargate service, and ALB. Both API keys are injected from SSM Parameter Store at container startup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# stacks/backend_stack.py (simplified)
&lt;/span&gt;&lt;span class="n"&gt;fargate_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ecs_patterns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ApplicationLoadBalancedFargateService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BackendService&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_limit_mib&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;task_image_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ecs_patterns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ApplicationLoadBalancedTaskImageOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ContainerImage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_ecr_repository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ecr_repo&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DYNAMODB_TABLE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;secrets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_ssm_parameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;anthropic_key&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TAVILY_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_ssm_parameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tavily_key&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Long agent reasoning chains need more than the default 60s
&lt;/span&gt;&lt;span class="n"&gt;fargate_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_balancer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idle_timeout.timeout_seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;120&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  FrontendStack
&lt;/h3&gt;

&lt;p&gt;CloudFront with two important timeout settings for SSE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# stacks/frontend_stack.py (simplified)
&lt;/span&gt;&lt;span class="n"&gt;api_behaviour&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cloudfront&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BehaviorOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;origin&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;alb_origin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;viewer_protocol_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cloudfront&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ViewerProtocolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;REDIRECT_TO_HTTPS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cache_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cloudfront&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CachePolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CACHING_DISABLED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;origin_request_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cloudfront&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OriginRequestPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ALL_VIEWER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# SSE streams can run for 60-120s — increase read timeout
&lt;/span&gt;    &lt;span class="c1"&gt;# Default is 30s; streaming connections silently drop without this
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;CloudFront read timeout&lt;/code&gt; (default 30s) must be extended for SSE. Unlike REST endpoints, SSE connections stay open while the agent reasons and writes — they can easily exceed 30 seconds on complex questions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Testing Strategy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Backend&lt;/strong&gt; — pytest with mocked &lt;code&gt;astream_events&lt;/code&gt; and DynamoDB. No real API calls needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;backend
pytest                          &lt;span class="c"&gt;# all tests&lt;/span&gt;
pytest tests/agent/             &lt;span class="c"&gt;# ReAct agent + tools&lt;/span&gt;
pytest tests/routers/           &lt;span class="c"&gt;# API endpoints + SSE format&lt;/span&gt;
pytest &lt;span class="nt"&gt;--cov&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;app                &lt;span class="c"&gt;# with coverage report&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test file&lt;/th&gt;
&lt;th&gt;What it covers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_state.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;add_messages&lt;/code&gt; appends correctly, preserves history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_tools.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wikipedia found/not found/truncation, arXiv search, &lt;code&gt;get_tools&lt;/code&gt; returns 3 tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_chat.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SSE event format, trace events, token events, done/error, input validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_conversations.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List/get/delete endpoints, 404 handling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Frontend&lt;/strong&gt; — Vitest + Testing Library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;frontend
npm &lt;span class="nb"&gt;test
&lt;/span&gt;npm run &lt;span class="nb"&gt;test&lt;/span&gt;:coverage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. ReAct agents decide — you don't have to hardcode logic&lt;/strong&gt;&lt;br&gt;
A fixed pipeline would need explicit code for "if scientific question → use arXiv". The ReAct loop lets Claude make that call based on context. The agent often combines all three tools on a single question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. CloudFront read timeout must be increased for SSE&lt;/strong&gt;&lt;br&gt;
The default CloudFront read timeout is 30 seconds. SSE connections for complex research questions run longer. Without setting this to 120s, streams silently drop mid-response — no error, just a frozen browser. This was the hardest bug to diagnose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. ALB idle timeout must also be extended&lt;/strong&gt;&lt;br&gt;
Both CloudFront (read timeout) and ALB (idle timeout) need to be set to 120s. Fixing only one still drops long connections. The ALB's default 60s kicks in before CloudFront's if not changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. NAT Gateway is required for private ECS tasks&lt;/strong&gt;&lt;br&gt;
ECS tasks in a private subnet need a NAT Gateway to reach Anthropic, Tavily, Wikipedia, and arXiv. Without it, all external API calls silently time out — no DNS error, just a hanging request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. &lt;code&gt;add_messages&lt;/code&gt; is all you need for conversation memory&lt;/strong&gt;&lt;br&gt;
LangGraph's &lt;code&gt;add_messages&lt;/code&gt; reducer appends new messages to the state list automatically. Passing the full DynamoDB history into the initial state gives the agent complete context of past turns with zero extra logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Build Docker images for &lt;code&gt;linux/amd64&lt;/code&gt; on Apple Silicon&lt;/strong&gt;&lt;br&gt;
ECS Fargate runs on x86 by default. Building on an M-series Mac without specifying the platform creates an &lt;code&gt;arm64&lt;/code&gt; image that won't start on Fargate. Always add &lt;code&gt;--platform linux/amd64&lt;/code&gt; to the Docker build, or set it in the Dockerfile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Two SSM parameters — both injected at container startup&lt;/strong&gt;&lt;br&gt;
This project needs two API keys: Anthropic and Tavily. Both are stored as SSM SecureString and injected as environment variables at container startup. Keys never appear in CDK output, CloudFormation templates, or Docker images.&lt;/p&gt;




&lt;h2&gt;
  
  
  GitHub
&lt;/h2&gt;

&lt;p&gt;The full source code is available on GitHub:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/sanjaypatoliya/ai-research-agent" rel="noopener noreferrer"&gt;github.com/sanjaypatoliya/ai-research-agent&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;I'm Sanjay Patoliya — AWS Certified engineer with 7 AWS certifications building production-ready AI systems on AWS.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://linkedin.com/in/sanjaykumar-patoliya-b234a287/" rel="noopener noreferrer"&gt;linkedin.com/in/sanjaykumar-patoliya-b234a287&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/sanjaypatoliya" rel="noopener noreferrer"&gt;github.com/sanjaypatoliya&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Email: &lt;a href="mailto:sbpatoliya@gmail.com"&gt;sbpatoliya@gmail.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://sanjaypatoliya.com/blog/ai-research-agent-aws" rel="noopener noreferrer"&gt;sanjaypatoliya.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>python</category>
      <category>fastapi</category>
    </item>
    <item>
      <title>Building an AI Resume Scorer with AWS Textract and Claude</title>
      <dc:creator>Sanjay Patoliya</dc:creator>
      <pubDate>Thu, 11 Jun 2026 14:43:44 +0000</pubDate>
      <link>https://dev.to/sanjaypatoliya/building-an-ai-resume-scorer-with-aws-textract-and-claude-efl</link>
      <guid>https://dev.to/sanjaypatoliya/building-an-ai-resume-scorer-with-aws-textract-and-claude-efl</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;A full-stack AI application that scores how well a resume matches a job description — giving candidates a score from 0–100 across 5 categories with specific improvement suggestions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload a resume PDF&lt;/li&gt;
&lt;li&gt;Paste a job description&lt;/li&gt;
&lt;li&gt;Get an AI-powered match score with detailed feedback&lt;/li&gt;
&lt;li&gt;View history of past analyses&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The entire backend pipeline runs on AWS: Textract extracts the resume text, Claude AI does the intelligent scoring, and results are saved to DynamoDB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fks95acpxt1vs2sgsswhk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fks95acpxt1vs2sgsswhk.png" alt="AI Resume Analyzer — upload resume and paste job description" width="800" height="900"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser
  │
  ▼
CloudFront (HTTPS)
  ├── /* ──────────────► S3 (React static files)
  └── /api/v1/* ───────► ALB
                           │
                           ▼
                      ECS Fargate (FastAPI)
                           │
                      S3 — fetch uploaded PDF
                           │
                      Textract — extract text (OCR)
                           │
                      Claude AI — score + suggestions
                           │
                      DynamoDB — save result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key architectural decision was routing &lt;code&gt;/api/v1/*&lt;/code&gt; through CloudFront to the ALB. This means the frontend uses a single HTTPS endpoint for both static files and API calls — no mixed content issues, no CORS problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React 18 + TypeScript + Vite + TailwindCSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI (Python 3.12)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;Anthropic Claude (claude-sonnet-4-6)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OCR&lt;/td&gt;
&lt;td&gt;AWS Textract&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Amazon S3 + DynamoDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting&lt;/td&gt;
&lt;td&gt;ECS Fargate + ALB + CloudFront&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IaC&lt;/td&gt;
&lt;td&gt;AWS CDK (Python)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  AWS Services Used
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ECS Fargate&lt;/td&gt;
&lt;td&gt;Serverless container hosting for FastAPI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Application Load Balancer&lt;/td&gt;
&lt;td&gt;Routes traffic to ECS tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudFront&lt;/td&gt;
&lt;td&gt;CDN + HTTPS termination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;Resume PDF storage + React static hosting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB&lt;/td&gt;
&lt;td&gt;Analysis results and history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Textract&lt;/td&gt;
&lt;td&gt;Extract text from resume PDFs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSM Parameter Store&lt;/td&gt;
&lt;td&gt;Secure Anthropic API key storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ECR&lt;/td&gt;
&lt;td&gt;Docker image registry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VPC + NAT Gateway&lt;/td&gt;
&lt;td&gt;Private network with internet access for Claude API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS CDK (Python)&lt;/td&gt;
&lt;td&gt;Infrastructure as Code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Analysis Pipeline
&lt;/h2&gt;

&lt;p&gt;When a user submits a resume and job description, this is what happens:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1 — PDF Upload via Presigned URL
&lt;/h3&gt;

&lt;p&gt;The frontend requests a presigned S3 URL from the backend, then uploads the PDF directly to S3 from the browser. This avoids routing the binary file through the FastAPI server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# POST /api/v1/upload
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_presigned_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;s3_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uploads/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;upload_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_presigned_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;put_object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;S3_BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s3_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ContentType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;ExpiresIn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;upload_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;upload_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s3_key&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2 — Text Extraction with AWS Textract
&lt;/h3&gt;

&lt;p&gt;Once the PDF is in S3, Textract extracts all the text. Textract handles complex PDF layouts — multi-column resumes, tables, headers — much better than simple PDF parsers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# textract_service.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s3_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;textract_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detect_document_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3Object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;S3_BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s3_key&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;blocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Blocks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BlockType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LINE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; &lt;code&gt;detect_document_text&lt;/code&gt; is synchronous and works well for standard resumes. If you need to support 10+ page documents, switch to &lt;code&gt;start_document_analysis&lt;/code&gt; (asynchronous) — it handles larger files without timing out.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 3 — AI Scoring with Claude
&lt;/h3&gt;

&lt;p&gt;The extracted resume text and job description are sent to Claude with a structured prompt that asks for scores across 5 categories and improvement suggestions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 5 scoring categories:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Skills Match&lt;/strong&gt; — how well technical skills align&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experience Level&lt;/strong&gt; — years and seniority match&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Education&lt;/strong&gt; — degree and field relevance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keywords&lt;/strong&gt; — ATS-critical keyword presence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ATS Formatting&lt;/strong&gt; — resume structure and formatting&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Claude returns a structured JSON response. To ensure the LLM's non-deterministic output doesn't break the frontend, I used &lt;strong&gt;Pydantic&lt;/strong&gt; in the FastAPI layer to validate the JSON structure before saving to DynamoDB — if Claude returns an unexpected format, the request fails fast with a clear error rather than silently corrupting data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgltiwh2uac6du45nm1wj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgltiwh2uac6du45nm1wj.png" alt="AI Resume Analyzer — score breakdown across 5 categories with improvement suggestions" width="800" height="998"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"overall_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"categories"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Skills Match"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Strong Python and AWS skills"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Experience Level"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"5 years matches requirement"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Education"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Relevant CS degree"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Keywords"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Most keywords present"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ATS Formatting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;65&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Some formatting improvements needed"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"suggestions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Add quantifiable achievements to your experience section"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Include AWS certification names explicitly"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4 — Save to DynamoDB
&lt;/h3&gt;

&lt;p&gt;Results are saved with a UUID as the partition key. DynamoDB PAY_PER_REQUEST billing means no capacity planning and costs nothing when idle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ttblsz7smnp5x0octst.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ttblsz7smnp5x0octst.png" alt="AI Resume Analyzer — analysis history with match scores" width="800" height="388"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# dynamodb_service.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AnalysisResult&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;overall_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;overall_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;categories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggestions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggestions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  API Endpoints
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/health&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Health check&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/upload&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get presigned S3 URL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/analyze&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Analyze resume vs job description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/history&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;List past analyses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/history/{id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get single result&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DELETE&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/api/v1/history/{id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete a result&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Infrastructure with AWS CDK
&lt;/h2&gt;

&lt;p&gt;All AWS resources are defined as code using CDK Python, split into 4 stacks:&lt;/p&gt;

&lt;h3&gt;
  
  
  StorageStack
&lt;/h3&gt;

&lt;p&gt;S3 bucket with server-side encryption and 30-day lifecycle expiry for uploaded PDFs.&lt;/p&gt;

&lt;h3&gt;
  
  
  DatabaseStack
&lt;/h3&gt;

&lt;p&gt;DynamoDB table with PAY_PER_REQUEST billing — no capacity planning needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  BackendStack
&lt;/h3&gt;

&lt;p&gt;The most complex stack. VPC with 2 AZs, 1 NAT Gateway (ECS tasks need outbound internet to reach the Anthropic API), ECS Fargate service, and Application Load Balancer.&lt;/p&gt;

&lt;p&gt;Key decision: &lt;strong&gt;ECS Fargate instead of Lambda&lt;/strong&gt; for the backend. The Claude AI response can take 10–30 seconds for detailed analysis. Lambda has a 15-minute limit but cold starts and the complexity of streaming responses made Fargate a cleaner choice for this workload. Fargate's stability during long-running LLM inferences over Lambda's potential timeout and cold-start issues was a deliberate choice for reliability.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backend_stack.py (simplified)
&lt;/span&gt;&lt;span class="n"&gt;fargate_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ecs_patterns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ApplicationLoadBalancedFargateService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BackendService&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_limit_mib&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;task_image_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ecs_patterns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ApplicationLoadBalancedTaskImageOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ContainerImage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_ecr_repository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ecr_repo&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;APP_ENV&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;secrets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_ssm_parameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key_param&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ALB timeout extended for Claude response time
&lt;/span&gt;&lt;span class="n"&gt;fargate_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load_balancer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idle_timeout.timeout_seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;120&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  FrontendStack
&lt;/h3&gt;

&lt;p&gt;CloudFront with OAC (Origin Access Control) for S3, SPA routing (404 → &lt;code&gt;index.html&lt;/code&gt;), and a &lt;code&gt;/api/v1/*&lt;/code&gt; behaviour that proxies to the ALB. Cache invalidation runs automatically on each deploy. This pattern — CloudFront + ALB + S3 — is my go-to for production SPAs. It eliminates the CORS dance entirely and centralises SSL termination at the edge.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deploy Pipeline
&lt;/h2&gt;

&lt;p&gt;A single &lt;code&gt;make deploy&lt;/code&gt; command runs everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make deploy
&lt;span class="c"&gt;# 1. Runs backend tests (pytest)&lt;/span&gt;
&lt;span class="c"&gt;# 2. Runs frontend tests (vitest)&lt;/span&gt;
&lt;span class="c"&gt;# 3. Builds React app (npm run build)&lt;/span&gt;
&lt;span class="c"&gt;# 4. CDK deploys all 4 stacks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deployment stops if any test fails — no broken code reaches AWS.&lt;/p&gt;




&lt;h2&gt;
  
  
  Testing Strategy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Backend&lt;/strong&gt; — pytest with &lt;code&gt;moto&lt;/code&gt; to mock all AWS services. No real AWS credentials needed for tests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;backend
pytest                    &lt;span class="c"&gt;# all tests with coverage&lt;/span&gt;
pytest tests/services/    &lt;span class="c"&gt;# service layer only&lt;/span&gt;
pytest tests/routers/     &lt;span class="c"&gt;# API endpoints only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Frontend&lt;/strong&gt; — Vitest + Testing Library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;frontend
npm &lt;span class="nb"&gt;test&lt;/span&gt;                  &lt;span class="c"&gt;# all tests&lt;/span&gt;
npm run &lt;span class="nb"&gt;test&lt;/span&gt;:coverage     &lt;span class="c"&gt;# with coverage report&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Use presigned URLs for file uploads&lt;/strong&gt;&lt;br&gt;
Routing PDFs through FastAPI adds unnecessary load and latency. Direct S3 upload from the browser via presigned URL is faster and cheaper.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. ECS Fargate over Lambda for long AI calls&lt;/strong&gt;&lt;br&gt;
Lambda works for fast operations but AI scoring takes 10–30 seconds. Fargate containers are always warm and handle this naturally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. NAT Gateway is required for private ECS tasks&lt;/strong&gt;&lt;br&gt;
ECS tasks in a private subnet need a NAT Gateway to reach the Anthropic API. I initially tried without it and the AI calls silently timed out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Extend ALB idle timeout for AI workloads&lt;/strong&gt;&lt;br&gt;
The default ALB idle timeout is 60 seconds. Claude can take longer for detailed analysis. Setting it to 120s prevents connection resets mid-response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. CloudFront routing solves CORS completely&lt;/strong&gt;&lt;br&gt;
Proxying &lt;code&gt;/api/v1/*&lt;/code&gt; through CloudFront to the ALB means the frontend and API share the same domain. Zero CORS configuration needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. SSM Parameter Store for secrets&lt;/strong&gt;&lt;br&gt;
Never put API keys in environment variables or Docker images. SSM SecureString is injected at container startup — keys never appear in logs or CDK output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Auth is intentionally deferred to v2&lt;/strong&gt;&lt;br&gt;
The current version has no user authentication. Auth with AWS Cognito is planned for v2: user login, private history per user, and Cognito-authorizer on the API Gateway. Deferring auth kept v1 scope small and shippable.&lt;/p&gt;




&lt;h2&gt;
  
  
  GitHub
&lt;/h2&gt;

&lt;p&gt;The full source code is available on GitHub:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/sanjaypatoliya/ai-resume-analyzer" rel="noopener noreferrer"&gt;github.com/sanjaypatoliya/ai-resume-analyzer&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;I'm Sanjay Patoliya — AWS Certified engineer with 7 AWS certifications building production-ready AI systems on AWS.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://linkedin.com/in/sanjaykumar-patoliya-b234a287/" rel="noopener noreferrer"&gt;linkedin.com/in/sanjaykumar-patoliya-b234a287&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/sanjaypatoliya" rel="noopener noreferrer"&gt;github.com/sanjaypatoliya&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Email: &lt;a href="mailto:sbpatoliya@gmail.com"&gt;sbpatoliya@gmail.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://sanjaypatoliya.com/blog/ai-resume-parser-aws" rel="noopener noreferrer"&gt;sanjaypatoliya.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>fastapi</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
