<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: John Mahoney</title>
    <description>The latest articles on DEV Community by John Mahoney (@john_mahoney_41e9c2589ceb).</description>
    <link>https://dev.to/john_mahoney_41e9c2589ceb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3864776%2F26a7deb8-e1d4-4d5b-bb69-5bce9d9993bb.png</url>
      <title>DEV Community: John Mahoney</title>
      <link>https://dev.to/john_mahoney_41e9c2589ceb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/john_mahoney_41e9c2589ceb"/>
    <language>en</language>
    <item>
      <title>Processing 1,500 Pages of Medical Records in 3 Minutes with AI</title>
      <dc:creator>John Mahoney</dc:creator>
      <pubDate>Wed, 15 Apr 2026 13:24:29 +0000</pubDate>
      <link>https://dev.to/john_mahoney_41e9c2589ceb/how-we-built-an-ai-powered-medical-records-extraction-pipeline-2k0p</link>
      <guid>https://dev.to/john_mahoney_41e9c2589ceb/how-we-built-an-ai-powered-medical-records-extraction-pipeline-2k0p</guid>
      <description>&lt;p&gt;Medical malpractice attorneys deal with thousands of pages of medical records per case. Organizing those records into a chronological timeline is the foundation of every case — and it's historically been done by hand, taking 20-40 hours per case.&lt;/p&gt;

&lt;p&gt;We built a pipeline that extracts structured data from uploaded medical record PDFs, streams AI-generated analysis back to the browser in real time, and handles files up to 500MB. Here's how it works.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;John Mahoney, Founder @ &lt;a href="https://medicalai.law" rel="noopener noreferrer"&gt;MedLegal AI&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;The Architecture&lt;/h2&gt;

&lt;p&gt;The system has four stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Upload&lt;/strong&gt; — Browser uploads PDFs directly to S3 via presigned URLs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract&lt;/strong&gt; — Server pulls the file from S3, runs OCR if needed, extracts raw text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyze&lt;/strong&gt; — Text is sent to Claude API for structured extraction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream&lt;/strong&gt; — Results stream back to the browser via SSE as they're generated&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Stage 1: Presigned S3 Uploads&lt;/h2&gt;

&lt;p&gt;Medical record PDFs are large. 200-500MB is common. We're deployed behind Cloudflare and Railway, both with upload size limits.&lt;/p&gt;

&lt;p&gt;The solution: the browser uploads directly to S3 via presigned PUT URLs.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner');

async function generatePresignedUpload(userId, fileName) {
  const fileId = crypto.randomUUID() + '.pdf'; // crypto is global in Node 20+
  const s3Key = `case-analysis/uploads/${userId}/${fileId}`;

  const presignClient = new S3Client({
    region: process.env.AWS_REGION,
    requestChecksumCalculation: 'WHEN_REQUIRED',
    responseChecksumValidation: 'WHEN_REQUIRED',
  });

  const putCmd = new PutObjectCommand({
    Bucket: process.env.S3_BUCKET,
    Key: s3Key,
    ServerSideEncryption: 'AES256',
  });

  return await getSignedUrl(presignClient, putCmd, { expiresIn: 600 });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Key gotcha:&lt;/strong&gt; AWS SDK v3 appends checksum query parameters to presigned URLs, and those break browser PUT requests. Set &lt;code&gt;requestChecksumCalculation: 'WHEN_REQUIRED'&lt;/code&gt; on the client to stop it.&lt;/p&gt;
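&lt;p&gt;On the client, the upload itself is then a bare PUT to the presigned URL. A minimal sketch (the helper names and the client-side size check are illustrative, not our exact production code); note that any header signed into the URL would have to be echoed byte-for-byte, which is why we keep the PUT header-free and let the presigner hoist x-amz-* values into the query string:&lt;/p&gt;

```javascript
const MAX_BYTES = 500 * 1024 * 1024; // the 500MB cap mentioned above

// Reject obviously bad files before burning a presigned URL on them.
function validateUpload(file) {
  if (file.type !== 'application/pdf') return 'Only PDF files are supported';
  if (file.size > MAX_BYTES) return 'File exceeds the 500MB limit';
  return null; // null means OK
}

async function uploadToS3(presignedUrl, file) {
  // Bare PUT, no custom headers: everything signed (including the
  // ServerSideEncryption value) travels in the URL's query string.
  const resp = await fetch(presignedUrl, { method: 'PUT', body: file });
  if (!resp.ok) throw new Error(`Upload failed with status ${resp.status}`);
}
```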

&lt;h2&gt;Stage 2: Text Extraction with OCR Fallback&lt;/h2&gt;

&lt;p&gt;We try pdf-parse first (fast, and sufficient for born-digital PDFs), then fall back to Poppler + Tesseract OCR for scanned documents.&lt;/p&gt;

&lt;h2&gt;Stage 3: AI Analysis with Claude&lt;/h2&gt;

&lt;p&gt;We use Claude's streaming Messages API. Rate limiting is handled with exponential backoff and user-visible status messages.&lt;/p&gt;
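&lt;p&gt;The retry logic is roughly this shape (constants and names are illustrative): exponential backoff with full jitter, plus a callback so the UI can show "rate limited, retrying in Ns" instead of stalling silently.&lt;/p&gt;

```javascript
// Exponential backoff with full jitter for 429 (rate limit) and
// 529 (overloaded) responses from the Claude API.
const BASE_MS = 1000;
const CAP_MS = 60000;

function backoffDelayMs(attempt, random = Math.random) {
  const ceiling = Math.min(CAP_MS, BASE_MS * 2 ** attempt);
  return Math.floor(random() * ceiling); // full jitter: 0..ceiling
}

async function withRetries(fn, { maxAttempts = 5, onWait = () => {} } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err && (err.status === 429 || err.status === 529);
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      const ms = backoffDelayMs(attempt);
      onWait(ms, attempt); // surface the wait to the user here
      await new Promise((resolve) => setTimeout(resolve, ms));
    }
  }
}
```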

&lt;h2&gt;Stage 4: SSE Streaming&lt;/h2&gt;

&lt;p&gt;Server-Sent Events give us real-time streaming from server to browser. On the client we use fetch + ReadableStream instead of EventSource, because EventSource only supports GET and we need to POST the analysis request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Critical for Railway:&lt;/strong&gt; Send headers immediately and keepalive comments every 30s to prevent proxy timeouts.&lt;/p&gt;

&lt;h2&gt;Results&lt;/h2&gt;

&lt;p&gt;1,500 pages processed in 3-5 minutes vs. 20-40 hours manually. SSE streaming means users see the timeline being built in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Node.js 20+, Claude API, AWS S3, Poppler + Tesseract, React + Vite, Railway&lt;/p&gt;




&lt;p&gt;&lt;em&gt;John Mahoney builds AI tools for medical malpractice litigation at &lt;a href="https://medicalai.law" rel="noopener noreferrer"&gt;medicalai.law&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>legaltech</category>
      <category>healthtech</category>
      <category>aws</category>
    </item>
    <item>
      <title>Building an AI Pipeline to Process 10,000+ Pages of Medical Records</title>
      <dc:creator>John Mahoney</dc:creator>
      <pubDate>Tue, 07 Apr 2026 02:36:50 +0000</pubDate>
      <link>https://dev.to/john_mahoney_41e9c2589ceb/building-an-ai-pipeline-to-proai-machinelearning-webdev-saascess-10000-pages-of-medical-records-3218</link>
      <guid>https://dev.to/john_mahoney_41e9c2589ceb/building-an-ai-pipeline-to-proai-machinelearning-webdev-saascess-10000-pages-of-medical-records-3218</guid>
      <description></description>
    </item>
  </channel>
</rss>
