
John Mahoney

Originally published at medicalai.law

How we built real-time deposition analysis with Claude's streaming API

Medical-malpractice plaintiff attorneys spend 3+ hours in expert depositions hunting for two things: admissions they can use at trial, and inconsistencies they can impeach. Both windows close in seconds. If you don't catch them live, you're reading the transcript a week later wishing you had.

We built a live-feed analyzer that watches the deposition stream, runs Claude against every 30-second window, and surfaces real-time signals to the attorney's laptop while they question the witness.

Architecture

Three hops:

  1. Deepgram transcribes the live audio over WebSocket
  2. Our Node WS server buffers transcript into 30-second segments
  3. Claude (Haiku 4.5, streaming) analyzes each segment and returns a 12-key JSON object

The JSON is the heart of the system. Every segment returns:

```js
{
  "medical":         { accuracyScore, inaccuracies, accurateStatements, confidence, summary },
  "daubert":         { vulnerabilityScore, vulnerabilities, strengths, overallRisk },
  "priorTestimony":  { inconsistencies, impeachmentOpportunities, summary },
  "crossExam":       { questions, keyWeaknesses, recommendedApproach },
  "elements":        { duty, breach, causation, damages },   // each { advanced, quote }
  "admission":       { isAdmission, quote, significance, whyMatters },
  "evasion":         { isEvasive, pattern, escalationScript },
  "coverage":        { topicsCovered, notes },
  "foundation":      { triggers },                           // FRE 613/803(18)/803(6)/702/30(b)(6)
  "chartContradiction": { contradicted, witnessClaim, chartEvidence, severity },
  "literatureHits":  [{ witnessClaim, pubmedQuery, foundationScript, pubmedUrl }]
}
```

The per-segment loop

```js
// Per segment: single Claude call with the whole expert-witness-specific prompt
const msg = await anthropic.messages.create({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 8192,                // critical — 4096 truncates crossExam mid-stream
  messages: [{ role: 'user', content: buildPrompt(segment, caseContext, chart) }],
});
const analysis = sanitizeResult(JSON.parse(extractJson(msg.content[0].text)));
ws.send(JSON.stringify({ type: 'analysis', analysis, segment }));
```
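`extractJson` and `sanitizeResult` do the unglamorous work here. Our real versions aren't shown, but a plausible sketch: `extractJson` pulls the outermost `{...}` out of the model's reply (which can arrive wrapped in prose or a markdown fence), and `sanitizeResult` guarantees every key the UI reads exists, so a truncated analysis degrades to empty panels instead of a client crash. The default values below are assumptions.

```js
// Sketch of the two helpers in the per-segment loop (illustrative, not our exact code).

// Pull the outermost {...} out of the model's text, tolerating prose or
// ```json fences around it.
function extractJson(text) {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) throw new Error('no JSON object in model output');
  return text.slice(start, end + 1);
}

// Guarantee every key the UI reads exists, with safe empty defaults.
function sanitizeResult(raw) {
  return {
    medical:            raw.medical            ?? { accuracyScore: null, inaccuracies: [], accurateStatements: [], confidence: null, summary: '' },
    daubert:            raw.daubert            ?? { vulnerabilityScore: null, vulnerabilities: [], strengths: [], overallRisk: null },
    priorTestimony:     raw.priorTestimony     ?? { inconsistencies: [], impeachmentOpportunities: [], summary: '' },
    crossExam:          raw.crossExam          ?? { questions: [], keyWeaknesses: [], recommendedApproach: '' },
    elements:           raw.elements           ?? {},
    admission:          raw.admission          ?? { isAdmission: false },
    evasion:            raw.evasion            ?? { isEvasive: false },
    coverage:           raw.coverage           ?? { topicsCovered: [], notes: '' },
    foundation:         raw.foundation         ?? { triggers: [] },
    chartContradiction: raw.chartContradiction ?? { contradicted: false },
    literatureHits:     Array.isArray(raw.literatureHits) ? raw.literatureHits : [],
  };
}
```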

Three things we got wrong the first time:

  1. max_tokens=4096 was too small. The 12-key output needs ~6-8K on dense segments. If crossExam is written near the end of the stream, it gets truncated and the UI shows "Cross-examination analysis failed." Bumped to 8192.
  2. Chart context wasn't propagating. chartContradiction can't fire without the chart data in the prompt. We now stash ws._sessionChartContext on a setChartContext WS message before analysis begins.
  3. Cloudflare killed idle WebSockets after 100s. Claude's longer analyses took 45-90s, and during dense segments the WS went silent. Added a 25s keepalive ping from the client.
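The fix for item 3 is a few lines on the client. A sketch, assuming a browser `WebSocket` and a `{ type: 'ping' }` message shape the server simply ignores (both are illustrative):

```js
// Client-side keepalive: ping every 25s so Cloudflare's ~100s idle timeout
// never fires while Claude is mid-analysis. Message shape is illustrative.
const PING_MS = 25_000;

function startKeepalive(ws) {
  const timer = setInterval(() => {
    if (ws.readyState === 1 /* OPEN */) {
      ws.send(JSON.stringify({ type: 'ping' }));
    }
  }, PING_MS);
  ws.addEventListener('close', () => clearInterval(timer));  // don't leak the timer
  return timer;
}
```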

What we skipped (for now)

  • PDF.js text-layer positioning for the chart contradiction pin (today it's at the file-list row, not the page)
  • Firm-scoped vector index over historical transcripts (cross-case expert inconsistency)
  • Live PubMed API calls — today we generate the search query, the attorney clicks through
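Skipping the live API calls still leaves the attorney one click from the literature: the `pubmedQuery` Claude returns just needs to become a URL. A sketch using PubMed's standard `?term=` search endpoint (the helper name is illustrative):

```js
// Turn the pubmedQuery string from a literatureHits entry into a clickable
// PubMed search URL. Helper name is illustrative.
function pubmedSearchUrl(pubmedQuery) {
  return `https://pubmed.ncbi.nlm.nih.gov/?term=${encodeURIComponent(pubmedQuery)}`;
}
```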

Full writeup (including the chart-cross-reference and co-counsel channel) is on our blog at medicalai.law/blog/how-ai-analyzes-deposition-real-time.

Questions welcome.
