Inna Campo


When Doctors Are Too Tired to Think Slow: Building CLARA with Gemini 3.1 Pro

Built with Google Gemini: Writing Challenge

This is a submission for the Built with Google Gemini: Writing Challenge

What I Built with Google Gemini

Diagnostic errors affect an estimated 12 million Americans annually. Behind that number are real encounters - patients who leave uncertain, physicians who move to the next room carrying doubt.

In women's health, this crisis is particularly acute: women are diagnosed on average 4 years later than men. A major factor driving this gap is Diagnostic Shadowing, a cognitive bias where a patient's physical symptoms are inadvertently misattributed to their psychiatric or chronic history.

It’s important to understand that this is rarely negligence. It is a breakdown of Dual Process Theory under pressure. High cognitive load forces burnt-out physicians to rely on System 1 (fast, heuristic thinking) just to keep up, bypassing System 2 (slow, analytical thinking). Necessary mental shortcuts can occasionally become unintended clinical blind spots.

To help solve this, I built CLARA (Clinical Logic Assessment & Reasoning Assistant).

CLARA is a multimodal AI agent powered by Gemini 3.1 Pro (gemini-3.1-pro-preview). She acts as a Cognitive Forcing Strategy - an "External System 2" that supports physicians when cognitive bandwidth is limited.

Unlike standard AI medical scribes that merely record what is said, CLARA analyzes the logic of the encounter. Because Gemini 3.1 Pro is natively multimodal, CLARA can process consented consultation audio directly.

Designed as a post-encounter safety net, CLARA reviews recorded consultations prior to discharge, identifying high-risk reasoning patterns and generating standardized “Clinical Insights” in JSON format. In doing so, she transforms the subjective experience of “being dismissed” into objective, actionable data, protecting patients while supporting physicians under human constraints.

Demo

Check out the CLARA Github repository and live demo.
Test files can be downloaded here or try the Sample Cases.

(Screenshot) CLARA UI: an audio consultation is analyzed and the resulting JSON insights are displayed.

What I Learned

Building CLARA taught me two massive lessons - one technical, and one deeply human.

1. The Human Element: Empathy in AI Design

Initially, I named the project the Clinical Logic Assessment & Reasoning Auditor. I programmed the AI to hunt for "dangerous biases" and output "Audit Flags." I quickly realized that this punitive framing would alienate the very people I was trying to help. Physicians are already under immense strain. I learned to completely rewrite my system prompts to shift the tone from evaluative to collaborative. CLARA is now an Assistant that looks for unintended clinical blind spots and offers gentle clinical nudges. Designing AI begins with understanding the psychology of the end user.

2. The Technical Element: Securing API Keys

While prototyping in Google AI Studio, I initially used the generated client-side JavaScript implementation. That approach quickly revealed a production concern: exposing the Gemini API key in the browser.

To harden the architecture, I introduced a dedicated Node.js/Express backend layer. The frontend now sends analysis requests to my server, which securely manages the API key via environment variables and handles all communication with Gemini. This shift transformed CLARA from a prototype into a securely deployable AI service.

Google Gemini Feedback

What Worked Well:

The native multimodality of Gemini 3.1 Pro (and the earlier Gemini 3 Pro) was transformative for this build. Not having to chain a separate Speech-to-Text model with an LLM significantly simplified the architecture. Furthermore, Gemini's ability to digest complex psychological concepts (like Anchoring Bias and Premature Closure) and apply them to raw conversational dialogue, while strictly adhering to my requested JSON output schema, was flawless.

Where I Ran Into Friction:

The primary friction point emerged around production security. Google AI Studio excels at rapid prototyping, and the generated client-side JavaScript is perfect for experimentation. However, moving toward deployment requires architectural adjustments, since client-side implementations expose API keys in the browser.

It would be powerful to see AI Studio offer a “full-stack” export option (for example, a minimal Node.js backend paired with a frontend scaffold) or include a more explicit production-readiness guide. That small addition could significantly reduce the learning curve for developers transitioning from prototype to secure deployment. This experience reinforced how important it is to think about security architecture as early as the prototyping phase when building AI systems.

Overall, building with Gemini 3.1 Pro allowed me to address a systemic healthcare issue with tools that did not exist just a few years ago. Next, I’m building SONA for the Gemini Live Agent Challenge, a voice-based simulator designed to strengthen women’s health communication. By allowing users to rehearse how they articulate menopause symptoms, SONA uses AI to help patients communicate with clarity and confidence before they ever step into the clinic.

P.S. Deep Dive: System Architecture & Technical Implementation

To ensure clinical data was handled securely and efficiently, I built CLARA using a two-tier client-server architecture, containerized via a multi-stage Docker build for deployment on Google Cloud Run.

Here is a breakdown of the technical stack and how the AI integration works under the hood.

1. The Stack

  • Frontend: React 19, TypeScript, Vite, Tailwind CSS, Recharts
  • Backend: Node.js 20, Express
  • AI SDK: @google/genai (v1.31.0)
  • Model: gemini-3.1-pro-preview

2. Client-Server Data Flow & Audio Processing

One of the core constraints of building with LLMs in the browser is API key security. To solve this, the React SPA never interacts with the Gemini API directly. Instead, I built a thin Express proxy server.

When a doctor uploads a consultation recording, the client-side process looks like this:

  1. Validation: The React app enforces a 10MB file limit.
  2. Encoding: The audio file is read via FileReader and converted into a base64 string directly in the browser using a custom utility function.
  3. Transmission: The client POSTs the { type: 'audio', data, mimeType } payload to the Express server (/api/analyze).
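The three steps above can be sketched as follows. The 10MB limit, the FileReader-based base64 encoding, the payload shape, and the /api/analyze route come from the description above; the helper names are illustrative:

```typescript
const MAX_BYTES = 10 * 1024 * 1024; // step 1: 10MB upload limit

interface AnalyzePayload {
  type: 'audio';
  data: string;     // base64-encoded audio
  mimeType: string; // e.g. 'audio/mpeg'
}

// Data URLs look like "data:audio/mpeg;base64,AAAA…" — keep only the payload.
function stripDataUrlPrefix(dataUrl: string): string {
  return dataUrl.slice(dataUrl.indexOf(',') + 1);
}

// Step 2: read a File/Blob into a base64 string in the browser.
function fileToBase64(file: Blob): Promise<string> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onerror = () => reject(reader.error);
    reader.onload = () => resolve(stripDataUrlPrefix(reader.result as string));
    reader.readAsDataURL(file);
  });
}

// Step 3: POST the payload to the Express proxy.
async function submitRecording(file: File): Promise<Response> {
  if (file.size > MAX_BYTES) throw new Error('File exceeds 10MB limit');
  const payload: AnalyzePayload = {
    type: 'audio',
    data: await fileToBase64(file),
    mimeType: file.type,
  };
  return fetch('/api/analyze', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
}
```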

To handle these large base64 strings, I had to configure the Express backend body parser with a 15MB limit (express.json({ limit: '15mb' })). The server then wraps this base64 data into an inlineData object and ships it to Gemini.

Note: Because Gemini 3.1 Pro is natively multimodal, we completely bypass the need for a separate Speech-to-Text (STT) transcribing step. The model ingests the audio directly.
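On the server side, the repackaging step can be sketched as a pure helper. The payload shape, model name, and inlineData structure come from the article; buildGeminiRequest is an illustrative name, and the real server passes this object on to ai.models.generateContent() from @google/genai:

```typescript
interface ClientPayload {
  type: 'audio' | 'text';
  data: string;      // base64 audio, or raw transcript text
  mimeType?: string; // e.g. 'audio/mpeg' for audio payloads
}

// Wrap the client payload into the request body Gemini expects.
function buildGeminiRequest(payload: ClientPayload) {
  const part =
    payload.type === 'audio'
      ? { inlineData: { data: payload.data, mimeType: payload.mimeType ?? 'audio/mpeg' } }
      : { text: payload.data };

  return {
    model: 'gemini-3.1-pro-preview',
    contents: [{ role: 'user', parts: [part] }],
  };
}
```

The 15MB JSON limit on the server leaves headroom for base64's roughly 33% size inflation over the 10MB raw-file cap enforced on the client.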

Future Optimization: The Gemini File API
While the base64 inlineData approach works perfectly for short consultations, it creates memory overhead and strict payload constraints on the Express server. Moving forward, I plan to migrate this pipeline to use the Gemini File API. Instead of passing raw base64 strings in the JSON body, the backend will upload the raw audio directly to Google's infrastructure (using ai.files.upload), retrieve a fileUri, and pass that lightweight reference to the model. This will eliminate payload bottlenecks, reduce server memory pressure, and allow CLARA to seamlessly process much longer, high-fidelity consultation recordings.

3. Controlling the LLM: Structured Outputs & Temperature

Extracting consistent, usable data from LLMs can be notoriously brittle. To solve this, I leveraged Gemini's Structured Output capabilities.

I configured the ai.models.generateContent() call with responseMimeType: 'application/json' and passed a strict responseSchema using the SDK's Type enum. This forces Gemini to map its analysis into an array of AuditFlag objects, matching this exact TypeScript interface:

```typescript
interface AuditFlag {
  timestamp: string;
  bias_type: "Diagnostic Shadowing" | "Premature Closure" | "Anchoring Bias" | "Safe Practice";
  risk_level: "High" | "Medium" | "Low" | "None";
  dialogue_trigger: string;
  clinical_reasoning: string;
}
```
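The responseSchema mirrors that interface. Here is a sketch of the schema object, using plain strings in place of the SDK's Type enum members (which serialize to these same string values) so the snippet stays dependency-free:

```typescript
// Schema constraining Gemini's output to an array of AuditFlag objects.
const auditFlagSchema = {
  type: 'ARRAY',
  items: {
    type: 'OBJECT',
    properties: {
      timestamp: { type: 'STRING' },
      bias_type: {
        type: 'STRING',
        enum: ['Diagnostic Shadowing', 'Premature Closure', 'Anchoring Bias', 'Safe Practice'],
      },
      risk_level: { type: 'STRING', enum: ['High', 'Medium', 'Low', 'None'] },
      dialogue_trigger: { type: 'STRING' },
      clinical_reasoning: { type: 'STRING' },
    },
    required: ['timestamp', 'bias_type', 'risk_level', 'dialogue_trigger', 'clinical_reasoning'],
  },
};

// Passed to the model call roughly as:
// ai.models.generateContent({ model: 'gemini-3.1-pro-preview', contents, config: {
//   responseMimeType: 'application/json', responseSchema: auditFlagSchema } })
```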

Because clinical analysis demands high determinism and a minimal risk of hallucination, I also set the model's temperature to 0.2.

4. Advanced Prompt Engineering

The CLARA_SYSTEM_INSTRUCTION (an approx. 2,000-token system prompt) is the brain of the application. To prevent false positives, I engineered specific analysis rules:

  • Logic over Tone: The model is explicitly instructed that a provider can be highly empathetic but still exhibit a clinical blind spot, or be abrupt but logically thorough. It must focus only on reasoning pathways.
  • The "Testing Gap": The prompt trains the model to flag diagnoses made purely on assumptions rather than objective data (e.g., diagnosing a panic attack in a patient with tachycardia without ordering an EKG).
  • Constructive Framing: The model is commanded to output its clinical_reasoning as "gentle clinical nudges" rather than accusations, ensuring the resulting UI is supportive rather than punitive.
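To make these rules concrete, here is a hypothetical excerpt of the kind of instructions the CLARA_SYSTEM_INSTRUCTION encodes. The real prompt is roughly 2,000 tokens; this wording is illustrative, not the actual prompt:

```typescript
// Illustrative excerpt of the system instruction's three analysis rules.
const SYSTEM_INSTRUCTION_EXCERPT = `
You are CLARA, a collaborative clinical reasoning assistant.
- Judge reasoning pathways only; ignore tone. An empathetic provider can
  still exhibit a blind spot, and an abrupt one can be logically thorough.
- Flag the "testing gap": diagnoses asserted without objective data
  (e.g. a panic attack diagnosed in a tachycardic patient with no EKG ordered).
- Phrase every clinical_reasoning field as a gentle clinical nudge,
  never as an accusation.
`.trim();
```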

5. Frontend State Management

The UI is driven by a finite state machine managed by a single useState<AnalysisState> hook in the root App.tsx. Depending on the status (idle, uploading, processing, complete, or error), the component conditionally routes the user through the experience. Once the JSON payload returns from the server, Recharts processes the aggregated risk levels to render a donut chart, while a dynamic list of color-coded Audit Cards maps out the clinical insights.
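The state shape driving that flow can be sketched as below. The five statuses come from the description above; the extra fields and the riskDistribution helper (which produces the counts Recharts renders as a donut) are illustrative:

```typescript
type AnalysisStatus = 'idle' | 'uploading' | 'processing' | 'complete' | 'error';

// Trimmed to the field the chart needs; the full shape matches the
// AuditFlag interface shown earlier.
interface AuditFlag {
  risk_level: 'High' | 'Medium' | 'Low' | 'None';
}

interface AnalysisState {
  status: AnalysisStatus;
  flags: AuditFlag[];
  error: string | null;
}

// Aggregate risk levels into counts for the donut chart.
function riskDistribution(flags: AuditFlag[]): Record<string, number> {
  return flags.reduce<Record<string, number>>((acc, f) => {
    acc[f.risk_level] = (acc[f.risk_level] ?? 0) + 1;
    return acc;
  }, {});
}
```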
