Keerthana

From Half‑dead Prototype to Local‑Only AI Medical Assistant: Rewiring MedClinic with GitHub Copilot

This is a submission for the GitHub Finish‑Up‑A‑Thon Challenge

What I Built

I built MedClinic, a fully local AI‑powered medical assistant that runs on a MedGamma‑2B‑class model without any third‑party APIs or cloud services.

Instead of slapping a shiny frontend on an off‑the‑shelf API, I:

  • Wrote the entire orchestration layer by hand (no pre‑built wrappers).
  • Built a pure inference pipeline: plain user text → MedGamma‑2B inference → structured JSON response.
  • Used no external API; everything runs on‑device.

The abandoned prototype (3 months ago)

[Images: the prototype before the rewrite]

Demo

Link: https://github.com/pulipatikeerthana9-wq/medclinic-voice-scribe

Now changed to

[Images: MedClinic after the rewrite]

The Comeback Story

MedClinic started as a half‑dead prototype buried in a forgotten branch. The older version had:

  • Basic voice‑to‑text that I struggled to get working at all, having little prior experience.
  • A single monolithic function.
  • A 90‑second pause before every answer due to unoptimized inference.

I had just one ingredient: a local MedGamma‑2B‑like model sitting idle on my machine. No Play‑Cloud and no “API magic”, just raw model weights and a stubborn idea that a local‑only doctor‑in‑your‑laptop was possible.

What changed everything was GitHub Copilot:

  • Copilot became my architect for the pipeline.
  • My job was to sanity‑check the model design, trim the boilerplate, and own the safety guardrails.

In under a month, the MedClinic branch went from “proof of concept” to a hands‑on assistant that gives coherent, structured medical‑style answers — all without a single API call.

GitHub Copilot’s role (how it changed everything)

Here is where Copilot stepped in:

Pipeline design

I asked:

“How do I structure a voice‑input → MedGamma‑2B inference → structured JSON medical‑assistant pipeline?”

Copilot returned three layers:

  • input‑sanitizer
  • inference‑router
  • JSON‑formatter

I kept all three and wired them around MedGamma‑2B.
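A minimal sketch of how those three layers might fit together. The function names, the fallback behaviour, and `fake_model` are assumptions made for illustration (the stub stands in for the local model so the sketch runs without weights); only the three JSON keys come from the post.

```python
import json
from typing import Callable


def sanitize_input(text: str) -> str:
    """Input sanitizer: drop control characters and collapse whitespace."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return " ".join(cleaned.split())


def route_inference(prompt: str, model: Callable[[str], str]) -> str:
    """Inference router: pass the prompt to whichever local model callable is configured."""
    if not prompt:
        raise ValueError("empty prompt after sanitizing")
    return model(prompt)


def _as_list(value) -> list:
    """Coerce a missing or scalar field into a list."""
    if isinstance(value, list):
        return value
    return [] if value in (None, "") else [value]


def format_json(raw: str) -> dict:
    """JSON formatter: parse model output, falling back to an empty structure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {
        "diagnosis": str(data.get("diagnosis", "")),
        "symptoms": _as_list(data.get("symptoms")),
        "next_steps": _as_list(data.get("next_steps")),
    }


def run_pipeline(text: str, model: Callable[[str], str]) -> dict:
    """Chain the three layers: sanitize, infer, format."""
    return format_json(route_inference(sanitize_input(text), model))


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for the local model; returns a canned JSON string."""
    return json.dumps(
        {
            "diagnosis": "tension headache",
            "symptoms": ["headache"],
            "next_steps": ["rest", "hydrate"],
        }
    )
```

Calling `run_pipeline("I have a headache", fake_model)` returns a dict with the three keys. Swapping `fake_model` for a real local inference call is the only change the rest of the chain should need.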

Model‑context scaffolding

Copilot generated the following, each tailored to MedGamma‑2B’s capabilities:

  • Prompt templates
  • System‑role messages
  • Safety guardrails
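To make the scaffolding concrete, here is a rough sketch of a prompt template with a system message and a guardrail line. The wording of every string and the `build_prompt` helper are assumptions for illustration, not the templates Copilot actually produced.

```python
# Hypothetical system message: tells the model which JSON keys to emit.
SYSTEM_MESSAGE = (
    "You are a medical assistant. Reply only with JSON containing the keys "
    "diagnosis, symptoms, and next_steps."
)

# Hypothetical guardrail: a fixed safety instruction added to every prompt.
GUARDRAIL = (
    "You are not a substitute for a clinician. If the input suggests an "
    "emergency, say so in next_steps."
)

PROMPT_TEMPLATE = "{system}\n{guardrail}\n\nPatient: {patient_text}\nAssistant:"


def build_prompt(patient_text: str) -> str:
    """Fill the template with the fixed instructions and the patient's text."""
    return PROMPT_TEMPLATE.format(
        system=SYSTEM_MESSAGE,
        guardrail=GUARDRAIL,
        patient_text=patient_text.strip(),
    )
```

Keeping the system message and guardrail as separate constants makes it easy to shorten one without touching the other.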

Token‑aware logic

Copilot reminded me to:

  • Chunk user input
  • Trim old context
  • Stay under MedGamma‑2B’s context window

This is critical when you have no API retries and must avoid timeouts.
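The chunk-and-trim idea can be sketched roughly as below. Counting one token per whitespace-separated word is a deliberate simplification; a real tokenizer will give different numbers, and all function names here are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens words."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def trim_context(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first turn that would overflow.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

With a budget of 3, `trim_context(["one two three", "four five", "six"], 3)` drops the oldest turn and keeps the two newest.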

Testing scripts

Copilot wrote unit‑style tests that simulate patient‑style input and validate MedClinic’s JSON output shapes.
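A shape test along those lines might look like the sketch below. `fake_medclinic` is a hypothetical stand-in for the real pipeline, and the required types for each key are an assumption; the post only names the keys.

```python
import json
import unittest

# Assumed types for the three keys named in the post.
REQUIRED_KEYS = {"diagnosis": str, "symptoms": list, "next_steps": list}


def validate_shape(payload: str) -> bool:
    """Return True if payload is a JSON object with the expected keys and types."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(isinstance(data.get(key), kind) for key, kind in REQUIRED_KEYS.items())


def fake_medclinic(patient_text: str) -> str:
    """Hypothetical stand-in for the pipeline; returns a canned JSON answer."""
    return json.dumps(
        {
            "diagnosis": "common cold",
            "symptoms": ["cough"],
            "next_steps": ["rest", "fluids"],
        }
    )


class OutputShapeTest(unittest.TestCase):
    def test_patient_input_yields_expected_shape(self):
        self.assertTrue(validate_shape(fake_medclinic("I've had a cough for three days")))

    def test_free_text_is_rejected(self):
        self.assertFalse(validate_shape("You probably have a cold."))

    def test_missing_key_is_rejected(self):
        self.assertFalse(validate_shape(json.dumps({"diagnosis": "cold"})))
```

Saved as a test module, this runs with `python -m unittest`.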

Where I pushed back

  • Copilot once suggested serializing the entire conversation into every call, a 10k‑token drag. I made it keep only the last 3 turns to stay under budget.
  • Early templates were too verbose; I cut about 40% of the prompt after reviewing Copilot’s own “better‑prompt” suggestions.
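Keeping only the last 3 turns can be done with a bounded deque. This is a sketch under the assumption that one "turn" means a patient message plus the assistant's reply; the `ConversationWindow` class is hypothetical.

```python
from collections import deque


class ConversationWindow:
    """Hold only the most recent turns; older ones fall off automatically."""

    def __init__(self, max_turns: int = 3):
        # deque(maxlen=...) discards the oldest item once the limit is reached.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, user: str, assistant: str) -> None:
        """Record one exchange: the patient's message and the assistant's reply."""
        self._turns.append((user, assistant))

    def render(self) -> str:
        """Serialize the retained turns for inclusion in the next prompt."""
        lines = []
        for user, assistant in self._turns:
            lines.append(f"Patient: {user}")
            lines.append(f"Assistant: {assistant}")
        return "\n".join(lines)
```

After five calls to `add_turn`, `render()` contains only the last three exchanges, so the history portion of the prompt stops growing with conversation length.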

BEFORE VS AFTER

| Aspect | Before Copilot & MedGamma‑2B | After Copilot‑Rewired MedClinic |
|---|---|---|
| Source code | Single file, spaghetti inference | Modular: voice → parser → inference → JSON formatter |
| Model usage | Raw prompt, no context-window awareness | Context-aware; trims history to stay under MedGamma‑2B’s token budget |
| Response format | Free-text paragraph | Structured JSON: diagnosis, symptoms, next_steps |
| Token pressure | No control, often past window | Token-sensitive trimming, pre-compressed chunks |
| UI feel | 10s delays, no structure | Fast, structured, feels like talking to a junior doctor |

[Images: SOAP note transcription]

My Experience with GitHub Copilot

Ease

Copilot removed the design friction, not the code‑writing.

  • I still write the HTML/CSS myself, just like the e‑commerce example from the challenge.
  • But whenever I touched MedGamma‑2B orchestration logic, Copilot sketched the architecture and I polished it.

Power amplified by tokens

MedGamma‑2B’s context window is the hard limit — no retries.

Copilot helped me design a pipeline that never overflows that window:

  • Automatically summarize long patient histories.
  • Drop irrelevant context before sending to the model.
  • Pre‑compress repeated info into short tags.

In practice:

  • A 2‑minute patient voice transcript → ~1.2k tokens sent to MedGamma‑2B.
  • Copilot‑generated logic trimmed ~400 useless tokens just by removing filler and rephrasing.
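Filler removal and tag compression could be sketched as follows. The filler list and the phrase-to-tag table are made-up examples, and word counts are only a rough proxy for model tokens, so the savings on real transcripts will differ.

```python
import re

# Hypothetical filler phrases to drop from a voice transcript.
FILLERS = ("um", "uh", "you know", "i mean")
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b[,.]?\s*",
    re.IGNORECASE,
)

# Hypothetical table mapping repeated long phrases to short tags.
TAGS = {
    "high blood pressure": "HTN",
    "type 2 diabetes": "T2DM",
}


def strip_fillers(text: str) -> str:
    """Remove filler phrases, then collapse any leftover whitespace."""
    return " ".join(_FILLER_RE.sub("", text).split())


def compress_tags(text: str) -> str:
    """Replace each known long phrase with its short tag, ignoring case."""
    for phrase, tag in TAGS.items():
        text = re.sub(re.escape(phrase), tag, text, flags=re.IGNORECASE)
    return text


def compress_transcript(text: str) -> str:
    """Apply both steps before the text is sent to the model."""
    return compress_tags(strip_fillers(text))
```

For example, `strip_fillers("um I have uh a really bad headache you know")` returns `"I have a really bad headache"`, cutting ten words down to six. The regex is crude: fillers in the middle of a comma-separated sentence can leave stray punctuation behind.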

MedClinic stays under budget while giving answers that feel like a human‑style consultation, not a chat‑bot‑style dump.

Copilot as co‑founder

GitHub Copilot didn’t just speed up my development — it rewired MedClinic’s brain.

  • Before: a local‑model prototype that felt like a toy.
  • After: a token‑aware, structured, local‑only AI physician assistant that I can run on my laptop with zero cloud dependencies.
