<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dhoni Yedhuru</title>
    <description>The latest articles on DEV Community by Dhoni Yedhuru (@dhoni_yedhuru).</description>
    <link>https://dev.to/dhoni_yedhuru</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3886885%2F82070d8f-cca2-4e9a-8226-50858340eb3a.png</url>
      <title>DEV Community: Dhoni Yedhuru</title>
      <link>https://dev.to/dhoni_yedhuru</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dhoni_yedhuru"/>
    <language>en</language>
    <item>
      <title>Clinic-CoPilot</title>
      <dc:creator>Dhoni Yedhuru</dc:creator>
      <pubDate>Sun, 19 Apr 2026 04:35:55 +0000</pubDate>
      <link>https://dev.to/dhoni_yedhuru/clinic-copilot-58g5</link>
      <guid>https://dev.to/dhoni_yedhuru/clinic-copilot-58g5</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw66fam4o9st3sx2g3ug.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw66fam4o9st3sx2g3ug.jpg" alt=" " width="800" height="422"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwu462940ey5aym2um3jz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwu462940ey5aym2um3jz.png" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3fp9vhuhgirxqs3eg5xu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3fp9vhuhgirxqs3eg5xu.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I built a clinical assistant that could summarize patient notes pretty well. Then I asked it about the same patient two weeks later, and it confidently ignored everything it had already seen.&lt;/p&gt;

&lt;p&gt;That’s when I realized: I hadn’t built a system. I’d built a stateless loop.&lt;/p&gt;

&lt;p&gt;What this system actually does&lt;/p&gt;

&lt;p&gt;Clinic-CoPilot is a memory-first system for primary care workflows. It takes raw visit notes and turns them into structured, recallable memory that persists across time. The goal is simple: before a doctor sees a patient, the system should already know what matters.&lt;/p&gt;

&lt;p&gt;The architecture is straightforward:&lt;/p&gt;

&lt;p&gt;A FastAPI backend orchestrates the flow&lt;br&gt;
An extraction agent parses visit notes into structured memory&lt;br&gt;
A memory layer (Hindsight) stores those events&lt;br&gt;
A briefing agent generates pre-visit summaries&lt;br&gt;
A pattern agent identifies trends across patients&lt;/p&gt;

&lt;p&gt;Each patient has their own memory thread. Every visit adds new entries. Nothing gets overwritten.&lt;/p&gt;

&lt;p&gt;On the frontend, I expose three things:&lt;/p&gt;

&lt;p&gt;A patient dashboard&lt;br&gt;
A “Memory Inspector” to visualize stored memory&lt;br&gt;
A briefing panel that shows what the system thinks matters&lt;/p&gt;

&lt;p&gt;The UI is intentionally simple. Most of the work happens in how memory is structured and retrieved.&lt;/p&gt;

&lt;p&gt;The problem: stateless systems don’t handle time&lt;/p&gt;

&lt;p&gt;My first version followed the usual pattern:&lt;/p&gt;

&lt;p&gt;Grab recent notes&lt;br&gt;
Dump them into a prompt&lt;br&gt;
Generate output&lt;/p&gt;

&lt;p&gt;Something like:&lt;/p&gt;

&lt;p&gt;prompt = f"""&lt;br&gt;
Patient history:&lt;br&gt;
{recent_notes}&lt;/p&gt;

&lt;p&gt;Summarize key clinical details for the next visit.&lt;br&gt;
"""&lt;/p&gt;

&lt;p&gt;This works if “recent” is enough. It usually isn’t.&lt;/p&gt;

&lt;p&gt;Real problems:&lt;/p&gt;

&lt;p&gt;Medication reactions from older visits disappear&lt;br&gt;
Follow-up commitments get lost&lt;br&gt;
Personal context (job, family, habits) never survives&lt;/p&gt;

&lt;p&gt;I kept increasing the context window. It didn’t help. The system wasn’t forgetting because of token limits. It was forgetting because it had no memory model.&lt;/p&gt;

&lt;p&gt;I needed a way to give my agent memory.&lt;/p&gt;

&lt;p&gt;Switching to memory: events instead of text&lt;/p&gt;

&lt;p&gt;Instead of storing notes, I started storing events.&lt;/p&gt;

&lt;p&gt;Each visit produces multiple structured memories:&lt;/p&gt;

&lt;p&gt;Visit summary&lt;br&gt;
Medication event&lt;br&gt;
Personal detail&lt;br&gt;
Commitment&lt;br&gt;
Lab result&lt;br&gt;
Family history&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;{&lt;br&gt;
  "type": "medication_event",&lt;br&gt;
  "content": "Patient reported mild nausea when taking Metformin on empty stomach",&lt;br&gt;
  "tags": ["metformin", "side-effects"],&lt;br&gt;
  "date": "2026-01-22"&lt;br&gt;
}&lt;/p&gt;
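
&lt;p&gt;To make the event shape concrete, here is a minimal sketch of a typed record for the example above. It uses a standard-library dataclass as a stand-in for the Pydantic schema mentioned later; the names EVENT_TYPES, MemoryEvent and parse_event are assumptions for illustration, not part of the actual project.&lt;/p&gt;

```python
import datetime
from dataclasses import asdict, dataclass, field

# Assumed identifiers for the six memory kinds listed in the post.
EVENT_TYPES = {
    "visit_summary", "medication_event", "personal_detail",
    "commitment", "lab_result", "family_history",
}


@dataclass(frozen=True)
class MemoryEvent:
    """One structured memory extracted from a visit note (hypothetical shape)."""
    type: str
    content: str
    date: datetime.date
    tags: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject events the rest of the pipeline would not know how to handle.
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type}")
        if not self.content.strip():
            raise ValueError("content must not be empty")

    def to_dict(self) -> dict:
        # Serialize the date as an ISO string, matching the JSON example.
        data = asdict(self)
        data["date"] = self.date.isoformat()
        return data


def parse_event(raw: dict) -> MemoryEvent:
    """Build a validated event from a raw dict such as the extractor's output."""
    return MemoryEvent(
        type=raw["type"],
        content=raw["content"],
        date=datetime.date.fromisoformat(raw["date"]),
        tags=list(raw.get("tags", [])),
    )
```

&lt;p&gt;Validating at construction time means a malformed event fails loudly at the extraction boundary instead of surfacing later as a confusing briefing.&lt;/p&gt;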

&lt;p&gt;This immediately changes how the system behaves:&lt;/p&gt;

&lt;p&gt;You can query specific types&lt;br&gt;
You can prioritize certain events&lt;br&gt;
You can explain outputs&lt;/p&gt;

&lt;p&gt;The missing piece was persistence and retrieval.&lt;/p&gt;

&lt;p&gt;Using Hindsight as the memory layer&lt;/p&gt;

&lt;p&gt;I decided to try Hindsight for agent memory. I also explored the Hindsight agent memory docs to understand how threads and recall work.&lt;/p&gt;

&lt;p&gt;Each patient gets their own thread:&lt;/p&gt;

&lt;p&gt;thread_id = f"patient_{patient_id}"&lt;/p&gt;

&lt;p&gt;await hindsight.write_memory(&lt;br&gt;
    thread_id=thread_id,&lt;br&gt;
    memory={&lt;br&gt;
        "content": memory.content,&lt;br&gt;
        "tags": memory.tags,&lt;br&gt;
        "metadata": memory.metadata&lt;br&gt;
    }&lt;br&gt;
)&lt;/p&gt;

&lt;p&gt;Retrieval is semantic, but scoped:&lt;/p&gt;

&lt;p&gt;memories = await hindsight.recall_memories(&lt;br&gt;
    thread_id=thread_id,&lt;br&gt;
    query="diabetes medication side effects",&lt;br&gt;
    limit=10&lt;br&gt;
)&lt;/p&gt;

&lt;p&gt;Two things mattered here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Threads enforce isolation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each patient’s history lives in its own namespace. That prevents cross-contamination and keeps retrieval focused.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Recall is query-driven&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of passing everything, I can ask:&lt;/p&gt;

&lt;p&gt;“What medication issues exist?”&lt;br&gt;
“What commitments are pending?”&lt;/p&gt;

&lt;p&gt;This is much closer to how humans think.&lt;/p&gt;
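
&lt;p&gt;The two properties above, isolation per thread and query-driven recall, can be illustrated without any external service. The sketch below is an in-memory stand-in, not Hindsight’s API: ThreadedMemoryStore is a hypothetical class, and it ranks by plain keyword overlap where a real memory layer would use semantic search.&lt;/p&gt;

```python
from collections import defaultdict


class ThreadedMemoryStore:
    """In-memory stand-in for a memory layer: one list of memories per thread."""

    def __init__(self):
        self._threads = defaultdict(list)

    def write(self, thread_id: str, memory: dict) -> None:
        # Append only: earlier entries are never overwritten.
        self._threads[thread_id].append(memory)

    def recall(self, thread_id: str, query: str, limit: int = 10) -> list[dict]:
        # Only the requested thread is searched, so patients cannot leak
        # into each other's results.
        terms = set(query.lower().split())
        scored = []
        for memory in self._threads.get(thread_id, []):
            words = set(memory["content"].lower().split())
            words.update(tag.lower() for tag in memory.get("tags", []))
            score = len(terms.intersection(words))
            if score > 0:
                scored.append((score, memory))
        # Highest keyword overlap first; ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [memory for _, memory in scored[:limit]]
```

&lt;p&gt;Swapping the overlap score for an embedding similarity would change the ranking but not the scoping, which is the part that keeps retrieval focused.&lt;/p&gt;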

&lt;p&gt;Extraction: where things got painful&lt;/p&gt;

&lt;p&gt;Memory is only useful if it’s accurate. The extraction step was the hardest part.&lt;/p&gt;

&lt;p&gt;I use an LLM (via Groq) to convert raw notes into structured memory:&lt;/p&gt;

&lt;p&gt;# note_text is an assumed variable holding the raw visit note&lt;br&gt;
response = groq.chat.completions.create(&lt;br&gt;
    model="openai/gpt-oss-120b",&lt;br&gt;
    response_format={"type": "json_object"},&lt;br&gt;
    messages=[&lt;br&gt;
        {"role": "system", "content": "Extract structured clinical memories from the note and return them as a JSON object"},&lt;br&gt;
        {"role": "user", "content": note_text}&lt;br&gt;
    ],&lt;br&gt;
    temperature=0.2&lt;br&gt;
)&lt;/p&gt;

&lt;p&gt;Problems I ran into:&lt;/p&gt;

&lt;p&gt;The model would over-extract trivial details&lt;br&gt;
It would miss implicit commitments&lt;br&gt;
Tags were inconsistent&lt;/p&gt;

&lt;p&gt;Fixes:&lt;/p&gt;

&lt;p&gt;Strict schema with Pydantic&lt;br&gt;
Explicit examples in the prompt&lt;br&gt;
Post-processing to normalize tags&lt;/p&gt;
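
&lt;p&gt;As an illustration of the tag post-processing step, here is a minimal sketch. The alias table and the function names are assumptions made up for this example; the real normalization rules depend on the vocabulary the extractor actually produces.&lt;/p&gt;

```python
import re

# Hypothetical alias table: maps variant spellings to one canonical tag.
TAG_ALIASES = {
    "side-effect": "side-effects",
    "sideeffects": "side-effects",
}


def normalize_tag(tag: str) -> str:
    """Lowercase, collapse non-alphanumeric runs to hyphens, then apply aliases."""
    slug = re.sub(r"[^a-z0-9]+", "-", tag.strip().lower()).strip("-")
    return TAG_ALIASES.get(slug, slug)


def normalize_tags(tags: list[str]) -> list[str]:
    """Normalize every tag, dropping empties and duplicates but keeping order."""
    seen = []
    for tag in tags:
        normalized = normalize_tag(tag)
        if normalized and normalized not in seen:
            seen.append(normalized)
    return seen
```

&lt;p&gt;Running this after extraction means recall does not depend on the model spelling a tag the same way twice.&lt;/p&gt;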

&lt;p&gt;Extraction is not “set and forget.”&lt;/p&gt;

&lt;p&gt;Generating briefings: only as good as memory&lt;/p&gt;

&lt;p&gt;Once memory is reliable, briefing becomes straightforward.&lt;/p&gt;

&lt;p&gt;The agent pulls:&lt;/p&gt;

&lt;p&gt;All patient memories&lt;br&gt;
Relevant cross-patient patterns&lt;/p&gt;

&lt;p&gt;memories = await hindsight.get_all_patient_memories(patient_id)&lt;br&gt;
patterns = await hindsight.recall_patterns(query="diabetes risk")&lt;/p&gt;

&lt;p&gt;Then generates a structured briefing.&lt;/p&gt;

&lt;p&gt;The key difference:&lt;/p&gt;

&lt;p&gt;I’m not asking the model to remember. I’m giving it memory.&lt;/p&gt;
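
&lt;p&gt;One way to hand memory to the model is to group recalled events by type and render them as dated lines before prompting. The sketch below assumes each memory is a dict with type, content and an ISO date; SECTION_ORDER and build_briefing_context are hypothetical names, and the priority order is only an example.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical priority order: what the briefing should surface first.
SECTION_ORDER = [
    "medication_event", "commitment", "lab_result",
    "visit_summary", "personal_detail", "family_history",
]


def build_briefing_context(memories: list[dict]) -> str:
    """Render memories as dated, typed lines suitable for a briefing prompt."""
    grouped = defaultdict(list)
    for memory in memories:
        grouped[memory["type"]].append(memory)

    lines = []
    # Types missing from SECTION_ORDER are skipped in this sketch.
    for section in SECTION_ORDER:
        # ISO dates sort correctly as strings; newest entries come first.
        entries = sorted(grouped.get(section, []),
                         key=lambda m: m["date"], reverse=True)
        if not entries:
            continue
        lines.append(section.replace("_", " ").title() + ":")
        for entry in entries:
            lines.append(f"- [{entry['date']}] {entry['content']}")
    return "\n".join(lines)
```

&lt;p&gt;Keeping the date on every line also gives each statement in the briefing something to be traced back to.&lt;/p&gt;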

&lt;p&gt;What the system actually produces&lt;/p&gt;

&lt;p&gt;Input note&lt;/p&gt;

&lt;p&gt;“Patient reports mild nausea when taking Metformin on empty stomach. Started new job as school librarian…”&lt;/p&gt;

&lt;p&gt;Stored memory&lt;/p&gt;

&lt;p&gt;Medication event → Metformin nausea&lt;br&gt;
Personal detail → new job&lt;br&gt;
Commitment → schedule eye exam&lt;/p&gt;

&lt;p&gt;Later: generated briefing&lt;/p&gt;

&lt;p&gt;Today’s Focus&lt;/p&gt;

&lt;p&gt;Review A1C improvement&lt;br&gt;
Confirm eye exam scheduling&lt;br&gt;
Assess impact of new job on diet&lt;/p&gt;

&lt;p&gt;Suggested opener&lt;/p&gt;

&lt;p&gt;“How’s the new librarian role going? Are you finding ways to pack healthy lunches?”&lt;/p&gt;

&lt;p&gt;This is where the system stops feeling generic.&lt;/p&gt;

&lt;p&gt;Patterns across patients&lt;/p&gt;

&lt;p&gt;I also maintain a global memory thread:&lt;/p&gt;

&lt;p&gt;thread_id = "patterns_global"&lt;/p&gt;

&lt;p&gt;The pattern agent identifies trends like:&lt;/p&gt;

&lt;p&gt;Diabetes + hypertension + family history → elevated stroke risk&lt;br&gt;
Frequent visits → better adherence&lt;/p&gt;

&lt;p&gt;Lessons learned&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Context is not memory&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Passing more tokens is not a substitute for real memory.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Structure beats embeddings alone&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without structure, you can’t prioritize or explain outputs.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Extraction quality determines everything&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Bad input → bad system. Most improvements came from better schemas and prompts.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Traceability matters&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the system says something, you should know where it came from.&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;Memory needs fallback&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;External systems fail. Always have a backup.&lt;/p&gt;
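
&lt;p&gt;A minimal sketch of what a backup could look like: if the primary write fails, the memory is queued locally and replayed later. Everything here (LocalQueue, write_with_fallback, replay) is hypothetical, and it is synchronous for brevity, whereas the calls shown earlier are awaited.&lt;/p&gt;

```python
import json


class LocalQueue:
    """Holds failed writes as JSON lines so they can be replayed later."""

    def __init__(self):
        self.lines = []

    def append(self, thread_id: str, memory: dict) -> None:
        self.lines.append(json.dumps({"thread_id": thread_id, "memory": memory}))


def write_with_fallback(primary_write, queue: LocalQueue,
                        thread_id: str, memory: dict) -> bool:
    """Try the primary store; on any failure, queue the memory instead.

    Returns True if the primary write succeeded, False if it was queued.
    """
    try:
        primary_write(thread_id, memory)
        return True
    except Exception:
        queue.append(thread_id, memory)
        return False


def replay(primary_write, queue: LocalQueue) -> int:
    """Retry queued writes; keep the ones that still fail. Returns replay count."""
    remaining, replayed = [], 0
    for line in queue.lines:
        record = json.loads(line)
        try:
            primary_write(record["thread_id"], record["memory"])
            replayed += 1
        except Exception:
            remaining.append(line)
    queue.lines = remaining
    return replayed
```

&lt;p&gt;In a real deployment the queue would need to live on disk or in a database so that queued memories survive a restart.&lt;/p&gt;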

&lt;p&gt;Closing&lt;/p&gt;

&lt;p&gt;I was tired of prompt engineering and started looking for a better way to help my agent remember.&lt;/p&gt;

&lt;p&gt;Using Hindsight forced me to think differently:&lt;/p&gt;

&lt;p&gt;Events instead of text&lt;br&gt;
Threads instead of prompts&lt;br&gt;
Recall instead of context stuffing&lt;/p&gt;

&lt;p&gt;Once I made that shift, the system stopped forgetting.&lt;/p&gt;

&lt;p&gt;And more importantly, it started behaving like it understood patients across time.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
