Adding an LLM to your application usually means writing the same infrastructure over and over: define JSON schemas for each tool, dispatch tool calls, drive the agentic loop, wire up a vector store, manage embedding calls, handle session state. Before you know it the actual feature is buried under plumbing.
Daimon is a Go sidecar that takes a different approach. Drop the binary next to your app, write a YAML config, and you get a fully operational LLM endpoint — with vector search, graph queries, session memory, and a complete agentic loop — without writing any of that wiring yourself.
The key idea: when you declare a vector store or graph database in the config, Daimon auto-generates LLM tools for it ({name}_search, {name}_cypher, etc.) and injects them into every LLM call. The model can use them immediately. You named the component; you got the tools.
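Daimon's generator itself isn't shown in this post, but judging from the tool names that appear later in the walkthrough (icd10_search, icd10_upsert, icd10_graph_cypher), the naming convention can be sketched roughly like this. This is an illustration of the convention, not Daimon's actual code:

```python
def generated_tool_names(name: str, component_kind: str) -> list[str]:
    """Sketch of Daimon's tool-naming convention (illustrative only).

    Declared component names map directly to tool names; hyphens appear
    to become underscores so the result is a valid tool identifier.
    """
    safe = name.replace("-", "_")
    if component_kind == "vector_store":
        return [f"{safe}_search", f"{safe}_upsert"]
    if component_kind == "graph_store":
        return [f"{safe}_cypher"]
    return []

print(generated_tool_names("icd10", "vector_store"))      # ['icd10_search', 'icd10_upsert']
print(generated_tool_names("icd10-graph", "graph_store")) # ['icd10_graph_cypher']
```

Declare a component named `icd10`, and the model sees `icd10_search` on its next call; no registration code on your side.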
To show what this looks like end-to-end, we'll build a medical chart ICD-10 coding pipeline. Given a medical transcription, it will:
- Semantically search a vector store of 46k ICD-10 codes for candidates
- Verify each candidate against a Neo4j taxonomy graph
- Return only confirmed codes, with confidence scores grounded in the graph — not the model
The entire LLM + tools setup fits in one YAML file. The Python client is about 60 lines. Let's build it.
What you'll need:
- Docker
- Python 3.11+
- An Anthropic API key (or swap to a fully local model — covered at the end)
- The Daimon binary
Step 1: Clone the Repo
git clone https://github.com/sonicboom15/medchart
cd medchart
The repo contains everything: the Daimon config, the ICD-10 loader, and the coding pipeline. Here's the layout:
daimon-config/config.yaml ← the entire LLM + tools setup
docker-compose.yml ← Neo4j, Qdrant, Ollama
icd10/loader.py ← CMS XML parser → Qdrant + Neo4j
pipeline/coder.py ← Daimon client → CodeAssignment[]
pipeline/ingest.py ← batch processor for MTSamples CSV
Step 2: Install Daimon
Pick the method for your platform:
macOS / Linux — Homebrew
brew tap sonicboom15/tap
brew install daimon
Windows — winget
winget install sonicboom15.daimon
Windows — Scoop
scoop bucket add sonicboom15 https://github.com/sonicboom15/scoop-bucket
scoop install daimon
Debian / Ubuntu
# Download the .deb from https://github.com/sonicboom15/daimon/releases/latest
sudo dpkg -i daimon_*_linux_amd64.deb
RHEL / Fedora
# Download the .rpm from https://github.com/sonicboom15/daimon/releases/latest
sudo rpm -i daimon_*_linux_amd64.rpm
Build from source (requires Go 1.23+)
git clone https://github.com/sonicboom15/daimon.git
cd daimon && make build
# binary at ./bin/daimon
Verify the install:
daimon --version
Step 3: Start the Infrastructure
docker compose up -d
This starts:
- Qdrant on port 6333 — vector store for ICD-10 code embeddings
- Neo4j on ports 7474 (browser) and 7687 (Bolt) — taxonomy graph
- Ollama on port 11434 — local embedding model

Remove the deploy: GPU block from docker-compose.yml if you don't have an NVIDIA GPU.
Pull the embedding model into Ollama:
docker exec -it medchart-ollama-1 ollama pull nomic-embed-text
This is the only model Ollama needs. The LLM itself uses the Anthropic API.
Step 4: Understand the Config
Open daimon-config/config.yaml. This single file is the entire pipeline setup:
components:
  # Local embedder via Ollama (OpenAI-compatible endpoint)
  - name: embedder
    type: embedding/openai
    metadata:
      base_url: http://localhost:11434/v1
      api_key: local
      model: nomic-embed-text
      dimensions: "768"

  # 46k ICD-10 code descriptions in Qdrant
  - name: icd10
    type: qdrant
    metadata:
      base_url: http://localhost:6333
      collection: icd10
      create_if_missing: "true"
      embedder: embedder
      dimensions: "768"

  # ICD-10 taxonomy hierarchy in Neo4j
  - name: icd10-graph
    type: neo4j
    metadata:
      bolt_url: bolt://localhost:7687
      username: neo4j
      password: medchart

  # Claude Sonnet — api_key read from ANTHROPIC_API_KEY env var
  - name: coder
    type: anthropic
    metadata:
      default_model: claude-sonnet-4-6
    defaults:
      temperature: 0.1
      max_tokens: 1024
      system: |
        You are a clinical coding assistant. Given a medical chart excerpt, identify
        and assign the correct ICD-10-CM codes using this exact two-phase workflow:

        PHASE 1 — Search:
        Call icd10_search for each distinct clinical concept in the note.
        Issue all searches in a single parallel batch.

        PHASE 2 — Verify:
        For each candidate code, call icd10_graph_cypher:
          MATCH (n:Code {code: $code}) RETURN n.code, n.short_desc
        Issue all verifications in a single parallel batch.
        Only include codes where this query returns a result.

        Return ONLY a JSON array — no prose, no markdown:
        [{"code": "J30.1", "description": "Allergic rhinitis due to pollen"}]
A few things worth understanding here:
Wiring order matters. Daimon resolves components top to bottom: embedders first, then vector stores (which can reference an embedder by name via embedder: embedder), then graph stores, then LLMs.
Auto-generated tools. Because you declared icd10 (Qdrant) and icd10-graph (Neo4j), the coder LLM automatically gets these tools on every call — no code required:
- icd10_search — embed a query and search Qdrant
- icd10_upsert — insert or update a code
- icd10_graph_cypher — run any Cypher query against Neo4j (the hyphen in the icd10-graph component name becomes an underscore in the tool name)
The system prompt references icd10_search and icd10_graph_cypher by name. That's all the wiring there is.
Batched tool calls. The system prompt tells the model to issue all searches in one batch and all verifications in another. Daimon executes concurrent tool calls in parallel, so two waves of 8 calls run about as fast as two single calls.
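Because the prompt demands a bare JSON array as the final answer, the client side can parse the model's last text chunk directly. A minimal sketch; the fence-stripping fallback is my own defensive addition (smaller local models sometimes wrap JSON in markdown despite instructions), not necessarily what the repo does:

```python
import json

def parse_code_array(text: str) -> list[dict]:
    """Parse the model's JSON-only reply into a list of code dicts."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line, then the trailing closing fence.
        cleaned = cleaned.split("\n", 1)[1]
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)

print(parse_code_array('[{"code": "J30.1", "description": "Allergic rhinitis due to pollen"}]'))
```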
Step 5: Start Daimon
export ANTHROPIC_API_KEY=sk-ant-...
daimon serve --config daimon-config/config.yaml
You should see all four components register:
INFO registered embedder name=embedder type=embedding/openai
INFO registered vector store name=icd10 type=qdrant
INFO registered graph store name=icd10-graph type=neo4j
INFO registered LLM component name=coder type=anthropic
INFO daimon listening addr=127.0.0.1:3500
Step 6: Load the ICD-10 Data
Download the CMS 2025 ICD-10-CM tabular XML:
https://www.cms.gov/files/zip/2025-code-tables-tabular-and-index.zip
Extract icd10cm_tabular_2025.xml into the data/ folder. Then install dependencies and run the loader:
pip install -e ".[dev]"
python -m icd10.loader data/icd10cm_tabular_2025.xml
The loader does three things:

1. Parses the XML. The CMS file is a nested hierarchy of chapters → sections → diagnosis codes → subcodes. The parser walks this tree recursively and produces a flat list of ICD10Code objects.
2. Loads into Qdrant. Each code is upserted via Daimon's /v1/memory/icd10 endpoint with the full description as content and the code, category, block, and chapter as metadata. IDs are deterministic uuid5 hashes of the code string, so re-runs are idempotent.
3. Loads into Neo4j. The taxonomy becomes a graph via Daimon's /v1/graph/icd10-graph/cypher endpoint:

Chapter -[:PART_OF]-> Block -[:PART_OF]-> Category -[:IS_A]-> Code

All codes get a :Code label; 3-character codes (like J30, E66) also get :Category. Everything is batched into groups of 500 using Cypher UNWIND to keep it fast and memory-efficient.
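The idempotent IDs and the UNWIND batching can be sketched in a few lines. The namespace constant and helper names here are illustrative, not lifted from icd10/loader.py:

```python
import uuid

# Any fixed namespace works; the point is that the same code string always
# yields the same UUID, so re-running the loader overwrites instead of duplicating.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "icd10")

def point_id(code: str) -> str:
    """Deterministic uuid5 ID for a code string."""
    return str(uuid.uuid5(NAMESPACE, code))

def chunked(items: list, size: int = 500):
    """Yield successive batches for one UNWIND statement each."""
    for i in range(0, len(items), size):
        yield items[i : i + size]

# One parameterized query per batch of 500 rows, instead of one query per code:
UPSERT_CYPHER = """
UNWIND $rows AS row
MERGE (c:Code {code: row.code})
SET c.short_desc = row.short_desc
"""

assert point_id("J30.1") == point_id("J30.1")  # re-runs produce the same ID
```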
This takes about 10 minutes. When it finishes, verify the counts:
# Vector store: should show ~46k points
curl http://localhost:6333/collections/icd10 | python -m json.tool
Open the Neo4j browser at http://localhost:7474 (user: neo4j, password: medchart) and run:
MATCH (n:Code) RETURN count(n)
Step 7: Run the Coder
Download mtsamples.csv from Kaggle and save it to data/. Then:
python -m pipeline.ingest --csv data/mtsamples.csv --out data/coded.jsonl --limit 10
You'll see the agentic tool calls stream in real time as Daimon drives the loop:
Allergic Rhinitis
→ icd10_search allergic rhinitis seasonal
→ icd10_search asthma mild intermittent
→ icd10_graph_cypher J30.9
→ icd10_graph_cypher J45.20
✓ J30.9 Allergic rhinitis, unspecified (100%)
✓ J45.20 Mild intermittent asthma, uncomplicated (100%)
Laparoscopic Gastric Bypass Consult
→ icd10_search morbid obesity BMI
→ icd10_search gastroesophageal reflux
→ icd10_search knee pain bilateral
→ icd10_search low back pain
→ icd10_search nicotine dependence cigarettes
→ icd10_search allergy penicillin
→ icd10_graph_cypher E66.01
→ icd10_graph_cypher K21.9
→ icd10_graph_cypher M25.561
→ icd10_graph_cypher M25.562
→ icd10_graph_cypher M54.50
→ icd10_graph_cypher F17.210
→ icd10_graph_cypher Z88.0
✓ E66.01 Morbid (severe) obesity due to excess calories (100%)
✓ K21.9 Gastro-esophageal reflux disease w/o esophagitis (100%)
✓ M25.561 Pain in right knee (100%)
✓ M25.562 Pain in left knee (100%)
✓ M54.50 Low back pain, unspecified (100%)
✓ F17.210 Nicotine dependence, cigarettes, uncomplicated (100%)
✓ Z88.0 Allergy status to penicillin (100%)
Each record in coded.jsonl looks like this:
{
  "sample_name": "Laparoscopic Gastric Bypass Consult",
  "specialty": "Bariatrics",
  "codes": [
    {"code": "E66.01",  "description": "Morbid (severe) obesity due to excess calories", "confidence": 1.0, "verified": true},
    {"code": "K21.9",   "description": "Gastro-esophageal reflux disease without esophagitis", "confidence": 1.0, "verified": true},
    {"code": "M25.561", "description": "Pain in right knee", "confidence": 1.0, "verified": true},
    {"code": "M25.562", "description": "Pain in left knee", "confidence": 1.0, "verified": true},
    {"code": "M54.50",  "description": "Low back pain, unspecified", "confidence": 1.0, "verified": true},
    {"code": "F17.210", "description": "Nicotine dependence, cigarettes, uncomplicated", "confidence": 1.0, "verified": true},
    {"code": "Z88.0",   "description": "Allergy status to penicillin", "confidence": 1.0, "verified": true}
  ]
}
Notice M25.561 and M25.562 — bilateral knee pain correctly split into separate laterality codes. That's proper ICD-10 specificity, and the model inferred it from the clinical language without any special instruction.
The confidence scores come from the graph verification step in pipeline/coder.py, not from the model's own calibration:
- 1.0 — exact code confirmed in Neo4j. Use it.
- 0.5 — category exists but the specific subcode does not. Flag for human review.
- 0.0 — not found at all. Hallucinated. Discard.
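That three-tier grading boils down to a small decision function. A sketch of the idea, with exists_in_graph standing in for an icd10_graph_cypher lookup (the helper name is hypothetical; see pipeline/coder.py for the real logic):

```python
def score_code(code: str, exists_in_graph) -> float:
    """Grade a candidate ICD-10 code against the taxonomy graph.

    exists_in_graph(code) stands in for a Cypher lookup like:
      MATCH (n:Code {code: $code}) RETURN n.code
    """
    if exists_in_graph(code):
        return 1.0   # exact code confirmed: use it
    category = code.split(".")[0]
    if category != code and exists_in_graph(category):
        return 0.5   # category exists, subcode doesn't: flag for human review
    return 0.0       # not in the taxonomy at all: hallucinated, discard

# Toy graph containing a category and one subcode:
known = {"J30", "J30.9"}
lookup = lambda c: c in known
print(score_code("J30.9", lookup))   # 1.0
print(score_code("J30.4", lookup))   # 0.5
print(score_code("Q99.999", lookup)) # 0.0
```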
Bonus: Run Fully Local (No API Key)
Change the coder component in daimon-config/config.yaml to point at Ollama's OpenAI-compatible endpoint:

- name: coder
  type: llamacpp                  # was: anthropic
  metadata:
    base_url: http://localhost:11434/v1
    api_key: local
    default_model: qwen2.5:7b     # was: claude-sonnet-4-6
Pull the model:
docker exec -it medchart-ollama-1 ollama pull qwen2.5:7b
Restart Daimon. Everything else — the Qdrant store, the Neo4j graph, the Python scripts — stays identical. qwen2.5:7b is accurate enough for most records during development. qwen2.5:14b gets you meaningfully closer to Claude quality if you have the VRAM.
What's Actually Happening
It's worth stepping back to see what Daimon is doing behind the scenes, because the Python client code is surprisingly short:
with daimon.Client(base_url="http://localhost:3500") as client:
    for chunk in client.llm("coder").converse(
        messages=[{"role": "user", "content": chart_text}]
    ):
        if chunk.type == "text":
            text_chunks.append(chunk.text)
That single converse call triggers a full agentic workflow inside Daimon:

- Daimon calls Claude with the chart text and the auto-generated tool definitions for icd10_search and icd10_graph_cypher
- Claude calls icd10_search for each clinical concept — Daimon embeds the query via Ollama and searches Qdrant, all in parallel
- Claude calls icd10_graph_cypher to verify each candidate code — Daimon runs the Cypher queries against Neo4j, again in parallel
- Claude returns the final JSON array
- The converse generator yields text chunks back to your code
Your code never touched an embedding client, a Qdrant client, a Neo4j session, or an agentic loop. The YAML config wired all of it.
Where To Go From Here
The full source is at github.com/sonicboom15/medchart.
Daimon itself is at github.com/sonicboom15/daimon — it supports OpenAI, Anthropic, and any Ollama-compatible model, with Qdrant, Chroma, pgvector, Redis, Neo4j, and Memgraph as backends.
The pattern here — vector search + graph verification + agentic LLM loop — applies to a lot of domains beyond medical coding. If you're building something similar and find yourself writing the same infrastructure glue, Daimon is worth a look.