Daniel Yarmoluk

Posted on Mar 20

I compressed 2MB of healthcare data into 12KB of markdown — here's the knowledge graph -- first time to Dev.to in my life...

#ai #healthtech #graphdatabase

I built an interactive healthcare knowledge graph — conditions, medications, drug interactions, diagnostics, billing codes, care pathways — and structured it as a compressed markdown file that any AI model can reason over.

Not a summary. Not a document. A traversable knowledge graph in .md format.

~3,000 tokens instead of ~500,000. Same reasoning quality. 170x more efficient.

Here's the live interactive demo: graphifymd.com/healthcare-kg-demo.html

Why this matters

85% of enterprise AI pilots fail to scale. Not because the models are bad. Because the context is.

An LLM can't reason about drug interactions if it doesn't know that metformin relates to renal function relates to GFR thresholds relates to dosing adjustments. That's not a retrieval problem. That's a relationship problem.

RAG retrieves text chunks. Knowledge graphs traverse relationships. The difference is the difference between searching a library index and having a librarian who knows which books reference each other — and why.

The pipeline

Raw clinical data (~2MB)
    ↓
Knowledge graph extraction (200 entities, 500+ relationships)
    ↓
Graph distillation (typed relationships + traversal rules)
    ↓
Compressed .md (~12KB, ~3,000 tokens)
    ↓
Deploy anywhere

What the .md looks like

Here's a fragment of the cardiology domain graph compressed to markdown:

## Entities

### Conditions
- Atrial Fibrillation | ICD: I48 | prevalence: 2.7M US
- Heart Failure | ICD: I50 | prevalence: 6.2M US
  - subtypes: HFrEF (EF≤40%), HFpEF (EF≥50%)

### Medications
- Apixaban | class: DOAC | no INR monitoring
- Warfarin | class: anticoagulant | INR target: 2-3
- Amiodarone | class: antiarrhythmic | ⚠️ toxicity

## Relationships

AFib → TREATED_BY → Apixaban (first-line DOAC)
AFib → RISK_FACTOR_FOR → Stroke (5x risk)
HFrEF → TREATED_BY → Metoprolol (mortality ↓35%)
Warfarin → INTERACTS_WITH → Amiodarone ⚠️
  ↳ RULE: ↑INR 50-70%. Reduce warfarin dose 30-50%.
Apixaban → REQUIRES → CrCl assessment
  ↳ RULE: Reduce dose if CrCl 15-29, avoid if <15

## Traversal Examples

Q: Patient with AFib + CKD Stage 4. Anticoagulation?
AFib → TREATED_BY → Apixaban
Apixaban → REQUIRES → CrCl
CKD Stage 4 → CrCl 15-29 → DOSE_ADJUST Apixaban
→ Answer: Apixaban 2.5mg BID (reduced dose)

The model doesn't guess. It follows the chain. Multi-hop reasoning with an audit trail.

The numbers

Metric	Raw Data	Knowledge Graph .md
Size	~2MB	~12KB
Tokens	~500,000	~3,000
Density	1x	170x
Compression	—	93%
CO₂ per query	~0.34 kg	~0.002 kg

That last line matters. Fewer tokens = less compute = lower energy. 99.4% carbon reduction per query. Structured intelligence is greener intelligence.

March Madness knowledge graph — 68 teams, built live with graduate software engineers

It works everywhere

The same .md file works across every AI environment without modification:

Claude Projects — upload as project knowledge
Claude Code — CLAUDE.md project context
ChatGPT — custom GPT instructions
Cursor / Windsurf — context file
Codex CLI — AGENTS.md
MCP Server — serve as tool context
API — system prompt injection
Email — it's just text. Paste it.

No vendor lock-in. No format conversion. No special tooling. Markdown is the universal interface.

Why not just use RAG?

RAG retrieves the top-k text chunks that match your query. It's single-hop — find the most similar text, return it.

A knowledge graph traverses relationships. When you ask about a patient with AFib and kidney disease, the graph follows:

AFib → treatment options → Apixaban → renal requirements →
CrCl thresholds → CKD staging → dose adjustment rules

That's 5 hops. RAG would need to independently retrieve and stitch together 5 separate chunks and hope the model connects them. The graph has already connected them.

Microsoft's 2024 research showed knowledge graphs achieve an 83% win rate vs vector RAG. HopRAG (ACL 2025) showed 77% higher accuracy on multi-hop questions.

What I'm building

I run Graphify.md — we build domain knowledge graphs and compress them to portable .md for any industry. Healthcare is one vertical. We've also built graphs for:

March Madness tournament — 68 teams, real-time scores, built live with grad students
LinkedIn Groups ecosystem — 200+ groups, 15 verticals, relationship edges
Defense, legal, construction, supply chain, GovTech, education — 12 verticals mapped

The methodology works on any domain. If your data has entities and relationships — and all data does — it can be graphed, compressed, and deployed.

Try it

The interactive demo is live. Hover over nodes to see relationship chains light up:

👉 graphifymd.com/healthcare-kg-demo.html

Built entirely with Claude Code. The whole thing — knowledge graph extraction, D3 visualization, .md compression, the site — solo, in days not months.

If you're working on a domain where AI keeps hallucinating or RAG keeps missing context, the problem might not be the model. It might be the structure.

Daniel Yarmoluk — Graphify.md — Book a call

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.