
DEV Community

armando salles

Posted on May 19

Building a Multi-Agent Clinical Research System with LangGraph, DeepSeek and Tavily

#langgraph #python #agents #ai


Imagine asking a complex medical question and having a network of AI agents research the web, critique the quality of their own findings, and only write the final report after you personally approve the evidence. That is exactly what I built.

In this article I'll walk through the architecture, the key decisions, and the code that makes it work.

The Problem

Clinicians, researchers, and health operators spend hours reviewing scientific literature before making decisions. The question was: can a multi-agent AI system do this reliably, with quality control built in?

The Architecture

The system runs a LangGraph StateGraph with 4 specialized agents:

Orchestrator — initializes state and coordinates the flow
Researcher — searches the web via Tavily and a mock PubMed tool (up to 3 rounds)
Critic — evaluates data quality and loops back if insufficient
Writer — generates the final structured clinical report
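To make the Researcher's role concrete, here is a minimal, dependency-free sketch of what a researcher node could look like. It is not the repository's code: fake_web_search and fake_pubmed_search are hypothetical stand-ins for the Tavily tool and the PubMed mock, and folding the Critic's feedback into the next query is an assumed strategy. The node returns only the new findings, which is what lets the operator.add reducer on research_data (see The State below) append them.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for the real tools (Tavily search and the PubMed mock).
def fake_web_search(query: str) -> List[Dict[str, Any]]:
    return [{"source": "web", "title": f"Web result for: {query}"}]

def fake_pubmed_search(query: str) -> List[Dict[str, Any]]:
    return [{"source": "pubmed", "title": f"PubMed result for: {query}"}]

SEARCH_TOOLS: List[Callable[[str], List[Dict[str, Any]]]] = [
    fake_web_search,
    fake_pubmed_search,
]

def researcher_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Run one research round and return only the new findings."""
    round_no = state.get("revision_count", 0) + 1
    query = state["query"]
    # Assumption: refine the next search with the Critic's latest feedback.
    feedback = state.get("critic_feedback", "")
    if feedback:
        query = f"{query} ({feedback})"

    new_items: List[Dict[str, Any]] = []
    for tool in SEARCH_TOOLS:
        for item in tool(query):
            # Tag each finding with the round it came from.
            new_items.append({**item, "round": round_no})

    # Partial update: research_data is appended by the reducer,
    # revision_count is simply overwritten with the new round number.
    return {"research_data": new_items, "revision_count": round_no}
```

For example, researcher_node({"query": "semaglutide CKD", "revision_count": 0}) returns two findings tagged with round 1 and sets revision_count to 1.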

Why LangGraph over CrewAI?

LangGraph gives explicit control over graph edges, state transitions, and interrupt points. In a clinical context, predictability matters more than convenience. CrewAI abstracts too much — LangGraph lets you see exactly what runs, when, and why.

The State

The most important design decision in the state is the operator.add annotation on research_data. LangGraph overwrites state fields by default. Without this, each research round would erase the previous one. The annotation tells LangGraph to append instead — building cumulative context across revision cycles.

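from typing import TypedDict, List, Dict, Any, Annotated
import operator

class AgentState(TypedDict):
    query: str                  # the user's clinical question
    # Reducer: new items from each node are appended (list + list), not overwritten
    research_data: Annotated[List[Dict[str, Any]], operator.add]
    critic_feedback: str        # latest Critic comments (overwritten each round)
    revision_count: int         # loop counter used to cap the research rounds
    final_report: str           # Markdown report produced by the Writer
    is_sufficient: bool         # Critic's verdict on the current evidence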
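The difference between the default overwrite behavior and an operator.add reducer can be illustrated without LangGraph at all. The apply_update helper below is a hypothetical, simplified model of how a partial node update is merged into state; it is not LangGraph's internal code, only a sketch of the semantics described above.

```python
import operator
from typing import Any, Callable, Dict

# Simplified model of state merging: fields with a reducer are combined,
# every other field is overwritten by the latest update.
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {"research_data": operator.add}

def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # list + list: appended
        else:
            merged[key] = value  # default: last write wins
    return merged

state = {"query": "semaglutide CKD", "research_data": [], "critic_feedback": ""}
state = apply_update(state, {"research_data": [{"round": 1}], "critic_feedback": "too old"})
state = apply_update(state, {"research_data": [{"round": 2}], "critic_feedback": "ok"})
print(state["research_data"])
print(state["critic_feedback"])
```

After two rounds, research_data holds both rounds' findings while critic_feedback keeps only the latest value. Without the REDUCERS entry, the second update would leave only the round 2 findings, which is exactly the data loss the annotation prevents.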

The Critic Agent — The Key Design Decision

The Critic is what separates this system from a simple RAG pipeline. It receives the research data and asks: Is the evidence recent enough? Are there contradictions between sources? Does the studied population match the one in the query?

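import json
from langchain_core.messages import HumanMessage

async def critic_node(state: AgentState):
    # llm_deepseek is the DeepSeek chat model configured elsewhere in the project
    prompt = f"""You are a Medical Critic.
    Evaluate if this research is sufficient for the query.
    Query: {state['query']}
    Research Data: {state['research_data']}
    Respond ONLY with valid JSON:
    {{"is_sufficient": true, "feedback": "your feedback"}}"""

    # Use the async call inside an async node so the event loop is not blocked
    res = await llm_deepseek.ainvoke([HumanMessage(content=prompt)])

    # Models often wrap JSON in Markdown code fences; strip them before parsing
    raw = res.content.strip().strip("`").removeprefix("json").strip()
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fail closed: unparseable output counts as "not sufficient"
        parsed = {"is_sufficient": False, "feedback": "Critic returned invalid JSON"}

    return {
        "is_sufficient": parsed.get("is_sufficient", False),
        "critic_feedback": parsed.get("feedback", "")
    }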
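The prompt in critic_node does not spell out the three questions the Critic is meant to ask, and it relies on the model returning clean JSON. Below is a minimal sketch of two hypothetical helpers, build_critic_prompt and parse_critic_verdict (neither is from the repository), that make the criteria explicit and parse the reply defensively using only the standard library. The CRITERIA wording is an illustrative paraphrase of the questions above.

```python
import json
from typing import Any, Dict, List, Tuple

# The three questions from the Critic's description, made explicit in the prompt.
CRITERIA = [
    "Is the evidence recent enough?",
    "Are there contradictions between sources?",
    "Does the studied population match the population in the query?",
]

def build_critic_prompt(query: str, research_data: List[Dict[str, Any]]) -> str:
    checklist = "\n".join(f"- {c}" for c in CRITERIA)
    return (
        "You are a Medical Critic.\n"
        f"Query: {query}\n"
        # default=str keeps non-JSON values (e.g. dates) from crashing the dump
        f"Research Data: {json.dumps(research_data, default=str)}\n"
        f"Check each point:\n{checklist}\n"
        'Respond ONLY with JSON: {"is_sufficient": <bool>, "feedback": "<text>"}'
    )

def parse_critic_verdict(text: str) -> Tuple[bool, str]:
    """Decode the first valid JSON object in the reply; fail closed if none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
            # Only a real boolean True counts as approval.
            return obj.get("is_sufficient") is True, str(obj.get("feedback", ""))
        except json.JSONDecodeError:
            # Not a JSON object at this brace; try the next one.
            start = text.find("{", start + 1)
    return False, "Critic reply contained no valid JSON"
```

Scanning for the first decodable object tolerates replies wrapped in code fences or surrounded by chatty prose, and treating anything other than a literal true as "not sufficient" means a malformed reply sends the Researcher back rather than waving weak evidence through.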

If the Critic returns is_sufficient: false, the graph sends the Researcher back for another round, up to a maximum of 3 research rounds, so the loop cannot run forever.
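In LangGraph, this loop-back decision is expressed as a conditional edge: a plain function that reads the state and returns the name of the next node. Below is a minimal sketch of such a routing function. It assumes that revision_count counts completed research rounds and that, once the cap is hit, the graph falls through to the Writer (and therefore to the human review); both are assumptions rather than details confirmed by the repository, and route_after_critic and MAX_ROUNDS are hypothetical names.

```python
from typing import Any, Dict

MAX_ROUNDS = 3  # cap on research rounds, matching the limit described above

def route_after_critic(state: Dict[str, Any]) -> str:
    """Return the name of the next node after the Critic has run."""
    if state.get("is_sufficient", False):
        return "writer"
    if state.get("revision_count", 0) >= MAX_ROUNDS:
        # Assumption: stop looping and let the human reviewer decide.
        return "writer"
    return "researcher"

# In LangGraph this could be wired with:
# workflow.add_conditional_edges("critic", route_after_critic,
#                                {"researcher": "researcher", "writer": "writer"})
```

Keeping the cap in the routing function, rather than in the Critic's prompt, makes the loop bound deterministic: no LLM output can extend it.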

Human-in-the-Loop

This is the most critical safety feature. Before the Writer generates the final report, the graph pauses and presents the collected evidence to the user.

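from langgraph.checkpoint.memory import MemorySaver

# Compile with a checkpointer and pause right before the Writer node runs
clinical_app = workflow.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["writer"]
)

# Checkpoints are keyed by thread_id (the ID value here is illustrative)
config = {"configurable": {"thread_id": "clinical-session-1"}}

# First run (not shown): astream the initial state with this config until the pause

# Show the collected evidence before asking for approval
state = await clinical_app.aget_state(config)
print(state.values["research_data"])
ans = input("Approve research and generate report? (y/n): ")

if ans.strip().lower() == "y":
    # Passing None as input resumes from the saved checkpoint
    async for event in clinical_app.astream(None, config=config):
        if "writer" in event:
            report = event["writer"]["final_report"]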

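When the user types y, the graph resumes from the checkpoint saved by MemorySaver, so no data is lost between the pause and the resume. MemorySaver keeps checkpoints in process memory, so this holds only while the script is still running.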
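Conceptually, interrupt_before plus a checkpointer means: save the state before a named node, stop, and later continue from that saved state when the caller resumes with no new input. The toy runner below models that idea with a plain dict keyed by thread ID. ToyPausableRunner and the placeholder nodes are hypothetical illustrations of the pattern, not how MemorySaver or LangGraph are implemented.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Node = Callable[[Dict[str, Any]], Dict[str, Any]]

class ToyPausableRunner:
    """Runs nodes in order, pausing before any node listed in interrupt_before."""

    def __init__(self, nodes: List[Tuple[str, Node]], interrupt_before: List[str]):
        self.nodes = nodes
        self.interrupt_before = set(interrupt_before)
        # thread_id -> (index of the next node to run, saved state)
        self.checkpoints: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def run(self, inputs: Optional[Dict[str, Any]], thread_id: str) -> Dict[str, Any]:
        if inputs is None:
            # Resume: pick up exactly where the last run for this thread stopped.
            index, saved = self.checkpoints[thread_id]
            state = dict(saved)
            skip_interrupt = True
        else:
            index, state = 0, dict(inputs)
            skip_interrupt = False
        while index < len(self.nodes):
            name, fn = self.nodes[index]
            if name in self.interrupt_before and not skip_interrupt:
                self.checkpoints[thread_id] = (index, dict(state))  # save and pause
                return state
            skip_interrupt = False
            state = {**state, **fn(state)}  # merge the node's partial update
            index += 1
        self.checkpoints[thread_id] = (index, dict(state))
        return state

runner = ToyPausableRunner(
    nodes=[
        ("researcher", lambda s: {"research_data": ["finding A"]}),
        ("writer", lambda s: {"final_report": f"Report based on {len(s['research_data'])} source(s)"}),
    ],
    interrupt_before=["writer"],
)
paused = runner.run({"query": "semaglutide CKD"}, thread_id="t1")
print("final_report" in paused)
done = runner.run(None, thread_id="t1")
print(done["final_report"])
```

The first call stops before the writer without a report; the second call, with None as input, continues from the saved index and state, mirroring the astream(None, config=config) resume above.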

Real Test Query

I ran the system with this query:

"Latest evidence on semaglutide for obesity treatment in CKD patients?"

The Critic agent referenced the 2024 FLOW trial in its evaluation — the AI didn't just search, it questioned the quality and specificity of what it found. The final report is available in the repository as clinical_report.md.

Terminal Output

🎯 Orchestrator: Starting pipeline...
🔬 Researcher: Research round 1...
   🔧 Using tool: tavily_search
   🔧 Using tool: pubmed_mock_search
🧐 Critic: Evaluating data quality...
   ✅ Sufficient: True
--- Executing: __interrupt__ ---

Approve research and generate report? (y/n): y
✍️  Writer: Generating clinical report...
✅ Report generated: clinical_report.md

Stack

LangGraph — stateful multi-agent orchestration
LangChain — LLM abstractions and tool calling
DeepSeek — cost-efficient OpenAI-compatible LLM
Tavily — AI-optimized real web search
Pydantic — structured outputs
Python 3.14

Full Repository

Complete code, architecture decisions, and the AI-generated clinical report example:

Armandogith / langgraph-research-orchestrator

🧬 LangGraph Clinical Research Orchestrator

Multi-Agent AI system for clinical evidence surveillance.
Built with LangGraph, DeepSeek and Tavily.

How it works

You ask a complex medical question. A network of AI agents researches, critiques and only writes the report when the data quality is approved — by both the Critic agent and a human reviewer.

Agent Flow

graph TD
    A([START]) --> B[🎯 Orchestrator]
    B --> C[🔬 Researcher\nTavily + PubMed]
    C --> D[🧐 Critic\nEvaluates quality]
    D -->|Not sufficient| C
    D -->|Approved| E{👤 Human Review\nHITL}
    E -->|y| F[✍️ Writer\nMarkdown Report]
    F --> G([END])

Agents

Agent	Role
Orchestrator	Initializes state and coordinates the flow
Researcher	Searches web via Tavily + PubMed mock
Critic	Evaluates data quality — loops back if insufficient
Writer	Generates final structured clinical report

Key Technical Decisions

LangGraph over CrewAI — explicit control over edges, state and interrupts
operator.add on research_data — append-only accumulation across revisions
interrupt_before=["writer"] — human…

What's Next

Real PubMed API integration (replace mock)
FastAPI endpoint to serve queries via HTTP
Weekly monitoring mode with alerts on new studies
Multi-condition support — configurable per user
Output variants — technical for doctors, simplified for patients
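For the first item on that list, NCBI's E-utilities expose PubMed search over plain HTTP. Below is a minimal sketch of building an ESearch request URL with the standard library; it makes no network call, and build_pubmed_search_url and the default of 20 results are illustrative choices. NCBI's usage guidelines (rate limits, API keys) are worth checking before wiring this into the Researcher.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_search_url(term: str, max_results: int = 20) -> str:
    """Return an ESearch URL asking PubMed for matching article IDs (PMIDs) as JSON."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # standard PubMed query syntax (AND, OR, field tags)
        "retmax": max_results,  # maximum number of IDs to return
        "retmode": "json",      # JSON instead of the default XML
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"

print(build_pubmed_search_url("semaglutide AND chronic kidney disease"))
```

The JSON response lists the matching PMIDs under esearchresult.idlist; retrieving titles or abstracts would be a follow-up ESummary or EFetch call.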

If you're building with LangGraph or multi-agent systems, I'd love to connect. Drop a comment below or find me on LinkedIn.
