armando salles

Building a Multi-Agent Clinical Research System with LangGraph, DeepSeek and Tavily

Imagine asking a complex medical question and having a network of AI agents research the web, critique the quality of their own findings, and only write the final report after you personally approve the evidence. That is exactly what I built.

In this article I'll walk through the architecture, the key decisions, and the code that makes it work.

The Problem

Clinicians, researchers, and health operators spend hours reviewing scientific literature before making decisions. The question was: can a multi-agent AI system do this reliably, with quality control built in?

The Architecture

The system runs a LangGraph StateGraph with 4 specialized agents:

  • Orchestrator: initializes state and coordinates the flow
  • Researcher: searches the web via Tavily and a mock PubMed tool (up to 3 rounds)
  • Critic: evaluates data quality and loops back if insufficient
  • Writer: generates the final structured clinical report

Why LangGraph over CrewAI?

LangGraph gives explicit control over graph edges, state transitions, and interrupt points. In a clinical context, predictability matters more than convenience. CrewAI abstracts too much — LangGraph lets you see exactly what runs, when, and why.

The State

The most important design decision in the state is the operator.add annotation on research_data. LangGraph overwrites state fields by default, so without a reducer each research round would erase the previous one. The annotation tells LangGraph to concatenate each new list onto the existing one instead, building cumulative context across revision cycles.

from typing import TypedDict, List, Dict, Any, Annotated
import operator

class AgentState(TypedDict):
    query: str
    research_data: Annotated[List[Dict[str, Any]], operator.add]
    critic_feedback: str
    revision_count: int
    final_report: str
    is_sufficient: bool
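
To make the overwrite-versus-append behavior concrete, here is a toy model in plain Python. It is not LangGraph's actual implementation: merge_update and the sample dictionaries are hypothetical, and the only thing taken from the article is that research_data uses operator.add while the other fields are replaced.

```python
import operator
from typing import Any, Callable, Dict

def merge_update(
    state: Dict[str, Any],
    update: Dict[str, Any],
    reducers: Dict[str, Callable[[Any, Any], Any]],
) -> Dict[str, Any]:
    """Toy state merge: fields with a reducer are combined, all others are overwritten."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)
        else:
            merged[key] = value
    return merged

# Only research_data has a reducer, mirroring the Annotated field in AgentState.
reducers = {"research_data": operator.add}

state = {"research_data": [], "revision_count": 0}
# Hypothetical node outputs: each round returns a *list* so operator.add can concatenate.
round_1 = {"research_data": [{"source": "tavily_search", "round": 1}], "revision_count": 1}
round_2 = {"research_data": [{"source": "pubmed_mock_search", "round": 2}], "revision_count": 2}

state = merge_update(state, round_1, reducers)
state = merge_update(state, round_2, reducers)

print(len(state["research_data"]))  # both rounds are kept
print(state["revision_count"])      # plain field: last write wins
```

One practical consequence: a node that writes to research_data has to return a list, even for a single result, because operator.add on lists is concatenation.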

The Critic Agent — The Key Design Decision

The Critic is what separates this system from a simple RAG pipeline. It receives the research data and asks: Is the evidence recent enough? Are there contradictions between sources? Is the population studied the right one?

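import json
from langchain_core.messages import HumanMessage

# llm_deepseek is the chat model configured elsewhere in the project.
async def critic_node(state: AgentState):
    # Ask the LLM to judge the accumulated research and answer in strict JSON.
    prompt = f"""You are a Medical Critic.
    Evaluate if this research is sufficient for the query.
    Query: {state['query']}
    Research Data: {state['research_data']}
    Respond ONLY with valid JSON:
    {{"is_sufficient": true, "feedback": "your feedback"}}"""

    # Use the async API so the event loop is not blocked inside an async node.
    res = await llm_deepseek.ainvoke([HumanMessage(content=prompt)])
    try:
        parsed = json.loads(res.content.strip())
    except json.JSONDecodeError:
        # Treat malformed output as "not sufficient" instead of crashing the graph.
        parsed = {"is_sufficient": False, "feedback": "Critic returned invalid JSON."}

    return {
        "is_sufficient": parsed.get("is_sufficient", False),
        "critic_feedback": parsed.get("feedback", "")
    }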
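
Because the Critic's verdict depends entirely on the model returning clean JSON, it can help to parse the reply defensively. Models sometimes wrap JSON in a Markdown code fence or answer in prose. The sketch below is one possible approach, not the repository's code: parse_critic_reply and strip_code_fence are hypothetical helper names, and failing closed (treating bad output as "not sufficient") is my assumption about the safest default.

```python
import json
from typing import Any, Dict

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly for readability

def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence (with optional language tag), if any."""
    text = text.strip()
    if len(text) >= 2 * len(FENCE) and text.startswith(FENCE) and text.endswith(FENCE):
        inner = text[len(FENCE):-len(FENCE)]
        first_line, _, rest = inner.partition("\n")
        if first_line.strip().isalpha():  # e.g. a "json" language tag
            inner = rest
        return inner.strip()
    return text

def parse_critic_reply(raw: str) -> Dict[str, Any]:
    """Return the critic verdict; anything unparseable counts as 'not sufficient'."""
    fallback = {"is_sufficient": False, "feedback": "Critic reply was not valid JSON."}
    try:
        parsed = json.loads(strip_code_fence(raw))
    except json.JSONDecodeError:
        return fallback
    if not isinstance(parsed, dict):
        return fallback
    return {
        # Only a literal JSON true approves the research.
        "is_sufficient": parsed.get("is_sufficient") is True,
        "feedback": str(parsed.get("feedback", "")),
    }

print(parse_critic_reply('{"is_sufficient": true, "feedback": "ok"}'))
print(parse_critic_reply("Sure! Here is my evaluation."))
```

Failing closed means a formatting glitch costs one extra research round rather than letting unreviewed evidence through.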

If the Critic returns is_sufficient: false, the graph sends the Researcher back for another round, up to a maximum of 3 rounds to prevent infinite loops.
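
The loop-back rule can be expressed as a small routing function. This is a sketch under assumptions: the function name, the MAX_REVISIONS constant, the use of revision_count as the round counter, and the choice to move on to the writer (and therefore to human review) once the cap is hit are all mine, not confirmed by the article. In LangGraph, a function like this is typically registered with add_conditional_edges on the critic node.

```python
from typing import Any, Dict

MAX_REVISIONS = 3  # cap stated in the article; the constant name is hypothetical

def route_after_critic(state: Dict[str, Any]) -> str:
    """Pick the next node name based on the critic verdict and the round counter."""
    if state.get("is_sufficient", False):
        return "writer"
    if state.get("revision_count", 0) >= MAX_REVISIONS:
        # Assumption: stop looping and let the human reviewer decide.
        return "writer"
    return "researcher"

print(route_after_critic({"is_sufficient": False, "revision_count": 1}))
print(route_after_critic({"is_sufficient": False, "revision_count": 3}))
print(route_after_critic({"is_sufficient": True, "revision_count": 1}))
```

Keeping the cap in the router rather than in the Critic's prompt makes the termination guarantee independent of what the model says.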

Human-in-the-Loop

This is the most critical safety feature. Before the Writer generates the final report, the graph pauses and presents the collected evidence to the user.

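from langgraph.checkpoint.memory import MemorySaver

# Compile with a checkpointer and pause right before the "writer" node runs.
clinical_app = workflow.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["writer"]
)

# `config` must carry the same thread_id used for the initial run,
# in the form {"configurable": {"thread_id": <your id>}}.
# The awaits below need to run inside an async function.
state = await clinical_app.aget_state(config)
ans = input("Approve research and generate report? (y/n): ")

if ans.strip().lower() == "y":
    # Passing None as input resumes from the saved checkpoint.
    async for event in clinical_app.astream(None, config=config):
        if "writer" in event:
            report = event["writer"]["final_report"]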

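When the user types y, the graph resumes from the checkpoint saved by MemorySaver, so no data is lost between the pause and the resume. MemorySaver keeps checkpoints in process memory only, so this holds as long as the process stays alive.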
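
The approval prompt is only useful if the reviewer can actually read the evidence first. Here is a minimal sketch of a formatter for that step. It assumes each research_data item is a dict with "source" and "content" keys, which is a hypothetical shape, and format_evidence is an illustrative helper. With LangGraph, the list would normally come from the paused snapshot's values, for example state.values["research_data"].

```python
from typing import Any, Dict, List

def format_evidence(research_data: List[Dict[str, Any]], max_chars: int = 200) -> str:
    """Render collected evidence as a numbered, truncated list for human review."""
    lines = []
    for i, item in enumerate(research_data, start=1):
        source = item.get("source", "unknown source")
        content = str(item.get("content", "")).strip().replace("\n", " ")
        if len(content) > max_chars:
            content = content[:max_chars].rstrip() + "..."
        lines.append(f"[{i}] {source}: {content}")
    return "\n".join(lines) if lines else "(no evidence collected)"

# Placeholder items, not real search results.
sample = [
    {"source": "tavily_search", "content": "placeholder snippet"},
    {"source": "pubmed_mock_search", "content": "another placeholder\nspanning two lines"},
]
print(format_evidence(sample))
```

Printing this before the input() call gives the reviewer something concrete to approve or reject.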

Real Test Query

I ran the system with this query:

"Latest evidence on semaglutide for obesity treatment in CKD patients?"

The Critic agent referenced the 2024 FLOW trial in its evaluation — the AI didn't just search, it questioned the quality and specificity of what it found. The final report is available in the repository as clinical_report.md.

Terminal Output

🎯 Orchestrator: Starting pipeline...
🔬 Researcher: Research round 1...
   🔧 Using tool: tavily_search
   🔧 Using tool: pubmed_mock_search
🧐 Critic: Evaluating data quality...
   ✅ Sufficient: True
--- Executing: __interrupt__ ---

Approve research and generate report? (y/n): y
✍️  Writer: Generating clinical report...
✅ Report generated: clinical_report.md

Stack

  • LangGraph — stateful multi-agent orchestration
  • LangChain — LLM abstractions and tool calling
  • DeepSeek — cost-efficient OpenAI-compatible LLM
  • Tavily — AI-optimized real web search
  • Pydantic — structured outputs
  • Python 3.14

Full Repository

Complete code, architecture decisions, and the AI-generated clinical report example:

🧬 LangGraph Clinical Research Orchestrator

Multi-Agent AI system for clinical evidence surveillance.
Built with LangGraph, DeepSeek and Tavily.

How it works

You ask a complex medical question. A network of AI agents researches, critiques and only writes the report when the data quality is approved — by both the Critic agent and a human reviewer.

Agent Flow

graph TD
    A([START]) --> B[🎯 Orchestrator]
    B --> C[🔬 Researcher\nTavily + PubMed]
    C --> D[🧐 Critic\nEvaluates quality]
    D -->|Not sufficient| C
    D -->|Approved| E{👤 Human Review\nHITL}
    E -->|y| F[✍️ Writer\nMarkdown Report]
    F --> G([END])

Agents

| Agent        | Role                                                  |
|--------------|-------------------------------------------------------|
| Orchestrator | Initializes state and coordinates the flow            |
| Researcher   | Searches web via Tavily + PubMed mock                 |
| Critic       | Evaluates data quality; loops back if insufficient    |
| Writer       | Generates final structured clinical report            |

Key Technical Decisions

  • LangGraph over CrewAI — explicit control over edges, state and interrupts
  • operator.add on research_data — append-only accumulation across revisions
  • interrupt_before=["writer"] — human…

What's Next

  • Real PubMed API integration (replace mock)
  • FastAPI endpoint to serve queries via HTTP
  • Weekly monitoring mode with alerts on new studies
  • Multi-condition support — configurable per user
  • Output variants — technical for doctors, simplified for patients

If you're building with LangGraph or multi-agent systems, I'd love to connect. Drop a comment below or find me on LinkedIn.
