Devanshu Biswas

Posted on May 21

I Built a 3-Agent AI Research Crew in 250 Lines of Python (LangGraph + Free Gemini)

#ai #python #langgraph #beginners

You've seen the demos. "Look, our AI hired a research team and wrote you a 50-page report while you brushed your teeth!" Cool. Now show me the code.

It turns out the code is small. Embarrassingly small. The whole pattern — the thing every multi-agent framework on the market sells you a SaaS license for — is three functions piped through a typed dict. Once you see it, you can't unsee it.

So today we're going to build it. From scratch. In about 250 lines of Python. The crew has three specialists:

A Researcher who actually hits the web for real facts.
A Writer who turns those facts into a draft briefing.
An Editor who polishes the draft into something you'd send to a colleague.

You give it a topic. It hands you back a markdown report. And — this is the part the YouTubers skip — the entire stack is free. No OpenAI, no Anthropic, no Tavily, no credit card. Just one free Google API key.

Live demo — click a starter, watch the three stages turn green. Source on GitHub, step-by-step commits.

The lesson under the hype

If you remember nothing else, remember this: a multi-agent system is a state machine where each transition is an LLM call.

That's it. That's the whole field.

You define a shared blob of state — let's call it CrewState. The graph has nodes. Each node is just a function that reads the state, does some work (often by calling an LLM), and returns a partial dict. The framework merges that dict back into the state and decides which node runs next.

       ┌────────────┐    ┌──────────┐    ┌────────┐
TOPIC ─▶│ Researcher │───▶│  Writer  │───▶│ Editor │───▶ REPORT
       │  (DDG API) │    │ (Gemini) │    │(Gemini)│
       └────────────┘    └──────────┘    └────────┘
                  shared CrewState flows through all three
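To make that concrete without any framework, here is a minimal sketch of the same loop in plain Python. The run_pipeline helper and the three stub nodes are illustrative names, not LangGraph APIs. LangGraph's real executor also handles branching, cycles, and persistence, which this sketch ignores.

```python
from typing import Callable

State = dict
Node = Callable[[State], dict]


def run_pipeline(nodes: list[Node], state: State) -> State:
    """Run each node in order and merge its partial dict into the state."""
    for node in nodes:
        partial = node(state)         # the node reads the current state...
        state = {**state, **partial}  # ...and its return value is merged back in
    return state


# Stub nodes standing in for the real Researcher, Writer, and Editor.
def researcher(state: State) -> dict:
    return {"facts": [f"placeholder fact about {state['topic']}"]}


def writer(state: State) -> dict:
    return {"draft": "Draft based on: " + "; ".join(state["facts"])}


def editor(state: State) -> dict:
    return {"final_report": "# Report\n\n" + state["draft"]}


result = run_pipeline([researcher, writer, editor], {"topic": "How CRISPR works"})
print(result["final_report"])
```

Each stub only touches its own key, which is the same contract the real agents follow later in this post.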

When LangGraph's authors built v0.1, they were essentially asking: what if we took LangChain's "Chain" abstraction but let it have cycles, branches, and persistent state? That's the whole pitch. The graph is the agent.

The state object: where everyone meets

LangGraph is opinionated about one thing: every node has to agree on the shape of the data flowing through. So we type it. Once.

from typing import TypedDict

class Fact(TypedDict):
    source: str
    title: str
    snippet: str

class CrewState(TypedDict, total=False):
    topic: str
    facts: list[Fact]
    draft: str
    final_report: str
    error: str

total=False is the magic. It means every field is optional. The graph starts with only topic set. As each node finishes, it adds its key (facts, then draft, then final_report). LangGraph merges the partial dict back into the state before the next node runs.

Notice what's missing: any reference to LLMs, prompts, or agents. The state is data, not behaviour. Keeping it that way is what makes the whole pattern testable — you can swap any agent for a stub that just returns {"draft": "lorem ipsum"} and the graph still runs.
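As a quick check of both claims, the sketch below redefines the two TypedDicts from above, confirms that total=False marks every key optional, and merges a stub writer's output into a state that only has topic set. The name stub_writer is illustrative, and the dict merge imitates what the framework does between nodes rather than calling LangGraph itself.

```python
from typing import TypedDict


class Fact(TypedDict):
    source: str
    title: str
    snippet: str


class CrewState(TypedDict, total=False):
    topic: str
    facts: list[Fact]
    draft: str
    final_report: str
    error: str


def stub_writer(state: CrewState) -> dict:
    """Stand-in for the real writer: no LLM, just a fixed partial update."""
    return {"draft": "lorem ipsum"}


# The graph starts with only the topic; that is valid because no key is required.
state: CrewState = {"topic": "How CRISPR works"}
merged = {**state, **stub_writer(state)}

print(sorted(CrewState.__optional_keys__))
print(merged)
```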

Agent 1: The Researcher (or, why your LLM needs to leave the house)

The single biggest mistake beginners make with LLM apps is asking the model to remember things. Don't. The model's job is to transform inputs into outputs. The inputs come from your code, calling real APIs.

That's why the Researcher does the search first, then hands the results to the LLM. Not the other way around.

from duckduckgo_search import DDGS

def researcher_node(state: CrewState) -> dict:
    topic = state.get("topic", "").strip()
    with DDGS() as ddgs:
        raw = list(ddgs.text(topic, max_results=5, region="wt-wt"))
    facts = [
        {"source": r.get("href", ""), "title": r.get("title", ""), "snippet": r.get("body", "")}
        for r in raw
    ]
    return {"facts": facts}

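That's the whole researcher. About ten lines, import included. No LLM call. The duckduckgo_search package is an unofficial client for DuckDuckGo's results rather than an official API, but it is free, with no key and no signup, which makes it perfect for a tutorial. (For production, swap to Tavily or Serper; the pattern doesn't change.)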
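The CrewState above declares an error key that the snippet never sets. One way it might be used, sketched under assumptions: wrap the search in a try/except and report failures through the state instead of raising. The search function is passed in as a parameter here (a hypothetical design choice, not necessarily the repo's) so the example runs without network access. In the real node it would be the DDGS call shown above.

```python
from typing import Callable

SearchFn = Callable[[str], list[dict]]


def make_researcher(search: SearchFn) -> Callable[[dict], dict]:
    """Build a researcher node around any search function (hypothetical helper)."""

    def researcher_node(state: dict) -> dict:
        topic = state.get("topic", "").strip()
        if not topic:
            return {"facts": [], "error": "empty topic"}
        try:
            raw = search(topic)
        except Exception as exc:  # e.g. network errors or rate limits
            return {"facts": [], "error": f"search failed: {exc}"}
        facts = [
            {"source": r.get("href", ""), "title": r.get("title", ""), "snippet": r.get("body", "")}
            for r in raw
        ]
        return {"facts": facts}

    return researcher_node


# Two fake search backends: one that works, one that always fails.
def fake_search(query: str) -> list[dict]:
    return [{"href": "https://example.com/a", "title": "A", "body": f"About {query}"}]


def broken_search(query: str) -> list[dict]:
    raise RuntimeError("rate limited")


ok = make_researcher(fake_search)({"topic": "CRISPR"})
failed = make_researcher(broken_search)({"topic": "CRISPR"})
print(ok)
print(failed)
```

With this shape, downstream nodes would need to check state.get("error") before calling the LLM.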

Why grounded search instead of asking Gemini "what do you know about X?"

Because Gemini will lie about URLs. Any LLM can invent URLs that look real but 404. Don't ask the model for facts. Ask it to summarise facts you got from a tool.
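One cheap guard that follows from this, sketched here as an assumption rather than something the repo necessarily does: after the LLM writes, compare every URL in its output against the sources the Researcher actually returned. The helper name unsourced_urls and the regex are illustrative.

```python
import re

# Rough URL matcher: stops at whitespace and common closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def unsourced_urls(text: str, facts: list[dict]) -> list[str]:
    """Return URLs that appear in text but not among the facts' sources."""
    known = {f["source"].rstrip("/") for f in facts if f.get("source")}
    # Strip trailing punctuation that the regex may have swallowed.
    found = [u.rstrip(".,;:").rstrip("/") for u in URL_PATTERN.findall(text)]
    return [u for u in found if u not in known]


facts = [{"source": "https://example.com/crispr", "title": "CRISPR", "snippet": "..."}]
draft = (
    "CRISPR edits DNA (https://example.com/crispr). "
    "See also https://example.org/made-up-page."
)
print(unsourced_urls(draft, facts))
```

An empty list means every link in the text traces back to a search result. Anything else is a candidate hallucination.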

Agents 2 + 3: Why two? Why not one big prompt?

Here's the part that feels wasteful at first. We have research notes. We want a polished report. Surely one prompt — "Write a polished briefing about $TOPIC using these facts" — can do it?

Try it. It produces mediocre output. Always.

The reason is divided attention. Asking a model to simultaneously generate prose AND fix its own structural problems means it does both badly. So we split:

The Writer has one job: produce a 400-500 word draft. Hedges allowed. Markdown sloppy. Just get words on the page.
The Editor has one job: read the draft, fix the structure, prepend a title, cut the flab, ship the final.

This is how human newsrooms work. Reporters file rough drafts; copy editors polish them. Both roles exist because both add value. Multi-agent design is just the same idea applied to LLM calls.

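from langchain_core.messages import HumanMessage, SystemMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Assumption: the model name below is a placeholder; use whichever Gemini model
# your free key can access. The key is read from the GOOGLE_API_KEY env var.
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

def writer_node(state: CrewState) -> dict:
    # Flatten the researcher's facts into one bullet per source.
    facts_block = "\n".join(f"- {f['title']}: {f['snippet']} ({f['source']})" for f in state["facts"])
    response = llm.invoke([
        SystemMessage(content="You are a technical writer. 400-500 words, plain markdown..."),
        HumanMessage(content=f"Topic: {state['topic']}\n\nFacts:\n{facts_block}\n\nWrite the draft."),
    ])
    return {"draft": response.content.strip()}

def editor_node(state: CrewState) -> dict:
    # The editor only sees the draft, never the raw facts.
    response = llm.invoke([
        SystemMessage(content="You are a senior editor. Polish, don't rewrite. Add an H1 title..."),
        HumanMessage(content=f"DRAFT TO POLISH:\n\n{state['draft']}"),
    ])
    return {"final_report": response.content.strip()}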

System message = how to behave. Human message = what to do. Mixing them produces worse output. This is the single most underrated trick in prompt engineering.
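Because each node only needs an object with an .invoke() method, the nodes can be exercised without an API key. Below is a minimal sketch of that idea. FakeLLM, FakeResponse, and make_writer_node are hypothetical names introduced for illustration, and the messages are written as (role, content) tuples, a shorthand that LangChain chat models should also accept, so the example needs no third-party imports.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    content: str


@dataclass
class FakeLLM:
    """Stand-in for a chat model: records messages, returns a canned reply."""
    reply: str
    calls: list = field(default_factory=list)

    def invoke(self, messages):
        self.calls.append(messages)
        return FakeResponse(content=self.reply)


def make_writer_node(llm):
    """Build a writer node bound to any object exposing .invoke()."""

    def writer_node(state: dict) -> dict:
        facts_block = "\n".join(
            f"- {f['title']}: {f['snippet']} ({f['source']})" for f in state["facts"]
        )
        human = f"Topic: {state['topic']}\n\nFacts:\n{facts_block}\n\nWrite the draft."
        response = llm.invoke([
            ("system", "You are a technical writer."),  # how to behave
            ("human", human),                           # what to do
        ])
        return {"draft": response.content.strip()}

    return writer_node


llm = FakeLLM(reply="  A short draft.  ")
state = {
    "topic": "CRISPR",
    "facts": [{"title": "T", "snippet": "S", "source": "https://example.com"}],
}
update = make_writer_node(llm)(state)
print(update)
```

The recorded calls on the fake make it easy to assert that the behaviour instructions stayed in the system message and the facts stayed in the human message.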

Wiring it: the actual LangGraph part

After all that setup, the LangGraph code is comically short:

from langgraph.graph import START, END, StateGraph

g = StateGraph(CrewState)
g.add_node("researcher", researcher_node)
g.add_node("writer", writer_node)
g.add_node("editor", editor_node)
g.add_edge(START, "researcher")
g.add_edge("researcher", "writer")
g.add_edge("writer", "editor")
g.add_edge("editor", END)

crew = g.compile()
result = crew.invoke({"topic": "How CRISPR works"})
print(result["final_report"])

That's the whole thing. Eight lines define the agent crew. compile() turns the declarative graph into an executor that runs your nodes in order, merging state between them. Once compiled, the same crew object handles every request — it's stateless across invocations, because the state lives in the dict you pass in.

Streaming: because waiting 8 seconds for a spinner feels broken

The sync version works, but the UX is sad. The user clicks a button. A spinner appears. Eight to ten seconds pass. The whole report appears at once.

LangGraph has a stream() method that yields after each node finishes. Same total latency, dramatically better feel:

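def stream_crew(topic: str):  # stream_crew is an illustrative name
    # Yield one event per finished node instead of waiting for the full report.
    for update in crew.stream({"topic": topic}, stream_mode="updates"):
        for node_name, partial in update.items():
            yield {"node": node_name, "data": partial}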

stream_mode="updates" emits only the diff returned by the node that ran (cheaper than "values", which emits the whole state). Send those updates to the browser as Server-Sent Events from FastAPI and the React frontend can render researcher facts as soon as they arrive, while the writer is still drafting. One caveat about the live demo: the Render free tier, where its backend runs, sleeps after 15 minutes of inactivity, so the first hit takes ~30 seconds to wake up; subsequent hits feel instant.
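The SSE step might look roughly like this. It is a sketch of the wire format only: to_sse, sse_frames, and fake_updates are illustrative names, the fake generator stands in for crew.stream(...), and the FastAPI wiring is left out.

```python
import json
from typing import Iterable, Iterator


def to_sse(event: dict) -> str:
    """Serialise one event as a Server-Sent Events frame."""
    return f"data: {json.dumps(event)}\n\n"


def sse_frames(updates: Iterable[dict]) -> Iterator[str]:
    """Turn {node_name: partial_state} updates into SSE frames."""
    for update in updates:
        for node_name, partial in update.items():
            yield to_sse({"node": node_name, "data": partial})


def fake_updates() -> Iterator[dict]:
    """Stand-in for crew.stream(..., stream_mode="updates")."""
    yield {"researcher": {"facts": []}}
    yield {"writer": {"draft": "rough draft"}}
    yield {"editor": {"final_report": "# Title"}}


frames = list(sse_frames(fake_updates()))
print(frames[0], end="")
```

In FastAPI, a generator like sse_frames could be handed to StreamingResponse with media_type="text/event-stream"; the browser's EventSource then receives one message per node.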

Why this is THE concept to understand before everything else

Every "wow look at this" AI demo from the next six months is going to be a variation on this exact pattern. Replace the Researcher with a code-aware agent and you get something close to Cursor's Composer. Replace the Editor with a fact-checker and you get something like Perplexity's pipeline. Replace the linear edges with conditional edges and a feedback loop and you get autonomous agent loops like AutoGPT.

The pattern doesn't change. State + nodes + edges. Three functions, three arrows, one dict. Everything else is marketing.

Once you see it, you can read any "agentic" framework's docs in five minutes flat. CrewAI, AutoGen, LangGraph, Pydantic AI — they all collapse to the same shape. The differences are syntactic sugar around the same core idea.

What to try next

Clone the repo and run it locally. It takes about three commands. Then break it:

Add a fourth agent — a fact-checker that flags suspicious claims in the editor's output and sends it back for revision. (Hint: add_conditional_edges.)
Swap Gemini for a local Llama model via Ollama. Same .invoke() interface, zero API cost.
Replace DuckDuckGo with a vector search over your own documents. Now you've built RAG.
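For the first exercise, the piece you write yourself is the routing function that add_conditional_edges calls after a node finishes. A minimal sketch, assuming a hypothetical fact-checker node that writes a flagged_claims list and a revisions counter into the state (neither key exists in the CrewState above, and MAX_REVISIONS is an invented cap):

```python
MAX_REVISIONS = 2  # assumed cap so the feedback loop cannot run forever


def route_after_fact_check(state: dict) -> str:
    """Decide where the graph goes after the fact-checker runs."""
    has_flags = bool(state.get("flagged_claims"))
    can_retry = state.get("revisions", 0) < MAX_REVISIONS
    return "revise" if has_flags and can_retry else "done"


# Wiring sketch, not executed here. It assumes a "fact_checker" node was added
# and that it replaces the original editor -> END edge:
#
# g.add_edge("editor", "fact_checker")
# g.add_conditional_edges(
#     "fact_checker",
#     route_after_fact_check,
#     {"revise": "editor", "done": END},
# )

print(route_after_fact_check({"flagged_claims": ["claim"], "revisions": 0}))
print(route_after_fact_check({"flagged_claims": ["claim"], "revisions": 2}))
print(route_after_fact_check({"flagged_claims": [], "revisions": 0}))
```

The router is an ordinary function of the state, so it can be unit-tested on its own before the graph is ever compiled.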

The same 250 lines, with three different tweaks, give you three completely different products. That is why understanding multi-agent orchestration matters. The hard part isn't the agents — it's seeing the pattern clearly enough to recognise it everywhere.

This is Day 37 of 50 days of from-zero builds. A new technology every day. The full live demo is at langgraph-from-zero.vercel.app; the source is on GitHub. Beginners welcome — every commit teaches one concept.

Top comments (1)

Varsha Ojha • May 21

Nice breakdown. The “state + nodes + edges” framing makes multi agent systems feel much less mysterious. The part I liked most is separating the researcher, writer, and editor roles. A lot of people try to force one prompt to do everything, then wonder why the output feels average. The hard part is not always adding more agents. It’s making sure each agent has a clear job and clean state to work with.