EasyDocs: Building an AI-Powered API Documentation Agent with LangGraph and Bright Data

How I built a production-ready system that transforms natural language queries into copy-paste API documentation


The Problem Every Developer Faces

We've all been there. You're deep in a coding session, need to integrate with an API, and suddenly you're drowning in scattered documentation, outdated Stack Overflow posts, and incomplete code examples. What if you could just ask in plain English: "How do I send a POST request to Stripe's payment API?" and get back clean, production-ready code with authentication, error handling, and everything you need?

That's exactly what I built with EasyDocs – an AI-powered documentation agent that combines LangGraph's workflow orchestration with Bright Data's enterprise web scraping to deliver instant, reliable API documentation.

The system uses a 4-node LangGraph workflow where each step has a specific responsibility:

  1. Analyze Query: Uses Google Gemini to identify platform and operation type
  2. Generate Plan: Creates actionable browser automation steps
  3. Execute Browser: Leverages Bright Data's MCP for robust web scraping
  4. Generate Response: Formats everything into developer-friendly documentation

Demo

The Code: LangGraph State Management

First, let's look at the state structure that flows through our workflow:

from typing import TypedDict, List, Optional

class DemoState(TypedDict):
    query: str
    platform: str
    action_plan: List[str]
    extracted_content: str
    final_response: str
    error: Optional[str]
    operation_type: str
    confidence: float
    estimated_duration: int
    complexity_level: str
    current_step: int
    confidence_level: Optional[int]
    explanation: Optional[str]

This typed state ensures data consistency across all workflow nodes and makes debugging much easier.

Node Implementation: Smart Query Analysis

The first node uses structured output with Pydantic models:

from typing import Optional

from pydantic import BaseModel, Field
from langchain_google_genai import ChatGoogleGenerativeAI

class QueryAnalysis(BaseModel):
    platform: str = Field(
        description="The platform/service mentioned (e.g., 'stripe', 'openai')"
    )
    operation_type: str = Field(
        description="API operation type (e.g., 'GET', 'POST', 'authentication')"
    )
    confidence: Optional[float] = Field(
        default=None,
        description="Confidence score from 0.0 to 1.0"
    )

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
structured_llm = llm.with_structured_output(QueryAnalysis)

async def analyze_query(state: DemoState) -> DemoState:
    query = state["query"]

    analysis_prompt = f"""
    Analyze this query to extract:
    1. The platform/service they're asking about
    2. The type of API operation they want
    3. Your confidence in this analysis

    Query: {query}

    Look for: Bright Data, Stripe, OpenAI, Twilio, etc.
    Operations: GET, POST, authentication, webhook, etc.
    """

    try:
        analysis = await structured_llm.ainvoke(analysis_prompt)
        return {
            "platform": analysis.platform.lower().replace(" ", "_"),
            "operation_type": analysis.operation_type,
            "confidence": analysis.confidence or 0.8
        }
    except Exception:
        # Fallback: simple keyword matching when the structured LLM call fails
        lowered = query.lower()
        for name in ("bright data", "stripe", "openai", "twilio"):
            if name in lowered:
                return {"platform": name.replace(" ", "_"), "operation_type": "general", "confidence": 0.3}
        return {"platform": "unknown", "operation_type": "general", "confidence": 0.0}
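Notice that the node returns only the keys it changed; LangGraph merges that partial dict back into the shared DemoState. The second node, Generate Plan, isn't shown in the article, but a minimal sketch (the function name and step wording are my assumptions) could look like this:

async def generate_plan(state: DemoState) -> DemoState:
    """Turn the analyzed query into concrete browser-automation steps."""
    # Only the keys this node updates are returned; LangGraph merges them into DemoState
    steps = [
        f"Search for the official {state['platform']} API documentation",
        f"Locate the {state['operation_type']} endpoint reference page",
        "Extract request parameters, auth headers, and example responses",
    ]
    return {"action_plan": steps, "current_step": 2}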

The Game Changer: Bright Data Integration

Here's where it gets interesting. Instead of basic web scraping that breaks with bot detection, I integrated Bright Data's MCP (Model Context Protocol) for enterprise-grade data extraction:

import os

from mcp_use.client import MCPClient
from mcp_use.adapters.langchain_adapter import LangChainAdapter
from langgraph.prebuilt import create_react_agent

async def execute_browser(state: DemoState) -> DemoState:
    action_plan = state["action_plan"]
    query = state["query"]
    platform = state["platform"]

    # Bright Data MCP configuration
    browserai_config = {
        "mcpServers": {
            "BrightData": {
                "command": "npx",
                "args": ["@brightdata/mcp"],
                "env": {
                    "API_TOKEN": os.getenv("BRIGHT_DATA_API_TOKEN"),
                    "WEB_UNLOCKER_ZONE": os.getenv("WEB_UNLOCKER_ZONE"),
                    "BROWSER_ZONE": os.getenv("BROWSER_ZONE")
                }
            }
        }
    }

    client = MCPClient.from_dict(browserai_config)
    adapter = LangChainAdapter()
    tools = await adapter.create_tools(client)

    # Create agent with Bright Data tools
    agent = create_react_agent(
        model=llm,
        tools=tools,
        prompt="""You are a web scraping expert with access to:
        - search_engine: Google/Bing/Yandex results
        - scrape_as_markdown: Bot-detection bypass
        - Structured extractors: Platform-specific scrapers
        - Browser automation: Navigate, click, type, screenshot
        """
    )

    result = await agent.ainvoke({
        "messages": [{
            "role": "user",
            "content": f"Extract API docs for: {query}\nPlatform: {platform}\nPlan: {action_plan}"
        }]
    })

    return process_extraction_result(result)

Deployment Flexibility: Two Interfaces, One Engine

Option 1: Streamlit Web Interface

Perfect for teams and interactive exploration:

import streamlit as st
import asyncio

def main():
    st.title("🤖 API Documentation Agent")

    query = st.text_area("Enter your API documentation query:")

    if st.button("Generate Documentation"):
        with st.spinner("Processing..."):
            # Real-time progress tracking
            for update in run_agent(query):
                if update["status"] == "completed":
                    st.markdown(update["results"]["final_response"])
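run_agent isn't defined in this snippet. Streamlit callbacks are synchronous, so one way to bridge to the async graph (my sketch, not the repo's exact implementation, assuming graph = create_demo_graph() as in the MCP wrapper below) is a plain generator that drives graph.astream() on its own event loop:

import asyncio

def run_agent(query: str):
    """Synchronous generator that streams workflow progress to Streamlit."""
    async def _collect():
        state = {
            "query": query, "platform": "", "action_plan": [], "extracted_content": "",
            "final_response": "", "error": None, "operation_type": "", "confidence": 0.0,
            "estimated_duration": 0, "complexity_level": "", "current_step": 0,
            "confidence_level": None, "explanation": None,
        }
        updates, merged = [], {}
        async for event in graph.astream(state):
            for node_name, state_update in event.items():
                updates.append({"status": "running", "node": node_name})
                merged.update(state_update)
        updates.append({"status": "completed", "results": merged})
        return updates

    # asyncio.run creates a fresh event loop per request, which is fine for a demo app
    yield from asyncio.run(_collect())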

Option 2: MCP Server for IDE Integration

This is where it gets really cool. The entire agent becomes a tool in your coding environment:

from fastmcp import FastMCP

graph = create_demo_graph()
mcp = FastMCP("API-Documentation-Agent")

@mcp.tool()
async def generate_api_docs(question: str) -> str:
    """Generate API documentation from natural language query"""

    initial_state = {
        "query": question,
        "platform": "",
        "action_plan": [],
        # ... full state initialization
    }

    result = await graph.ainvoke(initial_state)
    return result.get("final_response", "No documentation generated")

if __name__ == "__main__":
    mcp.run(transport="stdio")
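create_demo_graph() itself isn't shown in the post. Here's a minimal sketch of how the four nodes could be wired, assuming the two nodes not covered above are named generate_plan and generate_response:

from langgraph.graph import StateGraph, START, END

def create_demo_graph():
    workflow = StateGraph(DemoState)

    # One node per step of the workflow described at the top of the article
    workflow.add_node("analyze_query", analyze_query)
    workflow.add_node("generate_plan", generate_plan)
    workflow.add_node("execute_browser", execute_browser)
    workflow.add_node("generate_response", generate_response)

    # Linear pipeline: analyze -> plan -> browse -> respond
    workflow.add_edge(START, "analyze_query")
    workflow.add_edge("analyze_query", "generate_plan")
    workflow.add_edge("generate_plan", "execute_browser")
    workflow.add_edge("execute_browser", "generate_response")
    workflow.add_edge("generate_response", END)

    return workflow.compile()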

Claude Desktop Integration

Add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "EasyDocs": {
      "command": "/path/to/venv/bin/python",
      "args": ["/path/to/mcp_wrapper.py"]
    }
  }
}

Now you can use it directly in Claude Desktop:

@EasyDocs generate documentation for creating a Stripe payment intent

Real Output Example

Here's what you get when asking: "How to send a POST request to Bright Data's Web Scraper API"

Documentation Quality: 🟢 High Confidence (9/10)

Bright Data Web Scraper API Documentation

Quick Start

curl --request POST \
  --url https://api.brightdata.com/datasets/v3/trigger?dataset_id=YOUR_DATASET_ID \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '[{"url": "https://example.com"}]'

Authentication

  • Type: Bearer Token
  • Header: Authorization: Bearer YOUR_API_KEY
  • Get API Key: Bright Data dashboard → Settings → API tokens

Request Parameters

  • dataset_id (required): Your dataset identifier
  • url (required): Target URL to scrape
  • type: Set to "discover_new" for discovery phase
  • limit_per_input: Limit results per input

Response Format

{
  "snapshot_id": "snap_12345",
  "status": "running",
  "dataset_id": "gd_l1vikfnt1wgvvqz95w"
}

Error Handling

  • 400: Invalid parameters or malformed request
  • 401: Invalid or missing API key
  • 429: Rate limit exceeded - implement exponential backoff

Technical Deep Dive: Why This Architecture Works

1. Structured State Management

Using TypedDict ensures type safety while maintaining the flexibility needed for dynamic workflows. Each node knows exactly what data it receives and what it should return.

2. Fault Tolerance

Every node has fallback logic. If the LLM fails, we fall back to keyword matching. If Bright Data is unavailable, we provide basic documentation templates.
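The template itself isn't shown in the article; a rough sketch of what the Bright Data fallback could look like (the function name and wording are mine):

def fallback_documentation(state: DemoState) -> str:
    """Generic outline returned when live extraction is unavailable."""
    platform = state.get("platform", "unknown")
    operation = state.get("operation_type", "general")
    return (
        f"# {platform.title()} - {operation} (offline template)\n\n"
        "Live extraction was unavailable, so this is a generic outline:\n"
        "- Find the endpoint in the vendor's official API reference\n"
        "- Authenticate (most platforms use an API key or bearer token)\n"
        "- Send the request and handle 4xx/5xx responses with retries"
    )

execute_browser's except branch can then return {"extracted_content": fallback_documentation(state), "error": str(e)} instead of failing the whole run.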

3. Modular Design

Each LangGraph node is independently testable:

async def test_analyze_query():
    state = {"query": "How to authenticate with Stripe API"}
    result = await analyze_query(state)
    assert result["platform"] == "stripe"
    assert "auth" in result["operation_type"].lower()
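The article doesn't say how these async tests are executed; two common options (my assumption, not from the repo) are the pytest-asyncio plugin, which lets you mark the coroutine with @pytest.mark.asyncio and run pytest as usual, or a direct call for a quick manual check:

import asyncio

# Quick manual check without a test runner
if __name__ == "__main__":
    asyncio.run(test_analyze_query())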

4. Real-time Feedback

The Streamlit interface uses graph.astream() for step-by-step progress:

async for event in graph.astream(initial_state):
    for node_name, state_update in event.items():
        # Update UI with current progress
        display_progress(node_name, state_update)

Performance and Reliability

Bright Data Advantages

  • Bot Detection Bypass: No more CAPTCHA or IP blocking
  • Global Infrastructure: Consistent performance worldwide
  • Structured Extractors: Platform-specific scrapers for major APIs
  • Rate Limit Handling: Built-in retry logic and request management

Lessons Learned

1. State Design is Critical

I initially used a simple dictionary but quickly realized that TypedDict catches errors early and makes the codebase much more maintainable.

2. Fallback Everything

LLMs fail, APIs go down, networks are unreliable. Every component needs graceful degradation.

3. MCP is a Game Changer

Being able to use your AI agent directly in your coding environment eliminates context switching and makes it actually useful for daily development.

4. Bright Data's Reliability Matters

I tried basic web scraping first. Documentation sites have sophisticated bot detection, and you need enterprise infrastructure to consistently extract data.

Getting Started

# Clone and setup
git clone https://github.com/MeirKaD/EasyDocs
cd EasyDocs
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Add your API keys

# Run web interface
streamlit run app.py

# Or start MCP server
python mcp_wrapper.py
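The authoritative list of variables is in .env.example in the repo; based on the code above (plus a Gemini key, which ChatGoogleGenerativeAI reads from GOOGLE_API_KEY by default), the file should look roughly like this:

# Google Gemini (used by ChatGoogleGenerativeAI)
GOOGLE_API_KEY=your_gemini_api_key

# Bright Data MCP
BRIGHT_DATA_API_TOKEN=your_bright_data_api_token
WEB_UNLOCKER_ZONE=your_web_unlocker_zone
BROWSER_ZONE=your_browser_zone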

Key Takeaways for Developers

  1. LangGraph makes complex AI workflows manageable - the state-based approach is much cleaner than chaining LLM calls
  2. Structured outputs with Pydantic eliminate debugging nightmares - always use typed models for LLM responses
  3. MCP bridges the gap between AI and daily development - tools that live in your editor get used
  4. Enterprise infrastructure matters - Bright Data's reliability makes this production-ready
  5. Multiple interfaces serve different use cases - web UI for exploration, MCP for integration

Connect and Contribute

This project is open source and I'm actively looking for contributors! Whether you want to add new platform support, improve the UI, or enhance the extraction logic, there's plenty to work on.

Found this useful? Let's connect and build the future of developer tooling together!

References

Project Github repo: https://github.com/MeirKaD/EasyDocs

Bright Data's MCP: https://github.com/brightdata/brightdata-mcp

What API documentation challenges are you facing? Drop a comment and let's discuss how AI agents might solve them!
