How I built a production-ready system that transforms natural language queries into copy-paste API documentation
The Problem Every Developer Faces
We've all been there. You're deep in a coding session, need to integrate with an API, and suddenly you're drowning in scattered documentation, outdated Stack Overflow posts, and incomplete code examples. What if you could just ask in plain English: "How do I send a POST request to Stripe's payment API?" and get back clean, production-ready code with authentication, error handling, and everything you need?
That's exactly what I built with EasyDocs – an AI-powered documentation agent that combines LangGraph's workflow orchestration with Bright Data's enterprise web scraping to deliver instant, reliable API documentation.
The system uses a 4-node LangGraph workflow where each step has a specific responsibility (a wiring sketch follows the list):
- Analyze Query: Uses Google Gemini to identify platform and operation type
- Generate Plan: Creates actionable browser automation steps
- Execute Browser: Leverages Bright Data's MCP for robust web scraping
- Generate Response: Formats everything into developer-friendly documentation
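Here is a minimal sketch of how those four nodes could be wired together with LangGraph. It uses the `DemoState` defined in the next section; `create_demo_graph` matches the factory referenced later in the MCP wrapper, but the `generate_plan` and `generate_response` node functions are assumed names since the article only shows `analyze_query` and `execute_browser` in full.

```python
from langgraph.graph import StateGraph, END

def create_demo_graph():
    # Build the 4-node pipeline over the shared DemoState (defined below)
    workflow = StateGraph(DemoState)
    workflow.add_node("analyze_query", analyze_query)
    workflow.add_node("generate_plan", generate_plan)          # assumed name
    workflow.add_node("execute_browser", execute_browser)
    workflow.add_node("generate_response", generate_response)  # assumed name

    # Linear flow: analyze -> plan -> scrape -> format
    workflow.set_entry_point("analyze_query")
    workflow.add_edge("analyze_query", "generate_plan")
    workflow.add_edge("generate_plan", "execute_browser")
    workflow.add_edge("execute_browser", "generate_response")
    workflow.add_edge("generate_response", END)
    return workflow.compile()
```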
Demo
The Code: LangGraph State Management
First, let's look at the state structure that flows through our workflow:
```python
from typing import TypedDict, List, Optional

class DemoState(TypedDict):
    query: str
    platform: str
    action_plan: List[str]
    extracted_content: str
    final_response: str
    error: Optional[str]
    operation_type: str
    confidence: float
    estimated_duration: int
    complexity_level: str
    current_step: int
    confidence_level: Optional[int]
    explanation: Optional[str]
```
This typed state ensures data consistency across all workflow nodes and makes debugging much easier.
Node Implementation: Smart Query Analysis
The first node uses structured output with Pydantic models:
```python
from pydantic import BaseModel, Field
from langchain_google_genai import ChatGoogleGenerativeAI

class QueryAnalysis(BaseModel):
    platform: str = Field(
        description="The platform/service mentioned (e.g., 'stripe', 'openai')"
    )
    operation_type: str = Field(
        description="API operation type (e.g., 'GET', 'POST', 'authentication')"
    )
    confidence: Optional[float] = Field(
        default=None,
        description="Confidence score from 0.0 to 1.0"
    )

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
structured_llm = llm.with_structured_output(QueryAnalysis)
```
```python
async def analyze_query(state: DemoState) -> DemoState:
    query = state["query"]
    analysis_prompt = f"""
    Analyze this query to extract:
    1. The platform/service they're asking about
    2. The type of API operation they want
    3. Your confidence in this analysis

    Query: {query}

    Look for: Bright Data, Stripe, OpenAI, Twilio, etc.
    Operations: GET, POST, authentication, webhook, etc.
    """
    try:
        analysis = await structured_llm.ainvoke(analysis_prompt)
        return {
            "platform": analysis.platform.lower().replace(" ", "_"),
            "operation_type": analysis.operation_type,
            "confidence": analysis.confidence or 0.8
        }
    except Exception:
        # Fallback logic with keyword matching
        return {"platform": "unknown", "operation_type": "general", "confidence": 0.0}
```
The Game Changer: Bright Data Integration
Here's where it gets interesting. Instead of basic web scraping that breaks with bot detection, I integrated Bright Data's MCP (Model Context Protocol) for enterprise-grade data extraction:
```python
import os

from mcp_use.client import MCPClient
from mcp_use.adapters.langchain_adapter import LangChainAdapter
from langgraph.prebuilt import create_react_agent

async def execute_browser(state: DemoState) -> DemoState:
    action_plan = state["action_plan"]
    query = state["query"]
    platform = state["platform"]

    # Bright Data MCP configuration
    browserai_config = {
        "mcpServers": {
            "BrightData": {
                "command": "npx",
                "args": ["@brightdata/mcp"],
                "env": {
                    "API_TOKEN": os.getenv("BRIGHT_DATA_API_TOKEN"),
                    "WEB_UNLOCKER_ZONE": os.getenv("WEB_UNLOCKER_ZONE"),
                    "BROWSER_ZONE": os.getenv("BROWSER_ZONE")
                }
            }
        }
    }

    client = MCPClient.from_dict(browserai_config)
    adapter = LangChainAdapter()
    tools = await adapter.create_tools(client)

    # Create agent with Bright Data tools
    agent = create_react_agent(
        model=llm,
        tools=tools,
        prompt="""You are a web scraping expert with access to:
        - search_engine: Google/Bing/Yandex results
        - scrape_as_markdown: Bot-detection bypass
        - Structured extractors: Platform-specific scrapers
        - Browser automation: Navigate, click, type, screenshot
        """
    )

    result = await agent.ainvoke({
        "messages": [{
            "role": "user",
            "content": f"Extract API docs for: {query}\nPlatform: {platform}\nPlan: {action_plan}"
        }]
    })

    return process_extraction_result(result)
```
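The `process_extraction_result` helper isn't shown in the article. Since `create_react_agent` returns a dict with a `messages` list, a minimal version might just read the final message; the real helper in the repo likely does more cleanup.

```python
def process_extraction_result(result: dict) -> dict:
    """Sketch: pull the agent's final answer out of the ReAct result."""
    messages = result.get("messages", [])
    content = getattr(messages[-1], "content", "") if messages else ""
    return {
        "extracted_content": content,
        "error": None if content else "No content extracted",
    }
```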
Deployment Flexibility: Two Interfaces, One Engine
Option 1: Streamlit Web Interface
Perfect for teams and interactive exploration:
```python
import streamlit as st
import asyncio

def main():
    st.title("🤖 API Documentation Agent")
    query = st.text_area("Enter your API documentation query:")

    if st.button("Generate Documentation"):
        with st.spinner("Processing..."):
            # Real-time progress tracking
            for update in run_agent(query):
                if update["status"] == "completed":
                    st.markdown(update["results"]["final_response"])
```
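The `run_agent` helper referenced above isn't shown here. One way to drive the async graph from Streamlit's synchronous loop is a small generator like this sketch; the field names mirror the loop above, and the real implementation probably yields per-node progress via `graph.astream()` rather than a single completion event.

```python
def run_agent(query: str):
    """Sketch: run the async graph and yield progress dicts for the UI."""
    graph = create_demo_graph()
    initial_state = {
        "query": query, "platform": "", "action_plan": [],
        "extracted_content": "", "final_response": "", "error": None,
        "operation_type": "", "confidence": 0.0, "estimated_duration": 0,
        "complexity_level": "", "current_step": 0,
        "confidence_level": None, "explanation": None,
    }
    final_state = asyncio.run(graph.ainvoke(initial_state))
    yield {
        "status": "completed",
        "results": {"final_response": final_state.get("final_response", "")},
    }
```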
Option 2: MCP Server for IDE Integration
This is where it gets really cool. The entire agent becomes a tool in your coding environment:
```python
from fastmcp import FastMCP

graph = create_demo_graph()
mcp = FastMCP("API-Documentation-Agent")

@mcp.tool()
async def generate_api_docs(question: str) -> str:
    """Generate API documentation from natural language query"""
    initial_state = {
        "query": question,
        "platform": "",
        "action_plan": [],
        # ... full state initialization
    }
    result = await graph.ainvoke(initial_state)
    return result.get("final_response", "No documentation generated")

if __name__ == "__main__":
    mcp.run(transport="stdio")
```
Claude Desktop Integration
Add this to your claude_desktop_config.json
:
```json
{
  "mcpServers": {
    "EasyDocs": {
      "command": "/path/to/venv/bin/python",
      "args": ["/path/to/mcp_wrapper.py"]
    }
  }
}
```
Now you can use it directly in Claude Desktop:
@EasyDocs generate documentation for creating a Stripe payment intent
Real Output Example
Here's what you get when asking: "How to send a POST request to Bright Data's Web Scraper API"
Documentation Quality: 🟢 High Confidence (9/10)
Bright Data Web Scraper API Documentation
Quick Start
```bash
curl --request POST \
  --url "https://api.brightdata.com/datasets/v3/trigger?dataset_id=YOUR_DATASET_ID" \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '[{"url": "https://example.com"}]'
```
Authentication
- Type: Bearer Token
- Header: `Authorization: Bearer YOUR_API_KEY`
- Get API Key: Bright Data dashboard → Settings → API tokens
Request Parameters
- `dataset_id` (required): Your dataset identifier
- `url` (required): Target URL to scrape
- `type`: Set to `"discover_new"` for the discovery phase
- `limit_per_input`: Limit results per input
Response Format
```json
{
  "snapshot_id": "snap_12345",
  "status": "running",
  "dataset_id": "gd_l1vikfnt1wgvvqz95w"
}
```
Error Handling
- 400: Invalid parameters or malformed request
- 401: Invalid or missing API key
- 429: Rate limit exceeded - implement exponential backoff
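Since the generated docs recommend exponential backoff on 429s, here is a rough idea of what that looks like when calling the trigger endpoint. This is my own sketch, not EasyDocs output, and it uses `requests` purely for illustration.

```python
import time
import requests  # assumption: any HTTP client works; the repo may use something else

def trigger_with_backoff(url: str, headers: dict, payload: list, max_retries: int = 5):
    """Retry the dataset trigger call with exponential backoff on 429s."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    raise RuntimeError("Still rate limited after retries")
```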
Technical Deep Dive: Why This Architecture Works
1. Structured State Management
Using TypedDict ensures type safety while maintaining the flexibility needed for dynamic workflows. Each node knows exactly what data it receives and what it should return.
2. Fault Tolerance
Every node has fallback logic. If the LLM fails, we fall back to keyword matching. If Bright Data is unavailable, we provide basic documentation templates.
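As a concrete example of the second fallback, a basic documentation template can be as simple as the sketch below; the wording and fields are illustrative, not the project's actual template.

```python
def fallback_documentation(platform: str, operation_type: str) -> str:
    """Bare-bones template returned when live extraction is unavailable (sketch)."""
    return (
        f"# {platform.title()} API - {operation_type}\n\n"
        "Live documentation could not be extracted. General guidance:\n"
        "- Check the official docs for the exact endpoint URL.\n"
        "- Most APIs expect an `Authorization: Bearer <API_KEY>` header.\n"
        "- Send JSON bodies with `Content-Type: application/json`.\n"
    )
```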
3. Modular Design
Each LangGraph node is independently testable:
```python
async def test_analyze_query():
    state = {"query": "How to authenticate with Stripe API"}
    result = await analyze_query(state)
    assert result["platform"] == "stripe"
    assert "auth" in result["operation_type"].lower()
```
4. Real-time Feedback
The Streamlit interface uses `graph.astream()` for step-by-step progress:
```python
async for event in graph.astream(initial_state):
    for node_name, state_update in event.items():
        # Update UI with current progress
        display_progress(node_name, state_update)
```
Performance and Reliability
Bright Data Advantages
- Bot Detection Bypass: No more CAPTCHA or IP blocking
- Global Infrastructure: Consistent performance worldwide
- Structured Extractors: Platform-specific scrapers for major APIs
- Rate Limit Handling: Built-in retry logic and request management
Lessons Learned
1. State Design is Critical
I initially used a simple dictionary but quickly realized that TypedDict catches errors early and makes the codebase much more maintainable.
2. Fallback Everything
LLMs fail, APIs go down, networks are unreliable. Every component needs graceful degradation.
3. MCP is a Game Changer
Being able to use your AI agent directly in your coding environment eliminates context switching and makes it actually useful for daily development.
4. Bright Data's Reliability Matters
I tried basic web scraping first. Documentation sites have sophisticated bot detection, and you need enterprise infrastructure to consistently extract data.
Getting Started
```bash
# Clone and setup
git clone https://github.com/MeirKaD/EasyDocs
cd EasyDocs
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Add your API keys

# Run web interface
streamlit run app.py

# Or start MCP server
python mcp_wrapper.py
```
Key Takeaways for Developers
- LangGraph makes complex AI workflows manageable - the state-based approach is much cleaner than chaining LLM calls
- Structured outputs with Pydantic eliminate debugging nightmares - always use typed models for LLM responses
- MCP bridges the gap between AI and daily development - tools that live in your editor get used
- Enterprise infrastructure matters - Bright Data's reliability makes this production-ready
- Multiple interfaces serve different use cases - web UI for exploration, MCP for integration
Connect and Contribute
This project is open source and I'm actively looking for contributors! Whether you want to add new platform support, improve the UI, or enhance the extraction logic, there's plenty to work on.
Found this useful? Let's connect and build the future of developer tooling together!
References
Project Github repo: https://github.com/MeirKaD/EasyDocs
Bright Data's MCP: https://github.com/brightdata/brightdata-mcp
What API documentation challenges are you facing? Drop a comment and let's discuss how AI agents might solve them!