Meir
Building a Unified Search Agent with LangGraph and MCP: From Intent Classification to Production Deployment

Ever wished you could have a single AI agent that intelligently decides whether to use Google search, scrape specific websites, or combine multiple data sources based on your query? That's exactly what I built with the Unified Search Agent: a multi-modal search system that uses LLM-powered intent classification to route each query to the most appropriate search strategy.

🎯 The Problem

Traditional search solutions often force you to choose upfront: Do you want web search results or scraped data? Should you check multiple sources or stick to one? The Unified Search Agent eliminates this decision fatigue by automatically:

  • Classifying your intent using Gemini 2.0 Flash
  • Routing intelligently between Google search and web scraping
  • Combining results from multiple sources when needed
  • Scoring and ranking final results for maximum relevance

🏗️ Architecture Overview

The agent follows a sophisticated graph-based workflow built with LangGraph:

START → Intent Classifier → [Google Search | Web Scraping | Both] → Final Processing → END

Intent Classification Categories

The system classifies queries into four categories:

  1. general_search: News, facts, definitions, explanations
  2. product_search: Shopping, prices, reviews, recommendations
  3. web_scraping: Data extraction from specific websites
  4. comparison: Comparing multiple items or services

Smart Routing Logic

  • URLs detected → Direct to Web Scraping
  • General search → Google Search only
  • Product search → Google Search → Web Scraping
  • Web scraping → Web Scraping only
  • Comparison → Both methods in parallel
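The routing table above can be sketched as a small pure-Python function. The node names (`google_search`, `web_scraping`) and the URL check are illustrative, not the repo's exact identifiers:

```python
import re
from typing import List

def route_query(intent: str, query: str) -> List[str]:
    """Return downstream node names for a classified query.

    Mirrors the routing rules above; names are illustrative.
    """
    # Explicit URLs short-circuit straight to scraping
    if re.search(r"https?://\S+", query):
        return ["web_scraping"]
    routes = {
        "general_search": ["google_search"],
        "product_search": ["google_search", "web_scraping"],  # sequential
        "web_scraping": ["web_scraping"],
        "comparison": ["google_search", "web_scraping"],      # parallel
    }
    # Unknown intents fall back to the broadest strategy
    return routes.get(intent, ["google_search"])
```

In the real graph this logic lives in a conditional edge, but the decision itself is just this lookup.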

🛠️ Technology Stack

  • LangGraph: Orchestration and state management
  • Gemini 2.0 Flash: Intent classification and result processing
  • Bright Data MCP: Google search and web scraping
  • Pydantic: Structured data validation
  • FastMCP: MCP server wrapper for easy integration

🚀 Key Features

1. Intelligent Intent Classification

from pydantic import BaseModel, Field

class IntentClassification(BaseModel):
    intent: str = Field(description="The classified intent")
    confidence: float = Field(description="Confidence score", ge=0.0, le=1.0)
    reasoning: str = Field(description="Explanation for classification")

The system uses structured output to ensure consistent, reliable classification with confidence scoring.
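The `ge`/`le` bounds on `confidence` are doing real work here: a malformed LLM response is rejected at parse time instead of silently skewing routing. A quick standalone sketch (the model is re-declared so the snippet runs on its own):

```python
from pydantic import BaseModel, Field, ValidationError

# Re-declared here so the snippet runs standalone
class IntentClassification(BaseModel):
    intent: str = Field(description="The classified intent")
    confidence: float = Field(description="Confidence score", ge=0.0, le=1.0)
    reasoning: str = Field(description="Explanation for classification")

# A well-formed classification parses cleanly
ok = IntentClassification(
    intent="product_search", confidence=0.92, reasoning="Shopping query"
)

# An out-of-range confidence is rejected before it can skew routing
rejected = False
try:
    IntentClassification(intent="general_search", confidence=1.4, reasoning="n/a")
except ValidationError:
    rejected = True
```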

2. Multi-Source Data Integration

The agent seamlessly combines results from:

  • Google Search via Bright Data's search engine
  • Web Scraping using Bright Data's Web Unlocker
  • Parallel processing for comparison queries
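For comparison queries, "parallel processing" boils down to awaiting both branches concurrently. A minimal sketch with stand-in coroutines (the real nodes call Bright Data via MCP):

```python
import asyncio
from typing import Any, Dict, List

async def google_search(query: str) -> List[Dict[str, Any]]:
    # Stand-in for the Bright Data search call
    await asyncio.sleep(0)
    return [{"source": "google_search", "query": query}]

async def web_scrape(query: str) -> List[Dict[str, Any]]:
    # Stand-in for the Web Unlocker scraping call
    await asyncio.sleep(0)
    return [{"source": "web_scraping", "query": query}]

async def comparison_search(query: str) -> List[Dict[str, Any]]:
    """Run both branches concurrently, as the comparison route does."""
    search_hits, scraped = await asyncio.gather(
        google_search(query), web_scrape(query)
    )
    return search_hits + scraped

merged = asyncio.run(comparison_search("MacBook Pro M3 vs Dell XPS 15"))
```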

3. Advanced Result Scoring

Every result is scored on two dimensions, which are then blended into a final score:

  • Relevance Score (0-1): How well it answers the query
  • Quality Score (0-1): Source authority and content quality
  • Final Score: (relevance * 0.7) + (quality * 0.3)
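The weighting means a highly relevant result outranks a high-authority but less relevant one. A sketch of the scoring pass (field names match the output structure; the helper itself is illustrative):

```python
def final_score(relevance: float, quality: float) -> float:
    """Weighted blend from above: relevance dominates at 70%."""
    return relevance * 0.7 + quality * 0.3

results = [
    {"title": "A", "relevance_score": 0.9, "quality_score": 0.8},
    {"title": "B", "relevance_score": 0.7, "quality_score": 0.95},
]
for r in results:
    r["final_score"] = round(final_score(r["relevance_score"], r["quality_score"]), 2)

# Highly relevant results outrank high-quality but less relevant ones
ranked = sorted(results, key=lambda r: r["final_score"], reverse=True)
```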

4. Production-Ready Error Handling

from typing import Any, Dict

def intent_classifier_node(state: Dict[str, Any]) -> Dict[str, Any]:
    try:
        # Classification logic
        result = structured_llm.invoke(full_prompt)
        # Update state with the structured result
    except Exception as e:
        # Graceful fallback: default to the broadest strategy
        state["intent"] = "general_search"
        state["intent_confidence"] = 0.5
        state["error"] = f"Classification error: {str(e)}"
    return state

Every node includes comprehensive error handling with meaningful fallbacks.

💡 Usage Examples

Basic Search

{
  "query": "latest developments in quantum computing",
  "max_results": 5
}

Product Research

{
  "query": "best noise-canceling headphones under $200",
  "max_results": 8
}

Targeted Web Scraping

{
  "query": "extract pricing from https://example-store.com/laptops",
  "max_results": 10
}

Comparison Analysis

{
  "query": "MacBook Pro M3 vs Dell XPS 15 specifications comparison",
  "max_results": 6
}

🌐 MCP Server Integration

One of the coolest features is that the entire agent is wrapped as an MCP (Model Context Protocol) server:

from fastmcp import FastMCP
from src.agent.graph import create_unified_search_graph

graph = create_unified_search_graph()
mcp = FastMCP("Unified-Search-Agent")

@mcp.tool()
async def search(search_term: str) -> str:
    """Run unified search across multiple platforms."""
    result = await graph.ainvoke({"query": search_term})
    return result.get("final_results", "No results found.")

if __name__ == "__main__":
    mcp.run()

This means you can easily integrate the search agent into:

  • Claude Desktop via MCP
  • Other AI applications that support MCP
  • Custom workflows that need intelligent search
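Hooking it into Claude Desktop is a standard MCP server entry in the client config. The server name, command, and path below are illustrative and depend on where you run the server:

```json
{
  "mcpServers": {
    "unified-search": {
      "command": "python",
      "args": ["/path/to/unified-search/mcp_server.py"],
      "env": {
        "GOOGLE_API_KEY": "your_gemini_api_key",
        "BRIGHT_DATA_API_TOKEN": "your_bright_data_token"
      }
    }
  }
}
```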

☁️ LangGraph Cloud Deployment

The agent is designed for seamless deployment on LangGraph Cloud:

{
  "dependencies": ["."],
  "graphs": {
    "agent": "./src/agent/graph.py:graph"
  },
  "env": ".env",
  "image_distro": "wolfi"
}

Simply configure your environment variables and deploy:

# Required API keys
GOOGLE_API_KEY=your_gemini_api_key
BRIGHT_DATA_API_TOKEN=your_bright_data_token

# Optional - for tracing
LANGSMITH_API_KEY=your_langsmith_key

🎨 LangGraph Studio Integration

The agent integrates beautifully with LangGraph Studio for visual debugging:

(Screenshot: the agent's graph visualized in LangGraph Studio)

You can:

  • Visualize the entire workflow in real-time
  • Debug specific nodes by editing past states
  • Hot reload changes during development
  • Monitor performance with integrated tracing

📊 Example Output Structure

The agent returns comprehensively structured results:

{
  "final_results": [
    {
      "title": "Quantum Computing Breakthrough 2024",
      "url": "https://example.com/quantum-news",
      "snippet": "Scientists achieve new milestone...",
      "source": "google_search",
      "relevance_score": 0.95,
      "quality_score": 0.88,
      "final_score": 0.93,
      "metadata": {
        "search_engine": "google",
        "via": "bright_data_mcp"
      }
    }
  ],
  "query_summary": "Found 5 recent articles about quantum computing advances",
  "total_processed": 12,
  "intent": "general_search",
  "intent_confidence": 0.95
}
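Because the response carries both an intent confidence and per-result scores, a caller can apply its own gating downstream. A small sketch; the 0.7 and 0.8 thresholds are arbitrary choices for illustration:

```python
# A trimmed-down response in the shape shown above
response = {
    "final_results": [
        {"title": "Quantum Computing Breakthrough 2024", "final_score": 0.93},
        {"title": "Intro to Qubits", "final_score": 0.61},
    ],
    "intent": "general_search",
    "intent_confidence": 0.95,
}

# Trust the classification only when confidence is high, then keep strong hits
if response["intent_confidence"] >= 0.7:
    kept = [r for r in response["final_results"] if r["final_score"] >= 0.8]
else:
    kept = response["final_results"]  # low confidence: keep everything
```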

🔄 Extending the Agent

The modular architecture makes it easy to extend:

Add New Search Sources

workflow.add_node("scholarly_search", scholarly_search_node)
workflow.add_conditional_edges("intent_classifier", route_with_scholarly)

Customize Intent Categories

# Add to intent_classifier_node
intents = ["general_search", "academic_research", "product_search", "web_scraping"]

Modify Scoring Algorithms

# Update in final_processing_node
final_score = (relevance * 0.6) + (quality * 0.3) + (freshness * 0.1)

🚦 Getting Started

  1. Clone the repository:
git clone https://github.com/MeirKaD/unified-search
cd unified-search
  2. Install dependencies:
pip install -e . "langgraph-cli[inmem]"
  3. Set up environment:
cp .env.example .env
# Add your API keys to .env
  4. Start the development server:
langgraph dev
  5. Open LangGraph Studio and start experimenting!

🎯 Real-World Applications

This architecture pattern is perfect for:

  • Research Assistants: Automatically choosing between academic databases and web search
  • E-commerce Tools: Combining product catalogs with review sites
  • Market Research: Gathering data from multiple competitive intelligence sources
  • Content Curation: Intelligently sourcing content from various platforms
  • Due Diligence: Comprehensive information gathering for business decisions

🔮 Future Enhancements

The agent architecture supports easy addition of:

  • Vector database integration for semantic search
  • Real-time data streams for live information
  • Specialized extractors for platforms like LinkedIn, Twitter, Reddit
  • Multi-language support with automatic translation
  • Custom scoring models trained on your specific use cases

💭 Lessons Learned

Building this agent taught me several key insights:

  1. Intent classification is crucial: getting this right determines the entire user experience
  2. Structured outputs are game-changers: Pydantic models ensure reliability at scale
  3. Error handling is not optional: every node needs graceful degradation
  4. Visual debugging saves hours: LangGraph Studio made development 10x faster
  5. MCP integration opens new possibilities: wrapping the agent as an MCP server makes it universally useful

🎉 Conclusion

The Unified Search Agent represents a new paradigm in AI-powered search: one where intelligence lies not just in finding information, but in knowing how to find it. By combining LangGraph's orchestration capabilities with advanced intent classification and multi-modal search strategies, we've created something that adapts to user needs rather than forcing users to adapt to tool limitations.

The fact that it deploys seamlessly to LangGraph Cloud and integrates via MCP makes it a practical solution for real-world applications. Whether you're building a research assistant, enhancing customer support, or creating the next generation of search experiences, this architecture provides a solid foundation.

Try it out and let me know what you build!


What search challenges are you trying to solve? Drop a comment and let's discuss how this architecture could help! 👇
