Linghua Jin

Build an AI-Powered Competitive Intelligence Monitor

Staying ahead of competitors means tracking product launches, funding rounds, partnerships, and strategic moves across the web. The open-source Competitive Intelligence Monitor project shows how to automate this with CocoIndex, Tavily Search, and LLM extraction, continuously turning competitor news into structured records in a queryable PostgreSQL database.

What the Project Does

The system automates web monitoring by using Tavily's AI-native search to pull full-text articles, then feeding them through a GPT-4o-mini–based extraction layer to detect structured "competitive events" such as:

  • Product launches and feature releases
  • Partnerships and collaborations
  • Funding rounds and financial news
  • Key executive hires/departures
  • Acquisitions and mergers

These events and their source articles are stored in PostgreSQL so teams can answer questions like "What has Anthropic been doing recently?" or "Which competitors are making the most news this week?"

Core Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Tavily AI   │────▶│  CocoIndex   │────▶│  PostgreSQL  │
│    Search    │     │   Pipeline   │     │   Database   │
└──────────────┘     └──────────────┘     └──────────────┘
       │                    │                    │
       ▼                    ▼                    ▼
   Articles           Extraction           Intelligence
  (web data)        (GPT-4o-mini)         (structured)

Data flows from Tavily search results into an LLM extraction step that produces CompetitiveEvent objects, then into two indexes: one table for raw articles and another for normalized events.

Data Model: The CompetitiveEvent Class

At the heart of the extraction is the CompetitiveEvent dataclass that defines what the LLM should extract from each article:

import dataclasses


@dataclasses.dataclass
class CompetitiveEvent:
    """A competitive intelligence event extracted from text.

    Examples:
    - Product Launch: "OpenAI released GPT-5 with multimodal capabilities"
    - Partnership: "Anthropic partnered with Google Cloud for enterprise AI"
    - Funding: "Mistral AI raised $400M Series B led by Andreessen Horowitz"
    - Key Hire: "Former Meta AI director joined Cohere as Chief Scientist"
    - Strategic Move: "Microsoft acquired AI startup Inflection for $650M"
    """
    event_type: str      # "product_launch", "partnership", "funding", "key_hire", "acquisition", "other"
    competitor: str      # Company name (e.g., "OpenAI", "Anthropic", "Google AI")
    description: str     # Brief description of the event
    significance: str    # "high", "medium", "low" - based on market impact
    related_companies: list[str]  # Other companies mentioned
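
Because the LLM fills these fields with free-form strings, it can be useful to check them before they reach the database. The following is a minimal sketch of that idea, not code from the project: `normalize_event`, `ALLOWED_EVENT_TYPES`, and `ALLOWED_SIGNIFICANCE` are hypothetical names, and the fallback values ("other" and "low") are assumptions.

```python
import dataclasses

# Allowed values taken from the comments on the CompetitiveEvent fields.
ALLOWED_EVENT_TYPES = {
    "product_launch", "partnership", "funding", "key_hire", "acquisition", "other",
}
ALLOWED_SIGNIFICANCE = {"high", "medium", "low"}


@dataclasses.dataclass
class CompetitiveEvent:
    event_type: str
    competitor: str
    description: str
    significance: str
    related_companies: list[str]


def normalize_event(event: CompetitiveEvent) -> CompetitiveEvent:
    """Return a copy with event_type and significance coerced to known values."""
    event_type = event.event_type.strip().lower().replace(" ", "_")
    if event_type not in ALLOWED_EVENT_TYPES:
        event_type = "other"  # assumed fallback for unrecognized types

    significance = event.significance.strip().lower()
    if significance not in ALLOWED_SIGNIFICANCE:
        significance = "low"  # assumed fallback for unrecognized levels

    return dataclasses.replace(
        event,
        event_type=event_type,
        significance=significance,
        competitor=event.competitor.strip(),
    )
```

A check like this keeps later filters such as `event_type = 'funding'` from silently missing rows that the model labeled "Funding" or "funding round".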

Custom Tavily Source Connector

The project implements a custom CocoIndex source connector that interfaces with Tavily's AI-native search API:

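import time
from typing import AsyncIterator

from cocoindex.op import (
    PartialSourceRow,
    PartialSourceRowData,
    SourceSpec,
    source_connector,
)
from tavily import TavilyClient


class TavilySearchSource(SourceSpec):
    """Fetches competitive intelligence using Tavily AI Search API."""
    competitor: str
    days_back: int = 7
    max_results: int = 10


# Excerpt: _ArticleKey, _Article, and the setup code that stores
# self._spec and self._api_key are defined in the full project.
@source_connector(
    spec_cls=TavilySearchSource,
    key_type=_ArticleKey,
    value_type=_Article,
)
class TavilySearchConnector:
    async def list(self) -> AsyncIterator[PartialSourceRow[_ArticleKey, _Article]]:
        """List articles from Tavily search."""
        search_query = (
            f"{self._spec.competitor} AND "
            f"(funding OR partnership OR product launch OR acquisition OR executive hire)"
        )

        client = TavilyClient(api_key=self._api_key)
        response = client.search(
            query=search_query,
            search_depth="advanced",
            max_results=self._spec.max_results,
            include_raw_content=True,
        )

        # Assumption: the excerpt never defines `ordinal`; the fetch time is
        # used here as a stand-in version marker for each listed row.
        ordinal = int(time.time())

        for result in response.get("results", []):
            url = result["url"]
            yield PartialSourceRow(
                key=_ArticleKey(url=url),
                data=PartialSourceRowData(ordinal=ordinal),
            )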
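
The query construction and result handling in the connector can be tried out without calling the API. The sketch below is an illustration under stated assumptions rather than the project's code: `Article`, `build_search_query`, and `parse_results` are hypothetical names, and the response is assumed to be a dict with a `results` list whose items carry `url`, `title`, `content`, and `raw_content` keys.

```python
from dataclasses import dataclass

# Event keywords copied from the connector's search query.
EVENT_TERMS = ("funding", "partnership", "product launch", "acquisition", "executive hire")


@dataclass(frozen=True)
class Article:
    url: str
    title: str
    content: str


def build_search_query(competitor: str) -> str:
    """Combine the competitor name with the event keywords."""
    return f"{competitor.strip()} AND ({' OR '.join(EVENT_TERMS)})"


def parse_results(response: dict) -> list[Article]:
    """Map a search response to Article records, skipping duplicate URLs and empty pages."""
    articles: list[Article] = []
    seen_urls: set[str] = set()
    for result in response.get("results", []):
        url = result.get("url")
        # raw_content may be missing or None; fall back to the short snippet.
        content = result.get("raw_content") or result.get("content") or ""
        if not url or url in seen_urls or not content:
            continue
        seen_urls.add(url)
        articles.append(Article(url=url, title=result.get("title", ""), content=content))
    return articles
```

Keeping these steps as plain functions makes them easy to unit-test with a canned response, separate from the network call.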

The CocoIndex Pipeline Definition

The main pipeline uses CocoIndex's flow builder to orchestrate data collection and LLM extraction:

import os
from datetime import timedelta

import cocoindex


@cocoindex.flow_def(name="CompetitiveIntelligence")
def competitive_intelligence_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
) -> None:
    """Main pipeline for competitive intelligence monitoring."""

    # Read settings once; strip whitespace and drop empty names
    competitors = [
        name.strip()
        for name in os.getenv("COMPETITORS", "OpenAI,Anthropic").split(",")
        if name.strip()
    ]
    refresh_interval = int(os.getenv("REFRESH_INTERVAL_SECONDS", "3600"))
    search_days_back = int(os.getenv("SEARCH_DAYS_BACK", "7"))

    # Add Tavily search source for each competitor
    for competitor in competitors:
        data_scope[f"articles_{competitor}"] = flow_builder.add_source(
            TavilySearchSource(
                competitor=competitor,
                days_back=search_days_back,
                max_results=10,
            ),
            refresh_interval=timedelta(seconds=refresh_interval),
        )

    articles_index = data_scope.add_collector()
    events_index = data_scope.add_collector()

    # Process each competitor's articles
    for competitor in competitors:
        articles = data_scope[f"articles_{competitor}"]

        with articles.row() as article:
            # Extract competitive events using GPT-4o-mini via OpenRouter
            article["events"] = article["content"].transform(
                cocoindex.functions.ExtractByLlm(
                    llm_spec=cocoindex.LlmSpec(
                        api_type=cocoindex.LlmApiType.OPENAI,
                        model="openai/gpt-4o-mini",
                        address="https://openrouter.ai/api/v1",
                    ),
                    output_type=list[CompetitiveEvent],
                    instruction=(
                        "Extract competitive intelligence events from this article. "
                        "Focus on: product launches, partnerships, funding rounds, key hires, "
                        "acquisitions, and other strategic moves."
                    ),
                )
            )
            # Excerpt ends here: collecting rows into articles_index and
            # events_index and exporting them to PostgreSQL is not shown.

Query Handlers for Analysis

The project includes built-in query handlers that enable SQL-powered intelligence retrieval:

# Excerpt: connection_pool(), events_table, and articles_table are defined
# elsewhere in the project.
@competitive_intelligence_flow.query_handler()
def search_by_competitor(
    competitor: str,
    event_type: str | None = None,
    limit: int = 20
) -> cocoindex.QueryOutput:
    """Find recent competitive intelligence about a specific competitor."""

    with connection_pool().connection() as conn:
        with conn.cursor() as cur:
            sql = f"""
                SELECT e.competitor, e.event_type, e.description, e.significance,
                       e.related_companies, a.title, a.url, a.source, a.published_at
                FROM {events_table} e
                JOIN {articles_table} a ON e.article_id = a.id
                WHERE LOWER(e.competitor) LIKE LOWER(%s)
            """
            params: list[object] = [f"%{competitor}%"]

            if event_type:
                sql += " AND e.event_type = %s"
                params.append(event_type)

            # Every %s placeholder needs a matching parameter, including LIMIT
            sql += " ORDER BY a.published_at DESC LIMIT %s"
            params.append(limit)
            cur.execute(sql, params)

            # Mapping of fetched rows to result dicts is elided in this excerpt
            return cocoindex.QueryOutput(results=[...])
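
Since the handler builds its SQL incrementally, the parameter list has to grow in step with the `%s` placeholders. One way to make that easy to verify is to isolate the query construction in a pure function. This is a sketch, not the project's code: `build_competitor_query` is a hypothetical helper, the selected columns are trimmed for brevity, and the table names are assumed to come from trusted configuration since they are interpolated directly into the SQL.

```python
from __future__ import annotations


def build_competitor_query(
    events_table: str,
    articles_table: str,
    competitor: str,
    event_type: str | None = None,
    limit: int = 20,
) -> tuple[str, list[object]]:
    """Return (sql, params) with exactly one parameter per %s placeholder."""
    sql = (
        "SELECT e.competitor, e.event_type, e.description, a.url, a.published_at "
        f"FROM {events_table} e JOIN {articles_table} a ON e.article_id = a.id "
        "WHERE LOWER(e.competitor) LIKE LOWER(%s)"
    )
    # User-supplied values always travel as parameters, never inside the SQL text.
    params: list[object] = [f"%{competitor}%"]

    if event_type:
        sql += " AND e.event_type = %s"
        params.append(event_type)

    sql += " ORDER BY a.published_at DESC LIMIT %s"
    params.append(limit)
    return sql, params
```

A quick sanity check is that `sql.count("%s") == len(params)` holds for every combination of arguments.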

Getting Started

Configuration is controlled through environment variables:

DATABASE_URL=postgresql://user:password@localhost:5432/competitive_intel
COCOINDEX_DATABASE_URL=postgresql://user:password@localhost:5432/competitive_intel
OPENAI_API_KEY=sk-or-v1-...
TAVILY_API_KEY=tvly-...
COMPETITORS=OpenAI,Anthropic,Google AI,Meta AI,Mistral AI
REFRESH_INTERVAL_SECONDS=3600
SEARCH_DAYS_BACK=7
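
As an illustration of how these variables might be read in one place, the sketch below parses them into a typed settings object. `Settings` and `load_settings` are hypothetical names. The defaults for COMPETITORS and REFRESH_INTERVAL_SECONDS mirror the pipeline code, while the default of 7 for SEARCH_DAYS_BACK is an assumption based on the sample configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    competitors: list[str]
    refresh_interval_seconds: int
    search_days_back: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Parse monitor settings from environment-style key/value pairs."""
    raw = env.get("COMPETITORS", "OpenAI,Anthropic")
    # Names may contain spaces ("Google AI"), so only trim around the commas.
    competitors = [name.strip() for name in raw.split(",") if name.strip()]
    return Settings(
        competitors=competitors,
        refresh_interval_seconds=int(env.get("REFRESH_INTERVAL_SECONDS", "3600")),
        search_days_back=int(env.get("SEARCH_DAYS_BACK", "7")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing testable, for example `load_settings({"COMPETITORS": "OpenAI, Google AI"})`.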

Run the interactive CLI for first-time setup:

python3 run_interactive.py

Or use CocoIndex directly for automated deployments:

cocoindex update main -f          # Initial sync
cocoindex update -L main.py       # Continuous monitoring

Why This Approach Matters

By combining AI-native search with structured LLM extraction, the monitor:

  • Avoids brittle scraping - Tavily handles content extraction
  • De-duplicates work - CocoIndex tracks processed articles via incremental processing
  • Turns noise into signal - Structured events with significance scoring
  • Enables flexible analysis - Dual indexing (raw + extracted) for maximum flexibility
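
The idea behind skipping already-processed articles can be shown with a small content-fingerprint check. This is only a conceptual sketch of incremental processing, not how CocoIndex implements it internally: `fingerprint` and `select_changed` are hypothetical names, and the in-memory `seen` dict stands in for whatever persistent state a real system keeps.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Stable digest of an article's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def select_changed(articles: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return URLs whose content is new or changed, recording their digests in `seen`."""
    changed: list[str] = []
    for url, content in articles.items():
        digest = fingerprint(content)
        if seen.get(url) != digest:
            seen[url] = digest
            changed.append(url)
    return changed
```

Only the URLs returned by `select_changed` would need to go through the comparatively expensive LLM extraction step again.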

The project supports multiple query types:

  • Search by competitor name
  • Filter by event type (funding, partnerships, acquisitions, etc.)
  • Rank by significance (high=3, medium=2, low=1 weighted scoring)
  • Trend analysis across time periods
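
The significance weighting can be sketched in a few lines. The weights (high=3, medium=2, low=1) come from the list above; the aggregation (summing weights per competitor, with ties broken alphabetically) and the name `rank_competitors` are assumptions for illustration, not the project's implementation.

```python
from collections import Counter

SIGNIFICANCE_WEIGHTS = {"high": 3, "medium": 2, "low": 1}


def rank_competitors(events: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Sum significance weights per competitor.

    `events` holds (competitor, significance) pairs; unknown significance
    values contribute 0. Results are sorted by score, highest first.
    """
    scores: Counter = Counter()
    for competitor, significance in events:
        scores[competitor] += SIGNIFICANCE_WEIGHTS.get(significance, 0)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, one "high" and one "low" event score the same as two "medium" events, so a volume of mid-level news can rank alongside a single major announcement.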

Get the Code

The project is MIT-licensed and available on GitHub:

Laksh-star / competitive-intelligence

AI-powered competitive intelligence monitor using CocoIndex, Tavily Search, and LLM extraction

Competitive Intelligence Monitor

Python 3.11+ | CocoIndex | License: MIT

Track competitor mentions across the web using AI-powered search and LLM extraction. Automatically monitors competitors, extracts competitive intelligence events, and stores structured data in PostgreSQL for analysis.

What This Does

This pipeline automatically:

  • Searches the web using Tavily AI (AI-native search engine optimized for agents)
  • Extracts competitive intelligence events using DeepSeek LLM analysis
    • Product launches and feature releases
    • Partnerships and collaborations
    • Funding rounds and financial news
    • Key executive hires/departures
    • Acquisitions and mergers
  • Indexes both raw articles and extracted events in PostgreSQL
  • Enables queries like:
    • "What has OpenAI been doing recently?"
    • "Which competitors are making the most news?"
    • "Find all partnership announcements"
    • "What are the most significant competitive moves this week?"

Prerequisites

  1. PostgreSQL Database - Choose one option
    • Local PostgreSQL installation
    • Cloud PostgreSQL (AWS RDS, Google Cloud SQL, Azure Database, etc.)
  2. Python 3.11+ - Required for CocoIndex
  3. API Keys (required)
    • Tavily API key from tavily.com (free tier: 1,000…

Have questions or want to contribute? Drop a comment below or open an issue on GitHub!
