Temitope

Posted on Feb 27

Unbiased Intelligence in a Biased Game: Building Elite Sports Intelligence Engine with Elastic Agent Builder

#elasticsearch #ai #webdev #programming

Introduction

Football is a global language spoken by over 5 billion people. But it's also a game defined by extreme unpredictability. As humans, we can’t help but be biased. When we see a club like Liverpool spending over $400 million in a transfer window, our brains tell us they should beat a smaller squad like Brentford or AFC Bournemouth. We factor in money, trophies, history, fanbase, and the "weight of the jersey".

These biases cloud judgment. I was inspired by the idea of removing this emotional cloud. I wanted to build an agent that ignores the hype and looks only at the raw, unbiased data provided by leagues across the world—from the Premier League to the Bundesliga. My goal was to create an "Elite Sports Intelligence Engine" that gives fans and journalists the objective truth.

The Brain of the Operation: What the Agent Actually Does

In most AI applications, you ask a question and the model predicts the next word based on its training. But football changes every day—yesterday's stats are today's history. To solve this, I didn't just build a chatbot; I built an Autonomous Orchestrator using the Elastic Agent Builder.

Beyond RAG: From Search to Reasoning

While standard Retrieval-Augmented Generation (RAG) simply finds documents, my agent performs Agentic RAG. When you select a match like Arsenal vs. Man City, the agent doesn't just "look up" the teams. It initiates a multi-step reasoning loop:

Deconstruction: It breaks your request into specific data requirements (e.g., "I need home form for Team A and away form for Team B").
Tool Selection: It scans its "utility belt" of 14+ custom ES|QL tools to decide which ones will provide the most accurate tactical narrative.
Data Synthesis: It executes complex ES|QL queries against my Elasticsearch indices, retrieving structured stats on a shot-based expected goals (xG) proxy, referee behavior, and schedule fatigue.
Final Verdict: It synthesizes these disparate data points into a cohesive, professional-grade match report.

The "Innovation" Protocol

What makes this agent unique is its ability to uncover hidden variables. For example, if my Fatigue Analysis Tool reports that a team has had only three days of rest, and my Official Conduct Tool shows the assigned referee has a high yellow-card frequency, the agent is smart enough to flag this as a "High-Risk Discipline Scenario." It connects dots that a human analyst might spend hours trying to find.
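To make that cross-tool check concrete, here is a minimal sketch of the rule written as plain TypeScript. The report shapes, the three-day rest cutoff, and the 4.5 yellows-per-match threshold are illustrative assumptions; in the actual engine the agent reasons over tool output instead of running a fixed rule.

```typescript
// Hypothetical shapes for what the fatigue and referee tools might return.
interface FatigueReport {
  team: string;
  restDays: number; // days since the team's previous fixture
}

interface RefereeReport {
  referee: string;
  avgYellowsPerMatch: number;
}

// Assumed thresholds, chosen only for illustration.
const MAX_SHORT_REST_DAYS = 3;
const STRICT_REFEREE_YELLOWS = 4.5;

// Returns a warning when a short-rest team meets a card-happy referee.
function flagDisciplineRisk(
  fatigue: FatigueReport[],
  referee: RefereeReport,
): string | null {
  const tiredTeams = fatigue
    .filter((report) => report.restDays <= MAX_SHORT_REST_DAYS)
    .map((report) => report.team);
  const strictReferee = referee.avgYellowsPerMatch >= STRICT_REFEREE_YELLOWS;

  if (tiredTeams.length > 0 && strictReferee) {
    return `High-Risk Discipline Scenario: ${tiredTeams.join(", ")} on short rest with ${referee.referee}`;
  }
  return null;
}
```

Neither signal triggers the flag alone; it only fires when both conditions hold, which is the "connecting dots" behavior described above.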

By using the Elastic Agent Builder, I was able to give the LLM "hands" (the ability to query data) and "eyes" (the ability to see real-time trends), turning a simple language model into a high-fidelity Sports Intelligence Engine.

Data Ingestion – Keeping the Engine Fresh

One of the core strengths of the Elastic Agent Builder is its ability to connect an LLM to private, high-velocity data. But an agent is only as smart as the data it can access. Since football matches are played almost every hour across the globe, a static database simply wouldn't cut it.

I needed a "Live Memory" for my agent.

That’s why I built an automated daily ingestion pipeline using Next.js and Vercel’s cron jobs.

All match data (results, goals, corners, cards, shots, etc.) is sourced from http://football-data.co.uk — one of the most comprehensive free football databases available. It covers an impressive range of leagues, including:

England (Premier League E0, Championship E1, League One E2, League Two E3, National League EC, and more)
Spain (La Liga SP1, Segunda SP2)
Germany (Bundesliga D1, 2. Bundesliga D2)
Italy (Serie A I1, Serie B I2)
France (Ligue 1 F1, Ligue 2 F2)
Netherlands (Eredivisie N1)
Belgium (Pro League B1)
Portugal (Primeira Liga P1)
Turkey (Süper Lig T1)
Greece (Super League G1)
And many others: Argentina (ARG), Austria (AUT), Brazil (BRA), Denmark (DNK), Finland (FIN), Ireland (IRL), Japan (JPN), Mexico (MEX), Norway (NOR), Poland (POL), Romania (ROU), Russia (RUS), Sweden (SWE), Switzerland (SUI), USA (MLS/USA), and more.

This wide coverage means the engine isn’t limited to just one league — it can analyze matches from most major global competitions.

The magic happens in a single Next.js API route: /api/update-leagues.

Fetch — Every day, the route downloads the latest CSV files from football-data.co.uk (main leagues use the mmz4281/2526/ path, others use /new/).
Parse — Using csv-parse/sync, I read the CSV into records, standardize dates (DD/MM/YY → ISO YYYY-MM-DD), and add leagueCode for easy filtering.
Incremental sync — Before uploading, I query Elasticsearch for the latest date in that league. If the CSV has newer data (or it’s the first run), I proceed.
Upsert safely — I create a unique _id for each match (YYYY-MM-DD_HomeTeam_AwayTeam) so duplicates are automatically overwritten or skipped — no manual cleanup needed.
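Bulk upload — Using the Elasticsearch JavaScript client, I send batches via client.bulk({ refresh: true }), which is fast and efficient. Bulk requests are not atomic, though: each action succeeds or fails on its own, so the errors flag in the response still needs checking.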
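The parse, incremental-sync, and upsert steps above can be sketched as a few pure functions. This is an illustration under assumptions, not the project's actual route code: the column names (Date, HomeTeam, AwayTeam), the helper names, and the "two-digit years mean 20xx" rule are all hypothetical.

```typescript
// Hypothetical row shape; the column names are assumed, not confirmed.
interface CsvRow {
  Date: string; // DD/MM/YY or DD/MM/YYYY as found in the CSV
  HomeTeam: string;
  AwayTeam: string;
  [column: string]: string;
}

type MatchDoc = CsvRow & { leagueCode: string };

// Convert DD/MM/YY or DD/MM/YYYY to ISO YYYY-MM-DD.
function toIsoDate(raw: string): string {
  const [day, month, year] = raw.trim().split("/");
  if (!day || !month || !year) throw new Error(`Unexpected date: ${raw}`);
  // Assumption of this sketch: two-digit years belong to the 2000s.
  const fullYear = year.length === 2 ? `20${year}` : year;
  return `${fullYear}-${month.padStart(2, "0")}-${day.padStart(2, "0")}`;
}

// Deterministic id, so re-ingesting the same match overwrites the old copy.
function matchId(doc: MatchDoc): string {
  return `${doc.Date}_${doc.HomeTeam}_${doc.AwayTeam}`;
}

// Normalise rows and keep only those newer than the latest indexed date.
// Pass null on the first run to keep everything.
function prepareDocs(
  rows: CsvRow[],
  leagueCode: string,
  latestIndexed: string | null,
): MatchDoc[] {
  return rows
    .map((row) => ({ ...row, Date: toIsoDate(row.Date), leagueCode }))
    // ISO dates sort correctly as plain strings.
    .filter((doc) => latestIndexed === null || doc.Date > latestIndexed);
}

// Build the alternating action/document list that the bulk API expects.
function toBulkOperations(index: string, docs: MatchDoc[]): object[] {
  return docs.flatMap((doc) => [
    { index: { _index: index, _id: matchId(doc) } },
    doc,
  ]);
}
```

With version 8 of the JavaScript client, the resulting array would be passed as the operations option of client.bulk. The latestIndexed value is assumed to come from a separate query for the newest date in that league, which is not shown here.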

To run this daily without any manual intervention, I added a simple vercel.json file. The schedule 0 0 * * * fires every day at midnight UTC:

{
  "crons": [
    {
      "path": "/api/update-leagues",
      "schedule": "0 0 * * *"
    }
  ]
}

The Tools That Power the Engine

At the heart of Elite Sports Intelligence Engine are 19 custom ES|QL tools I built directly in Elastic Agent Builder. These tools are not just simple queries — they are specialized intelligence modules that extract, aggregate, and interpret historical match data from the Elasticsearch index.

Each tool is tailored to a specific angle of football analysis, and the agent intelligently decides which ones to call and how to combine them.

Here’s a summary of the key tools and what they do:

get_corner_dynamics_insights — Analyzes set-piece pressure and territorial dominance through average corners per match, team splits, and high-volume corner events (9.5+). Helps identify matches likely to be defined by heavy wing play.
get_defensive_integrity_metrics — Calculates clean sheet frequency and shutout probability for both teams. Reveals elite defensive units or goalkeepers in form.
get_disciplinary_trend_insights — Tracks yellow/red card averages, foul patterns, and likelihood of high-caution games (3.5+ yellows). Flags tension and potential suspension risks.
get_double_chance_insights — Quantifies tactical resilience: probability of avoiding defeat (win or draw) or decisive outcomes. Highlights unbeaten streaks or high-stakes fixtures.
get_full_time_result_insights — Analyzes recent form and head-to-head to estimate win/draw/loss percentages. Includes sample-size warnings for reliability.
get_goal_density_insights — Categorizes match intensity (0-1, 2-3, 4+ goals). Reveals if a fixture tends toward tactical stalemates or open, high-scoring affairs.
get_league_context_benchmarks — Provides division-wide averages (goals, cards, corners) to show if a team is above or below league norms.
get_mutual_scoring_insights — Measures BTTS frequency. Identifies balanced attacks or games likely to feature both sides scoring.
get_offensive_penetration_metrics — Evaluates shot volume, high-attack games (15+ shots), and conversion efficiency. Differentiates dominant but wasteful teams from clinical finishers.
get_official_conduct_analysis — Analyzes referee tendencies (fouls called, yellow/red averages). Spots strict or lenient officials that could influence match flow.
get_over_under_goals_insights — Calculates over/under likelihood across 0.5 to 4.5 lines. Predicts whether the game will be low-scoring or explosive.
get_recent_form_trajectory — Tracks momentum via weighted points from the last 5 results. Identifies teams on upward or downward spirals.
get_schedule_density_analysis — Measures rest days since last fixture. Flags fatigue or rotation risks that could impact performance.
get_scoreline_distribution_analysis — Finds the most frequent final scorelines. Reveals "typical" outcomes between teams.
get_strategic_performance_indicators — Computes xG proxy (shots/SoT-based), form momentum, and H2H dominance. Spots teams over/under-performing their stats.
get_team_scoring_profile_insights — Analyzes individual team scoring averages and multi-goal consistency. Highlights scoring threats or droughts.
get_venue_performance_splits — Breaks down home/away win rates and goal differentials. Identifies "fortress" homes or "road warrior" away sides.
get_victory_margin_analysis — Measures average goal margin and win-to-nil frequency. Reveals dominant winners or tight contests.
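The tools themselves are ES|QL queries that run inside Elasticsearch. The sketch below only restates three of the calculations in TypeScript to make the logic concrete. The linear 5-to-1 recency weights and the 3/1/0 point values are assumptions, since the exact weighting is not stated here.

```typescript
type Result = "W" | "D" | "L";

// Goal-density bucket for one match, mirroring the 0-1 / 2-3 / 4+ split.
function goalDensityBucket(
  homeGoals: number,
  awayGoals: number,
): "0-1" | "2-3" | "4+" {
  const total = homeGoals + awayGoals;
  if (total <= 1) return "0-1";
  if (total <= 3) return "2-3";
  return "4+";
}

const FORM_WINDOW = 5;

// Weighted form over the last five results, most recent first.
// Returns a value between 0 (all losses) and 1 (all wins).
function weightedForm(resultsNewestFirst: Result[]): number {
  const points: Record<Result, number> = { W: 3, D: 1, L: 0 };
  let score = 0;
  let maxScore = 0;
  resultsNewestFirst.slice(0, FORM_WINDOW).forEach((result, i) => {
    const weight = FORM_WINDOW - i; // 5 for the latest match, down to 1
    score += points[result] * weight;
    maxScore += points.W * weight;
  });
  // Normalising keeps teams with fewer than five matches comparable.
  return maxScore === 0 ? 0 : score / maxScore;
}

// Whole days between two ISO dates (YYYY-MM-DD), as a rest-day measure.
function restDays(previousFixture: string, nextFixture: string): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.round(
    (Date.parse(nextFixture) - Date.parse(previousFixture)) / msPerDay,
  );
}
```

Keeping each calculation this small is what lets the agent mix and match them: every tool answers one narrow question, and the combination happens in the reasoning step.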

The agent doesn’t just run one tool — it chains several in multi-step reasoning. For example, it might start with recent form and head-to-head, then pull goal and corner trends, then check referee and schedule density to flag fatigue + discipline risks. This creates a "Strategic Narrative" — connecting dots that would take a human analyst hours to find manually.

All the ES|QL source code for these tools is available in the attached GitHub repository.

These tools are what make the agent feel truly intelligent — not just retrieving data, but synthesizing it into meaningful, actionable football intelligence — exactly what Elastic Agent Builder was designed for.

The Sample Frontend Apps (Review, News, Prediction, Chat)

To showcase the full power of Elite Sports Intelligence Engine, I built a clean, multi-page Next.js application with four distinct experiences — each powered by the same agent but using a specialized prompt and API route to deliver exactly what different users need.

All pages share a consistent cascading selector (country → league → home team → away team) for easy match input, and each calls a dedicated Next.js proxy route that crafts a tailored prompt and forwards it to the Agent Builder endpoint.

Match Review (Homepage – /)

The core experience. This page delivers a quick, high-level tactical snapshot of the fixture.

Route: /api/match-review

Prompt style: Focused on data-driven analysis (form, trends, key patterns).

Output: Structured Markdown with sections and tables, ideal for analysts or fans who want stats fast.

Match News / Preview (/news)

Journalistic storytelling mode.

Route: /api/news

Prompt style: Adopts a neutral "sports reporter" persona — generates engaging articles with headlines, storylines, key battles, and balanced conclusions.

Output: Readable, narrative preview, perfect for casual fans or content creators.

Match Stats & Predictions (/stats)

The "quant" deep-dive.

Route: /api/stats

Prompt style: Forces structured tables covering probabilities (1X2, over/under, BTTS, corners, cards, shots).

Output: Clean Markdown tables + a built-in risk disclaimer emphasizing that predictions are data-derived, not certainties.

Agent Chat (/chat)

Conversational exploration for power users.

Route: /api/chat

Prompt style: Open-ended — the agent can answer follow-ups, dive deeper into stats, compare players, or analyze league-wide trends.

Output: Full chat history with Markdown support — great for research or custom questions like "Why is Arsenal weak away?" or "Compare corner conversion rates."

These four experiences show how Elastic Agent Builder enables one backend agent to power multiple frontends with different personas and formats — all through prompt engineering and custom tool chaining. The agent stays consistent while adapting perfectly to each use case.
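A rough sketch of how one agent can sit behind four routes: each proxy picks a mode-specific instruction, appends the fixture, and forwards the result. The instruction wording paraphrases the descriptions above. The endpoint URL, the request body shape, and the function names are placeholders, so check the Agent Builder API documentation for the real contract.

```typescript
type Mode = "review" | "news" | "stats" | "chat";

interface Fixture {
  league: string;
  homeTeam: string;
  awayTeam: string;
}

// Per-mode instructions, paraphrasing the four experiences described above.
const MODE_INSTRUCTIONS: Record<Mode, string> = {
  review:
    "Give a concise, data-driven tactical snapshot in Markdown with sections and tables.",
  news:
    "Write a neutral match preview in the voice of a sports reporter, with a headline, storylines, key battles and a balanced conclusion.",
  stats:
    "Return Markdown tables of probabilities (1X2, over/under, BTTS, corners, cards, shots) and end with a risk disclaimer.",
  chat: "Answer the user's question conversationally, using tools as needed.",
};

// Combine the mode instruction, the fixture, and an optional user question.
function buildPrompt(mode: Mode, fixture: Fixture, question?: string): string {
  const lines = [
    MODE_INSTRUCTIONS[mode],
    `Fixture: ${fixture.homeTeam} vs ${fixture.awayTeam} (${fixture.league}).`,
  ];
  if (question) lines.push(`Question: ${question}`);
  return lines.join("\n");
}

// Forwarding step. agentUrl and the { input } body are assumptions of this
// sketch, not a confirmed API shape.
async function askAgent(
  agentUrl: string,
  apiKey: string,
  prompt: string,
): Promise<unknown> {
  const response = await fetch(agentUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `ApiKey ${apiKey}`,
    },
    body: JSON.stringify({ input: prompt }),
  });
  if (!response.ok) {
    throw new Error(`Agent request failed: ${response.status}`);
  }
  return response.json();
}
```

Because only buildPrompt differs between routes, adding a fifth experience would mean adding one more entry to the instruction map without touching the agent.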

The "War Stories": Overcoming Technical Hurdles

Every innovative project comes with its own set of "Final Boss" challenges. For Elite Sports Intelligence Engine, the primary battle wasn't just building the intelligence—it was making that intelligence fast enough for the modern web.

1. The Timeout Battle: 19 Tools vs. 60 Seconds

When I first deployed the agent with its full suite of 19 ES|QL tools, I hit a wall: the 504 Gateway Timeout. Orchestrating nearly 20 specialized data-fetching tasks within a standard 60-second execution window proved nearly impossible, as the agent’s "Chain of Thought" loop took longer than the hosting platform was willing to wait.

To win this battle, I implemented a three-pronged strategy:

Infrastructure Tuning: I extended the execution limit in my Next.js route handler to maxDuration = 120.
Next.js Resilience: I implemented AbortControllers and AbortSignal.timeout(110000) to manage these long-running AI tasks without crashing the application.
Lean Tooling: I discovered that the agent was significantly faster when interacting directly within the Elasticsearch UI compared to the API. To bridge this speed gap, I strategically unchecked the least-critical tools, narrowing the engine's focus to the highest-impact metrics for the best performance.
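The first two prongs can be sketched together. The snippet assumes a Next.js App Router route handler, where maxDuration is exported as route segment config. The helper name and the 504 fallback body are hypothetical.

```typescript
// Next.js route segment config: let this handler run for up to 120 seconds.
export const maxDuration = 120;

// Stay below maxDuration so the route can answer with its own error
// before the platform cuts the function off.
const AGENT_TIMEOUT_MS = 110_000;

// Hypothetical helper: call the agent, but give up after timeoutMs.
async function callAgentWithTimeout(
  url: string,
  init: RequestInit,
  timeoutMs: number = AGENT_TIMEOUT_MS,
): Promise<Response> {
  try {
    return await fetch(url, {
      ...init,
      signal: AbortSignal.timeout(timeoutMs),
    });
  } catch (error) {
    // A fetch aborted by AbortSignal.timeout rejects with a "TimeoutError".
    const name = (error as { name?: string } | null)?.name;
    if (name === "TimeoutError") {
      return new Response(
        JSON.stringify({ error: "The agent took too long to respond." }),
        { status: 504, headers: { "Content-Type": "application/json" } },
      );
    }
    throw error;
  }
}
```

The 10-second gap between the two limits is the design point: the route times out first and returns a readable error, instead of the platform killing the function mid-request.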

2. Prompt Refinement: From "Ambiguity" to "Direct Command"

Originally, my system prompt was a massive, complex document that governed every possible behavior. However, I realized that a long prompt actually slowed the agent down by increasing the "Thinking" overhead.

I pivoted to a "Parallel-Reasoning" Instruction strategy:

Minimalist Agent Base: I simplified the default agent instructions in the Elastic UI to be as lean as possible.
Frontend Orchestration: I moved the heavy lifting—the specific formatting and tactical requirements—to the Next.js frontend prompt.
The "Delegation" Fix: Instead of trying to be everything at once, the agent is now simply told to: "Focus on the specific request from the frontend and choose only the most relevant tools for that specific task".

This refinement turned the agent from an "Over-thinker" into a "Decisive Analyst," dramatically reducing latency while maintaining the "Elite" quality of the insights.

Conclusion

Building Elite Sports Intelligence Engine started as a simple idea: what if we could remove human bias from football analysis and let pure data tell the story? What began as a frustration with manual spreadsheets turned into a fully autonomous, tool-driven agent that delivers deep, neutral insights in seconds.

Elastic Agent Builder was the perfect platform for this vision. It let me rapidly create 19 custom ES|QL tools, chain them intelligently, connect everything to my private Elasticsearch index, and expose powerful capabilities through clean APIs, which is exactly what the framework is designed to enable. The result is an engine built to serve football fans of every kind: casual viewers who want quick previews, analysts who need stats, fantasy players who crave probabilities, and journalists who want compelling narratives.

The project taught me that real intelligence comes from thoughtful tool design, safe data pipelines, and letting the agent reason over results — not from forcing an LLM to guess. It also proved that even a solo builder can create something impactful with the right tools.

This is just the beginning.

I’m excited to keep evolving Elite Sports Intelligence Engine — adding live match-day data, injury/weather context, player-level depth, and support for other sports — until it becomes the go-to unbiased intelligence layer for global football.

If you're reading this, thank you for following along.

The repo is open (link below) — fork it, extend it, or build your own version.

The future of sports analysis is data-driven, neutral, and agent-powered — and I’m just getting started.

Thanks again to the Elastic team for an incredible Agent Builder framework. To everyone who took the time to read through this journey: thank you for your curiosity.

Let’s keep pushing the boundaries.

GitHub Repository: GitHub Repository
Live Demo App: MatchScout AI Live Demo

Top comments (2)

👨‍💻Pierre-Henry ✨ • Mar 6 • Edited

Chaining that many tools and the agent still surpasses a human analyst... honestly, this is kinda wild to see 🔥 Thanks for sharing!

Temitope • Mar 15

That’s exactly where Agent Builder shines: once it starts reasoning across tools instead of just running them, the output jumps from raw stats to real, or at least near-perfect, intelligence.

Appreciate the feedback. ❤️