Lucas

Posted on Jun 20

How I Built a Natural Language F1 Telemetry Analyzer with FastF1 and Claude

#python #f1 #ai #streamlit

Ask "Did McLaren's undercut on Ferrari work?" and get a real answer — backed by actual lap data.

The Problem

Formula 1 analysis is data-rich but tool-poor for non-engineers. A journalist covering a race weekend has to juggle timing sheets, telemetry exports, strategy briefings, and their own notes — all in different formats, none of them speaking to each other.

I wanted to build something that let a journalist (or any F1 fan with technical curiosity) ask questions in plain Spanish and get answers grounded in real telemetry data — not hallucinated summaries, not Wikipedia facts, but actual FastF1 lap data from that specific race weekend.

The result is F1 Analyst Pro: a Streamlit chat app that ingests FastF1 session data into Supabase, builds structured context from SQL queries, and sends it to Claude Sonnet for natural language analysis.

Live app: f1-analyst.streamlit.app
Repo: github.com/luc45hn/f1-analyst-pro

The Stack

FastF1 — official F1 telemetry data (laps, car data, weather, race control messages)
Claude Sonnet (claude-sonnet-4-6) — the LLM that interprets queries and generates analysis
Supabase + PostgreSQL — stores ingested session data; queried at runtime to build context
Streamlit — the frontend, deployed on Streamlit Cloud
Plotly — interactive charts for telemetry traces, tyre degradation, pace analysis

Architecture: Intent-Based RAG

This isn't a standard vector search RAG. There are no embeddings, no cosine similarity, no document chunks. Instead, the system uses intent detection + structured SQL queries to build context — which works much better for structured tabular data like lap times.

Here's the flow:

User query
    ↓
Intent detection (keyword matching + regex)
    ↓
SQL queries to Supabase (laps, results, stints, incidents)
    ↓
Structured context string
    ↓
Claude API call
    ↓
Natural language response + optional Plotly chart

Intent Detection

The agent detects what the user is asking about and fetches only the relevant data:

prompt_lower = unicodedata.normalize("NFD", prompt.lower())  # accent-insensitive

wants_qualy    = any(w in prompt_lower for w in ["clasificacion", "qualifying", "q1", "q2", "q3", "pole"])
wants_race     = any(w in prompt_lower for w in ["carrera", "race", "resultado", "vuelta rapida"])
wants_telemetry = any(w in prompt_lower for w in ["telemetria", "aceleracion", "frenada", "velocidad"])
wants_undercut  = any(w in prompt_lower for w in ["undercut", "overcut", "estrategia de pit", "parada"])
wants_practice  = any(w in prompt_lower for w in ["entrenamiento", "fp1", "fp2", "fp3", "long run"])

load_all = not (wants_qualy or wants_race or wants_telemetry or wants_undercut or wants_practice)

Building Context

Each intent triggers specific SQL queries. For a race query, the agent fetches stint summaries, pit stop analysis, key moments, and race incidents:

if wants_race or load_all:
    race_id = self.db.get_session_id(year, gp_name, "R")
    if race_id:
        stint_data = self.db.get_stint_summary(race_id).to_dict("records")
        static_context += "--- RACE STINT SUMMARY ---\n" + str(stint_data) + "\n\n"

        pit_df = self.db.get_pit_stop_analysis(race_id)
        if not pit_df.empty:
            static_context += "--- UNDERCUT/OVERCUT ANALYSIS ---\n" + pit_df.to_string(index=False) + "\n\n"

        key_moments = self.db.get_key_moments(race_id)
        if not key_moments.empty:
            static_context += "--- KEY MOMENTS ---\n" + key_moments.to_string(index=False) + "\n\n"

This context is then passed to Claude as part of the user message — not as a system prompt, but injected into the conversation turn so the model can reason over it.

Data Ingestion with FastF1

FastF1 is the backbone. Loading a session is straightforward:

import fastf1

fastf1.Cache.enable_cache("cache/")
session = fastf1.get_session(2026, "Barcelona Grand Prix", "R")
session.load(laps=True, telemetry=True, weather=True, messages=True)

# Access lap data as a DataFrame
laps = session.laps
print(laps[["Driver", "LapTime", "Compound", "TyreLife", "Position"]].head())

Each session gets ingested into Supabase with extended fields:

lap_record = {
    "driver":           lap["Driver"],
    "lap_number":       int(lap["LapNumber"]),
    "lap_time":         lap_time_seconds,
    "compound":         lap["Compound"],
    "tyre_life":        int(lap["TyreLife"]),
    "stint":            int(lap["Stint"]),
    "position":         int(lap["Position"]) if pd.notna(lap.get("Position")) else None,
    "is_personal_best": bool(lap["IsPersonalBest"]) if pd.notna(lap.get("IsPersonalBest")) else None,
    "speed_fl":         float(lap["SpeedFL"]) if pd.notna(lap.get("SpeedFL")) else None,
    "deleted":          bool(lap["Deleted"]) if pd.notna(lap.get("Deleted")) else None,
    "deleted_reason":   str(lap["DeletedReason"]) if pd.notna(lap.get("DeletedReason")) else None,
}

Having position per lap is what makes undercut/overcut analysis possible — you can track exactly when a driver gained or lost positions relative to their rivals.

Telemetry Traces

The most visually impressive feature. When the user asks for telemetry, the app generates a Plotly chart before calling Claude — then tells the model the chart exists so it can reference it in the analysis.

# chart_builder.py
def plot_telemetry_trace(laps_data, gp_name, year, drivers, session_type="Q", qualifying_segment=None):
    fig = make_subplots(rows=4, cols=1, shared_xaxes=True,
                        subplot_titles=["Speed (km/h)", "Throttle (%)", "Brake", "Gear"])

    for driver in drivers:
        # Filter by qualifying segment (Q1/Q2/Q3) if specified
        drv_laps = laps_df[laps_df["Driver"] == driver]
        if qualifying_segment:
            stint_map = {"Q1": 1, "Q2": 2, "Q3": 3}
            seg_laps = drv_laps[drv_laps["Stint"] == stint_map[qualifying_segment]]
            drv_laps = seg_laps if not seg_laps.empty else drv_laps

        fastest = drv_laps.loc[drv_laps["LapTime"].idxmin()]
        car_data = fastest.get_car_data().add_distance()

        fig.add_trace(go.Scatter(x=car_data["Distance"], y=car_data["Speed"], name=f"{driver} — Speed"), row=1, col=1)
        fig.add_trace(go.Scatter(x=car_data["Distance"], y=car_data["Throttle"], name=f"{driver} — Throttle"), row=2, col=1)
        # ... brake and gear traces

    return fig

The key insight: the model doesn't generate the chart — it's generated from real data and the model is told "a telemetry chart comparing COL vs GAS in Q2 has been generated." This avoids hallucinated matplotlib code appearing in the chat.

Undercut/Overcut Analysis

One of the features I'm most proud of. Given lap-by-lap position data, the system detects pit stop interactions between nearby drivers and assigns a verdict:

def get_pit_stop_analysis(self, session_id: int, position_window=3, lap_window=5):
    # For each pit stop, find rivals within ±3 positions
    # who also pitted within ±5 laps
    # Compare positions 3 laps after the last of the two stops
    # Verdict: UNDERCUT/OVERCUT EXITOSO/FALLIDO or PARADA NEUTRAL

The Monaco 2026 race produced this analysis automatically:

"HAM's undercut over HAD at lap 28 was the most decisive move of the race. Delta: +6.0s. Ferrari entered first and exited well ahead — textbook undercut execution."

Key Moments Detection

The system automatically flags anomalies in race data without being asked:

def get_key_moments(self, session_id: int) -> pd.DataFrame:
    # TRACK_LIMITS: laps where deleted=True
    # PACE_ANOMALY: lap_time > driver's median * 1.15 on green track
    # POSITION_DROP: position worsens 3+ places in one lap without a pit stop
    # PERSONAL_BEST: top 5 fastest personal bests on green track

This is what allowed the app to proactively mention in a race summary: "RUS lost 11 positions at lap 73 — the most significant single-lap position drop of the race."

Race Incidents from Race Control Messages

FastF1 exposes race_control_messages for Race and Sprint sessions. The system ingests these and cross-references them with on-track vs. official position discrepancies:

# Detect penalties that changed final standings
incidents = db.get_race_incidents(race_id)
discrepancies = db.get_position_discrepancies(race_id)

# Cross-reference: if a driver has both an incident AND a position discrepancy,
# the agent explains why their on-track finish differs from the official result

This caught the Barcelona 2026 case where Colapinto finished P8 on track but was penalized to P10 post-race for a yellow flag infringement — the agent explained it correctly without any manual notes.

What It Looks Like

Cost Control

Each query costs roughly $0.02-0.04 in Claude API usage. To prevent runaway costs, there's a per-user daily limit stored in Supabase:

DAILY_COST_LIMIT_USD = 3.00

def check_daily_cost(self, user_email: str) -> float:
    today = date.today()
    result = self.supabase.table("queries") \
        .select("cost") \
        .eq("user_email", user_email) \
        .eq("date", str(today)) \
        .execute()
    return sum(r["cost"] for r in result.data)

If the limit is reached, the agent blocks the query and shows a friendly message — no silent failures.

What's Next

NASCAR equivalent for Argentine driver Baltazar Leguizamón (O'Reilly Auto Parts Series 2026) — different data source, same RAG pattern
Multi-season historical comparison
Race simulation pace analysis (FP2 long run methodology — still being calibrated)

Try It

The app is live and free to try (within daily usage limits):

🏎️ f1-analyst.streamlit.app
📦 github.com/luc45hn/f1-analyst-pro

Built with FastF1, Claude Sonnet, Supabase, Streamlit, and too many late nights watching qualifying sessions.

DEV Community