
The User Interface & Ground-Truth Testing

Part 5 of 5 | ← Part 4 | Complete Series

Streamlit Overview

Streamlit lets you build data apps in pure Python—no HTML, CSS, or JavaScript needed.


import streamlit as st
from datetime import datetime
import pandas as pd
import requests

# Page config
st.set_page_config(
    page_title="IPL AI Assistant",
    page_icon="🏏",
    layout="wide",
)

# Title
st.title("🏏 IPL AI Assistant")
st.caption("Predictions + Q&A Powered by ML")  # Streamlit has no st.subtitle

# Tabs
tab1, tab2, tab3 = st.tabs(["💬 Chat", "🎯 Predict", "📊 Metrics"])

Tab 1: Chat Interface

with tab1:
    st.header("Ask Anything")

    # Initialize session state
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Display chat history
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # User input
    if user_input := st.chat_input("Ask about IPL..."):
        st.session_state.messages.append({"role": "user", "content": user_input})

        with st.chat_message("user"):
            st.markdown(user_input)

        # Call backend
        try:
            response = requests.post(
                "http://localhost:8000/chat",
                json={"message": user_input},
                timeout=5,
            )
            response.raise_for_status()

            result = response.json()
            assistant_message = result.get("message", "I couldn't understand that.")

            st.session_state.messages.append({
                "role": "assistant",
                "content": assistant_message,
            })

            with st.chat_message("assistant"):
                st.markdown(assistant_message)

        except Exception as e:
            st.error(f"❌ Backend error: {str(e)}")

Key Concepts:

  1. Session State: st.session_state.messages persists across reruns

    • When user submits a message, Streamlit reruns the entire script
    • Session state preserves conversation history
    • Without it: chat history disappears on each input
  2. Chat Message: st.chat_message() renders messages with role-based styling

    • "user" and "assistant" each get a distinct avatar and styling, so the conversation reads like a chat transcript
  3. Chat Input: st.chat_input() provides a textbox with submission handling

    • Returns None until user submits
    • Automatically clears after submission
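The rerun model is the part that trips people up, so here is a plain-Python sketch (no Streamlit required) of why a persistent store is needed. `run_script` stands in for one full rerun of the app script, and the `session_state` dict stands in for `st.session_state` — both names are illustrative, not Streamlit API:

```python
# Plain-Python sketch of Streamlit's rerun model (names are illustrative).
# Every user interaction re-executes the whole script from the top, so any
# local variable is rebuilt from scratch -- only session_state survives.

session_state = {}  # stands in for st.session_state


def run_script(user_input):
    """One full rerun of the app script."""
    # Local variable: reset on every rerun, so history would be lost.
    local_messages = []

    # Session state: initialized once, then preserved across reruns.
    if "messages" not in session_state:
        session_state["messages"] = []

    if user_input is not None:  # st.chat_input returns None until submitted
        local_messages.append(user_input)
        session_state["messages"].append(user_input)

    return local_messages, session_state["messages"]


run_script("Who won in 2024?")
local, persisted = run_script("Head-to-head MI vs CSK?")

print(len(local))      # 1 -- the local list forgot the first message
print(len(persisted))  # 2 -- session state kept the whole conversation
```

This is exactly the failure mode in the "Without it" bullet above: a plain list would behave like `local_messages` and lose the conversation on every input.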

Tab 2: Prediction Interface

with tab2:
    st.header("Match Prediction Simulator")

    col1, col2 = st.columns(2)

    with col1:
        st.subheader("Teams")
        teams = [
            "Mumbai Indians",
            "Chennai Super Kings",
            "Royal Challengers Bangalore",
            "Kolkata Knight Riders",
            # ... all 10 teams
        ]
        batting_team = st.selectbox("Batting Team:", teams)
        bowling_team = st.selectbox(
            "Bowling Team:",
            options=[t for t in teams if t != batting_team],
        )

    with col2:
        st.subheader("Venue")
        venue = st.text_input("Ground name:", "Wankhede")

    st.subheader("Pre-Match Form")
    col1, col2, col3, col4 = st.columns(4)

    with col1:
        h2h_rate = st.slider(
            "H2H Win Rate (Batting Team)",
            0.0, 1.0, 0.5, 0.05,
        )

    with col2:
        overall_rate = st.slider(
            "Overall Win Rate",
            0.0, 1.0, 0.5, 0.05,
        )

    with col3:
        venue_rate = st.slider(
            "Venue Win Rate",
            0.0, 1.0, 0.5, 0.05,
        )

    with col4:
        rolling_rate = st.slider(
            "Last 5 Matches Win Rate",
            0.0, 1.0, 0.5, 0.05,
        )

    st.subheader("Toss")
    col1, col2 = st.columns(2)

    with col1:
        toss_win = st.radio("Who won toss?", ["Batting Team", "Bowling Team"])
        toss_win = 1 if toss_win == "Batting Team" else 0

    with col2:
        toss_choice = st.radio("Toss choice?", ["Bat", "Field"])
        toss_choice = toss_choice.lower()

    # Predict button
    if st.button("🎯 Predict Match Outcome", use_container_width=True):
        try:
            response = requests.post(
                "http://localhost:8000/predict",
                json={
                    "batting_team": batting_team,
                    "bowling_team": bowling_team,
                    "venue": venue,
                    "h2h_rate": h2h_rate,
                    "overall_rate": overall_rate,
                    "venue_rate": venue_rate,
                    "rolling_rate": rolling_rate,
                    "toss_win": toss_win,
                    "toss_choice": toss_choice,
                },
                timeout=5,
            )
            response.raise_for_status()

            result = response.json()
            winner = result["winner"]
            confidence = result["confidence"]

            st.success(
                f"### 🏆 {winner} wins!\n"
                f"**Confidence:** {confidence:.1%}"
            )

            # Show prediction breakdown
            st.info(
                f"**Model:** {result['model']}\n\n"
                f"**Reasoning:**\n"
                f"- H2H: {h2h_rate:.0%}\n"
                f"- Form: {rolling_rate:.0%}\n"
                f"- Venue: {venue_rate:.0%}\n"
            )

        except Exception as e:
            st.error(f"❌ Prediction failed: {str(e)}")

Key UI Patterns:

  • st.selectbox() — Dropdown selector
  • st.slider() — Range input (0.0-1.0)
  • st.radio() — Single-choice radio buttons
  • st.columns() — Grid layout (col1, col2, etc.)
  • st.button() — Form submission
  • st.success/error/info() — Colored alerts

Tab 3: Metrics & Transparency

with tab3:
    st.header("Model Performance")

    # Load metrics
    import json
    with open("models/metrics.json") as f:
        metrics = json.load(f)

    col1, col2, col3 = st.columns(3)
    col1.metric("Test Accuracy", f"{metrics['accuracy']:.1%}")
    col2.metric("Precision", f"{metrics['precision']:.1%}")
    col3.metric("Recall", f"{metrics['recall']:.1%}")

    st.subheader("Confusion Matrix")
    st.image("models/confusion_matrix.png", use_container_width=True)

    st.subheader("Feature Importance")
    importance_df = pd.DataFrame({
        "Feature": ["h2h_rate", "rolling_rate", "venue_rate", ...],
        "Importance": [0.32, 0.28, 0.18, ...],
    }).sort_values("Importance", ascending=False)

    st.bar_chart(importance_df.set_index("Feature"))

    st.subheader("Q&A Engine")
    st.info(
        f"**Total Q&A Pairs:** 42,523\n\n"
        f"**Vocabulary Size:** 18,394\n\n"
        f"**Match Strategy:** TF-IDF + Cosine Similarity (threshold: 0.15)\n\n"
        f"**Coverage:** {(42523 / 50000 * 100):.1f}% of expected cricket topics"
    )

Testing: The Foundation of Trust

Good tests = confidence in deployment.

Test Structure

# tests/test_qa.py

import pytest
import pandas as pd
from src.build_qa_model import answer_question
from joblib import load

# Load test data
test_df = pd.read_csv("datasets/ipl_2008_2024_complete.csv")
qa_model = load("models/qa_model.joblib")

# Extract Q&A components
tfidf = qa_model["tfidf"]
Q_matrix = qa_model["Q_matrix"]
answers = qa_model["answers"]
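The tests below treat `answer_question` as a black box, but its contract (vectorize the query, score cosine similarity against every stored question, refuse to answer below a threshold) is easy to sketch. This toy version uses raw term counts instead of real TF-IDF weighting, and `answer_question_sketch` is a stand-in, not the actual function from `src.build_qa_model`:

```python
import math

# Toy sketch of the answer_question contract: vectorize the query, take
# cosine similarity against every stored question, and return the best
# answer only if it clears the threshold. Raw term counts stand in for
# TF-IDF weighting here.


def vectorize(text, vocab):
    words = text.lower().split()
    return [words.count(w) for w in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer_question_sketch(question, stored_questions, answers, threshold=0.15):
    vocab = sorted({w for q in stored_questions for w in q.lower().split()})
    q_vec = vectorize(question, vocab)
    scores = [cosine(q_vec, vectorize(sq, vocab)) for sq in stored_questions]
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return None, scores[best]  # below threshold: refuse to answer
    return answers[best], scores[best]


stored = ["who won the ipl in 2020", "which team has most titles"]
answers = ["Mumbai Indians won IPL 2020.", "CSK and MI lead the title count."]

ans, score = answer_question_sketch("who won ipl 2020", stored, answers)
gib, gscore = answer_question_sketch("xyzabc qwerty", stored, answers)

print(ans)  # Mumbai Indians won IPL 2020.
print(gib)  # None -- gibberish shares no vocabulary, so it scores 0.0
```

The gibberish case is exactly what Test 4 below verifies against the real model.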

Test 1: Specific Match Facts

def test_match_lookup():
    """Can we answer specific match questions?"""
    questions = [
        "Who won the match on 2024-04-01 between MI and RR?",
        "How many runs were scored by MI in 2024-04-01?",
        "What was the result of MI vs RR on 2024-04-01?",
    ]

    for question in questions:
        answer, score = answer_question(
            question, tfidf, Q_matrix, answers, threshold=0.15
        )

        assert answer is not None, f"Failed on: {question}"
        assert len(answer) > 10, f"Answer too short: {answer}"
        assert score > 0.15, f"Confidence too low: {score}"

Why data-driven?

  • Not hardcoded: "answer == 'Mumbai Indians wins'"
  • CSV-based: Pulls real facts from dataset
  • Robust: Works even if answer phrasing changes
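The data-driven idea can be made concrete with the stdlib alone: derive the expected winner from the dataset at test time, then assert the Q&A answer *contains* that fact rather than matching an exact sentence. The column names below are assumptions for illustration, not the real CSV schema:

```python
import csv
import io

# Sketch: derive the expected fact from the dataset itself, then assert the
# Q&A answer contains it. Column names (date/team1/team2/winner) are assumed.
sample_csv = """date,team1,team2,winner
2024-04-01,MI,RR,MI
2024-04-02,CSK,KKR,KKR
"""


def expected_winner(csv_text, date, team1, team2):
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["date"] == date and {row["team1"], row["team2"]} == {team1, team2}:
            return row["winner"]
    return None


winner = expected_winner(sample_csv, "2024-04-01", "MI", "RR")

# Data-driven assertion: check the fact, not the exact phrasing.
qa_answer = "On 2024-04-01, MI beat RR by 6 wickets."
assert winner in qa_answer
```

If the answer template ever changes from "MI beat RR" to "MI defeated RR", this assertion still passes — which is the whole point.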

Test 2: Aggregate Statistics

def test_most_wins():
    """Can we retrieve aggregate stats?"""
    questions = [
        "Which team has won most matches?",
        "Who has most IPL titles?",
        "Team with highest win percentage?",
    ]

    for question in questions:
        answer, score = answer_question(
            question, tfidf, Q_matrix, answers, threshold=0.15
        )

        # Verify answer is a valid team name
        assert answer is not None
        valid_teams = ["Mumbai Indians", "CSK", "RCB"]  # ... all team aliases
        assert any(team in answer for team in valid_teams)

Test 3: Head-to-Head

def test_head_to_head():
    """Can we answer H2H questions?"""
    questions = [
        "Head-to-head record between MI and CSK?",
        "Does MI have winning record vs KKR?",
        "Who dominates MI vs RR?",
    ]

    for question in questions:
        answer, score = answer_question(
            question, tfidf, Q_matrix, answers, threshold=0.15
        )

        assert answer is not None
        # H2H answers contain numbers (win counts)
        assert any(char.isdigit() for char in answer)

Test 4: Threshold Behavior

def test_threshold_protects_low_confidence():
    """Low-confidence matches are rejected."""
    nonsense = "xyzabc qwerty asdfgh"  # Gibberish

    answer, score = answer_question(
        nonsense, tfidf, Q_matrix, answers, threshold=0.15
    )

    # Model shouldn't hallucinate
    assert answer is None, f"Got answer for nonsense: {answer}"
    assert score < 0.15

Test 5: ML Model Predictions

def test_model_predictions():
    """Can we predict match winners?"""
    from src.train import engineer_features
    from joblib import load

    ml_model = load("models/model.joblib")
    pipeline = ml_model["pipeline"]

    # Create a test case
    test_row = test_df.iloc[0].copy()  # Use real match
    test_row["date"] = "2023-05-01"  # Test on recent data

    # Engineer features
    features_df = engineer_features(
        test_df[test_df["date"] < "2023-01-01"],  # Historical
        test_row,
    )

    # Predict
    prediction = pipeline.predict(features_df)
    prob = pipeline.predict_proba(features_df)

    assert prediction[0] in [0, 1]  # Binary classification
    assert 0 <= max(prob[0]) <= 1  # Valid probability
    assert abs(sum(prob[0]) - 1.0) < 0.01  # Probabilities sum to 1

Test 6: Feature Sanity

def test_feature_ranges():
    """Are features in valid ranges?"""
    from src.features import engineer_features

    # Get one match
    test_match = test_df.iloc[0]

    # Engineer
    features = engineer_features(test_df, test_match)

    # Check rates (should be 0-1)
    assert 0 <= features["h2h_rate"].values[0] <= 1
    assert 0 <= features["overall_rate"].values[0] <= 1
    assert 0 <= features["venue_rate"].values[0] <= 1
    assert 0 <= features["rolling_rate"].values[0] <= 1

    # Check binary fields
    assert features["toss_win"].values[0] in [0, 1]

Test 7: No Data Leakage

def test_future_data_not_used():
    """Ensure no future data affects past predictions."""
    from src.features import engineer_features

    # Engineer a 2023 match using only 2022 history
    before_date = "2023-01-01"
    hist_data = test_df[test_df["date"] < before_date]

    test_match = test_df[
        (test_df["date"] >= before_date) & 
        (test_df["date"] < "2023-02-01")
    ].iloc[0]

    features = engineer_features(hist_data, test_match)

    # Features should only use hist_data, not test_match
    # (This is enforced in engineer_features with before_date guards)
    assert features is not None

    # Verify: no 2023 data leaked into 2022 rates
    assert all(hist_data["date"] < before_date)

Running Tests

# Install test dependencies
pip install pytest

# Run all tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# Run specific test
pytest tests/test_qa.py::test_match_lookup -v

# Coverage report
pytest tests/ --cov=src --cov-report=html

Deployment

Option 1: Streamlit Cloud

# Push to GitHub
git push origin main

# Link in Streamlit Cloud (streamlit.io/cloud)
# - Select repository
# - Select app.py
# - Auto-deploys on push

Live in 2 minutes, updates automatically.


Option 2: Docker

FROM python:3.11

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8501

CMD ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
docker build -t ipl-app .
docker run -p 8501:8501 ipl-app

Visit http://localhost:8501


Option 3: Heroku / Railway

# Deploy with one command
heroku create ipl-ai-assistant
git push heroku main

App runs at ipl-ai-assistant.herokuapp.com


Performance Monitoring

# Log predictions as CSV rows so they can be read back for dashboards
import logging

logging.basicConfig(
    filename="predictions.log",
    level=logging.INFO,
    format="%(message)s",  # each record is a raw CSV row: timestamp,hour,confidence
)

@st.cache_data
def get_prediction_logs():
    return pd.read_csv("predictions.log", names=["timestamp", "hour", "confidence"])

# Dashboard: average confidence per hour
st.line_chart(
    get_prediction_logs()
    .groupby("hour")["confidence"]
    .mean()
)

Track:

  • Average confidence over time
  • Common queries
  • Backend latency
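The "average confidence over time" metric needs nothing beyond the stdlib. Assuming the log lines are CSV rows of `timestamp,confidence` (an assumption about the log format, not a given), the hourly averages can be computed like this:

```python
import csv
import io
from collections import defaultdict

# Sketch: average prediction confidence per hour from a CSV-style log.
# The timestamp,confidence layout is an assumed log format.
log_text = """2024-05-01T10:02:11,0.72
2024-05-01T10:41:09,0.64
2024-05-01T11:05:55,0.90
"""


def hourly_confidence(text):
    buckets = defaultdict(list)
    for ts, conf in csv.reader(io.StringIO(text)):
        hour = ts[:13]  # truncate to the hour, e.g. "2024-05-01T10"
        buckets[hour].append(float(conf))
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}


averages = hourly_confidence(log_text)
print(averages)  # {'2024-05-01T10': 0.68, '2024-05-01T11': 0.9}
```

A sudden drop in hourly average confidence is often the first sign of drift — new teams, renamed venues, or a stale model.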

Common Issues & Solutions

  • "Connection refused" — backend not running on localhost:8000
  • Chat history disappears — use st.session_state, not regular variables
  • Predictions slow — enable model caching and lazy loading
  • Tests fail on new data — read expected values from the CSV, not hardcoded strings
  • Threshold too strict — lower from 0.15 to 0.10 for more results

Summary: The Complete System

  • Feature Engineering — calculates pre-match metrics (pandas, numpy)
  • ML Model — predicts match winners (scikit-learn)
  • Q&A Engine — answers cricket questions (TF-IDF + cosine similarity)
  • FastAPI Backend — intelligent routing and lazy loading (FastAPI, uvicorn)
  • Streamlit Frontend — chat, prediction, and metrics tabs (Streamlit)
  • Testing — verifies correctness (pytest, CSV-based assertions)

You now have:

✅ Production-ready predictions (61.8% accuracy)

✅ Intelligent Q&A with 42K learning pairs

✅ Low-latency API (<20ms per request)

✅ Smooth UI with session persistence

✅ 22 tests ensuring reliability

✅ Multiple deployment options


What's Next?

Deploy this and message me your results! 🚀

Repository: https://github.com/jayakrishnayadav24/ipl-ai-assistant

Series Complete 🎉

← Part 4: Backend Routing | ← Return to Series

Built with 💚 for cricket fans. Questions? DM me!
