
The User Interface & Ground-Truth Testing

Part 5 of 5 | ← Part 4 | Complete Series

Streamlit Overview

Streamlit lets you build data apps in pure Python—no HTML, CSS, or JavaScript needed.


import streamlit as st
from datetime import datetime
import pandas as pd
import requests

# Page config
st.set_page_config(
    page_title="IPL AI Assistant",
    page_icon="🏏",
    layout="wide",
)

# Title
st.title("🏏 IPL AI Assistant")
st.caption("Predictions + Q&A Powered by ML")  # Streamlit has no st.subtitle

# Tabs
tab1, tab2, tab3 = st.tabs(["💬 Chat", "🎯 Predict", "📊 Metrics"])

Tab 1: Chat Interface

with tab1:
    st.header("Ask Anything")

    # Initialize session state
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Display chat history
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # User input
    if user_input := st.chat_input("Ask about IPL..."):
        st.session_state.messages.append({"role": "user", "content": user_input})

        with st.chat_message("user"):
            st.markdown(user_input)

        # Call backend
        try:
            response = requests.post(
                "http://localhost:8000/chat",
                json={"message": user_input},
                timeout=5,
            )
            response.raise_for_status()

            result = response.json()
            assistant_message = result.get("message", "I couldn't understand that.")

            st.session_state.messages.append({
                "role": "assistant",
                "content": assistant_message,
            })

            with st.chat_message("assistant"):
                st.markdown(assistant_message)

        except Exception as e:
            st.error(f"❌ Backend error: {str(e)}")

Key Concepts:

  1. Session State: st.session_state.messages persists across reruns

    • When user submits a message, Streamlit reruns the entire script
    • Session state preserves conversation history
    • Without it: chat history disappears on each input
  2. Chat Message: st.chat_message() renders messages with role-based styling

    • "user" and "assistant" each get a distinct avatar and styling, so the conversation reads like a chat transcript
  3. Chat Input: st.chat_input() provides a textbox with submission handling

    • Returns None until user submits
    • Automatically clears after submission
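The rerun model is the part that trips people up, so here is a plain-Python sketch (no Streamlit required) of why a persistent store is needed. `run_script` stands in for one full rerun of the app script, and the `session_state` dict stands in for `st.session_state` — both names are illustrative, not Streamlit API:

```python
# Plain-Python sketch of Streamlit's rerun model (names are illustrative).
# Every user interaction re-executes the whole script from the top, so any
# local variable is rebuilt from scratch -- only session_state survives.

session_state = {}  # stands in for st.session_state


def run_script(user_input):
    """One full rerun of the app script."""
    # Local variable: reset on every rerun, so history would be lost.
    local_messages = []

    # Session state: initialized once, then preserved across reruns.
    if "messages" not in session_state:
        session_state["messages"] = []

    if user_input is not None:  # st.chat_input returns None until submitted
        local_messages.append(user_input)
        session_state["messages"].append(user_input)

    return local_messages, session_state["messages"]


run_script("Who won in 2024?")
local, persisted = run_script("Head-to-head MI vs CSK?")

print(len(local))      # 1 -- the local list forgot the first message
print(len(persisted))  # 2 -- session state kept the whole conversation
```

This is exactly the failure mode in the "Without it" bullet above: a plain list would behave like `local_messages` and lose the conversation on every input.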

Tab 2: Prediction Interface

with tab2:
    st.header("Match Prediction Simulator")

    col1, col2 = st.columns(2)

    with col1:
        st.subheader("Teams")
        teams = [
            "Mumbai Indians",
            "Chennai Super Kings",
            "Royal Challengers Bangalore",
            "Kolkata Knight Riders",
            # ... all 10 teams
        ]
        batting_team = st.selectbox("Batting Team:", teams)
        bowling_team = st.selectbox(
            "Bowling Team:",
            options=[t for t in teams if t != batting_team],
        )

    with col2:
        st.subheader("Venue")
        venue = st.text_input("Ground name:", "Wankhede")

    st.subheader("Pre-Match Form")
    col1, col2, col3, col4 = st.columns(4)

    with col1:
        h2h_rate = st.slider(
            "H2H Win Rate (Batting Team)",
            0.0, 1.0, 0.5, 0.05,
        )

    with col2:
        overall_rate = st.slider(
            "Overall Win Rate",
            0.0, 1.0, 0.5, 0.05,
        )

    with col3:
        venue_rate = st.slider(
            "Venue Win Rate",
            0.0, 1.0, 0.5, 0.05,
        )

    with col4:
        rolling_rate = st.slider(
            "Last 5 Matches Win Rate",
            0.0, 1.0, 0.5, 0.05,
        )

    st.subheader("Toss")
    col1, col2 = st.columns(2)

    with col1:
        toss_win = st.radio("Who won toss?", ["Batting Team", "Bowling Team"])
        toss_win = 1 if toss_win == "Batting Team" else 0

    with col2:
        toss_choice = st.radio("Toss choice?", ["Bat", "Field"])
        toss_choice = toss_choice.lower()

    # Predict button
    if st.button("🎯 Predict Match Outcome", use_container_width=True):
        try:
            response = requests.post(
                "http://localhost:8000/predict",
                json={
                    "batting_team": batting_team,
                    "bowling_team": bowling_team,
                    "venue": venue,
                    "h2h_rate": h2h_rate,
                    "overall_rate": overall_rate,
                    "venue_rate": venue_rate,
                    "rolling_rate": rolling_rate,
                    "toss_win": toss_win,
                    "toss_choice": toss_choice,
                },
                timeout=5,
            )
            response.raise_for_status()

            result = response.json()
            winner = result["winner"]
            confidence = result["confidence"]

            st.success(
                f"### 🏆 {winner} wins!\n"
                f"**Confidence:** {confidence:.1%}"
            )

            # Show prediction breakdown
            st.info(
                f"**Model:** {result['model']}\n\n"
                f"**Reasoning:**\n"
                f"- H2H: {h2h_rate:.0%}\n"
                f"- Form: {rolling_rate:.0%}\n"
                f"- Venue: {venue_rate:.0%}\n"
            )

        except Exception as e:
            st.error(f"❌ Prediction failed: {str(e)}")

Key UI Patterns:

  • st.selectbox() — Dropdown selector
  • st.slider() — Range input (0.0-1.0)
  • st.radio() — Single-choice radio buttons
  • st.columns() — Grid layout (col1, col2, etc.)
  • st.button() — Form submission
  • st.success/error/info() — Colored alerts

Tab 3: Metrics & Transparency

with tab3:
    st.header("Model Performance")

    # Load metrics
    import json
    with open("models/metrics.json") as f:
        metrics = json.load(f)

    col1, col2, col3 = st.columns(3)
    col1.metric("Test Accuracy", f"{metrics['accuracy']:.1%}")
    col2.metric("Precision", f"{metrics['precision']:.1%}")
    col3.metric("Recall", f"{metrics['recall']:.1%}")

    st.subheader("Confusion Matrix")
    st.image("models/confusion_matrix.png", use_container_width=True)

    st.subheader("Feature Importance")
    importance_df = pd.DataFrame({
        "Feature": ["h2h_rate", "rolling_rate", "venue_rate", ...],
        "Importance": [0.32, 0.28, 0.18, ...],
    }).sort_values("Importance", ascending=False)

    st.bar_chart(importance_df.set_index("Feature"))

    st.subheader("Q&A Engine")
    st.info(
        f"**Total Q&A Pairs:** 42,523\n\n"
        f"**Vocabulary Size:** 18,394\n\n"
        f"**Match Strategy:** TF-IDF + Cosine Similarity (threshold: 0.15)\n\n"
        f"**Coverage:** {(42523 / 50000 * 100):.1f}% of expected cricket topics"
    )

Testing: The Foundation of Trust

Good tests = confidence in deployment.

Test Structure

# tests/test_qa.py

import pytest
import pandas as pd
from src.build_qa_model import answer_question
from joblib import load

# Load test data
test_df = pd.read_csv("datasets/ipl_2008_2024_complete.csv")
qa_model = load("models/qa_model.joblib")

# Extract Q&A components
tfidf = qa_model["tfidf"]
Q_matrix = qa_model["Q_matrix"]
answers = qa_model["answers"]
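The tests below treat `answer_question` as a black box, but its contract (vectorize the query, score cosine similarity against every stored question, refuse to answer below a threshold) is easy to sketch. This toy version uses raw term counts instead of real TF-IDF weighting, and `answer_question_sketch` is a stand-in, not the actual function from `src.build_qa_model`:

```python
import math

# Toy sketch of the answer_question contract: vectorize the query, take
# cosine similarity against every stored question, and return the best
# answer only if it clears the threshold. Raw term counts stand in for
# TF-IDF weighting here.


def vectorize(text, vocab):
    words = text.lower().split()
    return [words.count(w) for w in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer_question_sketch(question, stored_questions, answers, threshold=0.15):
    vocab = sorted({w for q in stored_questions for w in q.lower().split()})
    q_vec = vectorize(question, vocab)
    scores = [cosine(q_vec, vectorize(sq, vocab)) for sq in stored_questions]
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return None, scores[best]  # below threshold: refuse to answer
    return answers[best], scores[best]


stored = ["who won the ipl in 2020", "which team has most titles"]
answers = ["Mumbai Indians won IPL 2020.", "CSK and MI lead the title count."]

ans, score = answer_question_sketch("who won ipl 2020", stored, answers)
gib, gscore = answer_question_sketch("xyzabc qwerty", stored, answers)

print(ans)  # Mumbai Indians won IPL 2020.
print(gib)  # None -- gibberish shares no vocabulary, so it scores 0.0
```

The gibberish case is exactly what Test 4 below verifies against the real model.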

Test 1: Specific Match Facts

def test_match_lookup():
    """Can we answer specific match questions?"""
    questions = [
        "Who won the match on 2024-04-01 between MI and RR?",
        "How many runs were scored by MI in 2024-04-01?",
        "What was the result of MI vs RR on 2024-04-01?",
    ]

    for question in questions:
        answer, score = answer_question(
            question, tfidf, Q_matrix, answers, threshold=0.15
        )

        assert answer is not None, f"Failed on: {question}"
        assert len(answer) > 10, f"Answer too short: {answer}"
        assert score > 0.15, f"Confidence too low: {score}"

Why data-driven?

  • Not hardcoded: "answer == 'Mumbai Indians wins'"
  • CSV-based: Pulls real facts from dataset
  • Robust: Works even if answer phrasing changes
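The data-driven idea can be made concrete with the stdlib alone: derive the expected winner from the dataset at test time, then assert the Q&A answer *contains* that fact rather than matching an exact sentence. The column names below are assumptions for illustration, not the real CSV schema:

```python
import csv
import io

# Sketch: derive the expected fact from the dataset itself, then assert the
# Q&A answer contains it. Column names (date/team1/team2/winner) are assumed.
sample_csv = """date,team1,team2,winner
2024-04-01,MI,RR,MI
2024-04-02,CSK,KKR,KKR
"""


def expected_winner(csv_text, date, team1, team2):
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["date"] == date and {row["team1"], row["team2"]} == {team1, team2}:
            return row["winner"]
    return None


winner = expected_winner(sample_csv, "2024-04-01", "MI", "RR")

# Data-driven assertion: check the fact, not the exact phrasing.
qa_answer = "On 2024-04-01, MI beat RR by 6 wickets."
assert winner in qa_answer
```

If the answer template ever changes from "MI beat RR" to "MI defeated RR", this assertion still passes — which is the whole point.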

Test 2: Aggregate Statistics

def test_most_wins():
    """Can we retrieve aggregate stats?"""
    questions = [
        "Which team has won most matches?",
        "Who has most IPL titles?",
        "Team with highest win percentage?",
    ]

    for question in questions:
        answer, score = answer_question(
            question, tfidf, Q_matrix, answers, threshold=0.15
        )

        # Verify answer is a valid team name
        assert answer is not None
        valid_teams = ["Mumbai Indians", "CSK", "RCB"]  # ... all team aliases
        assert any(team in answer for team in valid_teams)

Test 3: Head-to-Head

def test_head_to_head():
    """Can we answer H2H questions?"""
    questions = [
        "Head-to-head record between MI and CSK?",
        "Does MI have winning record vs KKR?",
        "Who dominates MI vs RR?",
    ]

    for question in questions:
        answer, score = answer_question(
            question, tfidf, Q_matrix, answers, threshold=0.15
        )

        assert answer is not None
        # H2H answers contain numbers (win counts)
        assert any(char.isdigit() for char in answer)

Test 4: Threshold Behavior

def test_threshold_protects_low_confidence():
    """Low-confidence matches are rejected."""
    nonsense = "xyzabc qwerty asdfgh"  # Gibberish

    answer, score = answer_question(
        nonsense, tfidf, Q_matrix, answers, threshold=0.15
    )

    # Model shouldn't hallucinate
    assert answer is None, f"Got answer for nonsense: {answer}"
    assert score < 0.15

Test 5: ML Model Predictions

def test_model_predictions():
    """Can we predict match winners?"""
    from src.train import engineer_features
    from joblib import load

    ml_model = load("models/model.joblib")
    pipeline = ml_model["pipeline"]

    # Create a test case
    test_row = test_df.iloc[0].copy()  # Use real match
    test_row["date"] = "2023-05-01"  # Test on recent data

    # Engineer features
    features_df = engineer_features(
        test_df[test_df["date"] < "2023-01-01"],  # Historical
        test_row,
    )

    # Predict
    prediction = pipeline.predict(features_df)
    prob = pipeline.predict_proba(features_df)

    assert prediction[0] in [0, 1]  # Binary classification
    assert 0 <= max(prob[0]) <= 1  # Valid probability
    assert abs(sum(prob[0]) - 1.0) < 0.01  # Probabilities sum to 1

Test 6: Feature Sanity

def test_feature_ranges():
    """Are features in valid ranges?"""
    from src.features import engineer_features

    # Get one match
    test_match = test_df.iloc[0]

    # Engineer
    features = engineer_features(test_df, test_match)

    # Check rates (should be 0-1)
    assert 0 <= features["h2h_rate"].values[0] <= 1
    assert 0 <= features["overall_rate"].values[0] <= 1
    assert 0 <= features["venue_rate"].values[0] <= 1
    assert 0 <= features["rolling_rate"].values[0] <= 1

    # Check binary fields
    assert features["toss_win"].values[0] in [0, 1]

Test 7: No Data Leakage

def test_future_data_not_used():
    """Ensure no future data affects past predictions."""
    from src.features import engineer_features

    # Engineer a 2023 match using only 2022 history
    before_date = "2023-01-01"
    hist_data = test_df[test_df["date"] < before_date]

    test_match = test_df[
        (test_df["date"] >= before_date) & 
        (test_df["date"] < "2023-02-01")
    ].iloc[0]

    features = engineer_features(hist_data, test_match)

    # Features should only use hist_data, not test_match
    # (This is enforced in engineer_features with before_date guards)
    assert features is not None

    # Verify: no 2023 data leaked into 2022 rates
    assert all(hist_data["date"] < before_date)

Running Tests

# Install test dependencies
pip install pytest

# Run all tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# Run specific test
pytest tests/test_qa.py::test_match_lookup -v

# Coverage report
pytest tests/ --cov=src --cov-report=html

Deployment

Option 1: Streamlit Cloud

# Push to GitHub
git push origin main

# Link in Streamlit Cloud (streamlit.io/cloud)
# - Select repository
# - Select app.py
# - Auto-deploys on push

Live in 2 minutes, updates automatically.


Option 2: Docker

FROM python:3.11

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8501

CMD ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
docker build -t ipl-app .
docker run -p 8501:8501 ipl-app

Visit http://localhost:8501


Option 3: Heroku / Railway

# Deploy with one command
heroku create ipl-ai-assistant
git push heroku main

App runs at ipl-ai-assistant.herokuapp.com


Performance Monitoring

# Log predictions as CSV rows so they can be read back for dashboards
import logging

logging.basicConfig(
    filename="predictions.log",
    level=logging.INFO,
    format="%(message)s",  # each record is a raw CSV row: timestamp,hour,confidence
)

@st.cache_data
def get_prediction_logs():
    return pd.read_csv("predictions.log", names=["timestamp", "hour", "confidence"])

# Dashboard: average confidence per hour
st.line_chart(
    get_prediction_logs()
    .groupby("hour")["confidence"]
    .mean()
)

Track:

  • Average confidence over time
  • Common queries
  • Backend latency
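The "average confidence over time" metric needs nothing beyond the stdlib. Assuming the log lines are CSV rows of `timestamp,confidence` (an assumption about the log format, not a given), the hourly averages can be computed like this:

```python
import csv
import io
from collections import defaultdict

# Sketch: average prediction confidence per hour from a CSV-style log.
# The timestamp,confidence layout is an assumed log format.
log_text = """2024-05-01T10:02:11,0.72
2024-05-01T10:41:09,0.64
2024-05-01T11:05:55,0.90
"""


def hourly_confidence(text):
    buckets = defaultdict(list)
    for ts, conf in csv.reader(io.StringIO(text)):
        hour = ts[:13]  # truncate to the hour, e.g. "2024-05-01T10"
        buckets[hour].append(float(conf))
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}


averages = hourly_confidence(log_text)
print(averages)  # {'2024-05-01T10': 0.68, '2024-05-01T11': 0.9}
```

A sudden drop in hourly average confidence is often the first sign of drift — new teams, renamed venues, or a stale model.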

Common Issues & Solutions

  • "Connection refused" — backend not running on localhost:8000
  • Chat history disappears — use st.session_state, not regular variables
  • Predictions slow — enable model caching and lazy loading
  • Tests fail on new data — read expected values from the CSV, not hardcoded strings
  • Threshold too strict — lower from 0.15 to 0.10 for more results

Summary: The Complete System

  • Feature Engineering — calculates pre-match metrics (pandas, numpy)
  • ML Model — predicts match winners (scikit-learn)
  • Q&A Engine — answers cricket questions (TF-IDF + cosine similarity)
  • FastAPI Backend — intelligent routing and lazy loading (FastAPI, uvicorn)
  • Streamlit Frontend — chat, prediction, and metrics tabs (Streamlit)
  • Testing — verifies correctness (pytest, CSV-based assertions)

You now have:

✅ Production-ready predictions (61.8% accuracy)

✅ Intelligent Q&A with 42K learning pairs

✅ Low-latency API (<20ms per request)

✅ Smooth UI with session persistence

✅ 22 tests ensuring reliability

✅ Multiple deployment options


What's Next?

Deploy this and message me your results! 🚀

Repository: https://github.com/jayakrishnayadav24/ipl-ai-assistant

Series Complete 🎉

← Part 4: Backend Routing | ← Return to Series

Built with 💚 for cricket fans. Questions? DM me!
