In this tutorial, we'll walk through the process of creating an interactive WNBA (Women's National Basketball Association) analytics dashboard. This powerful tool combines data visualization, AI-driven insights, and a chatbot interface to provide a comprehensive view of WNBA player statistics and team information.
Project Overview
This WNBA analytics dashboard offers the following features:
- Player statistics visualization
- Team comparisons
- Interactive map showing WNBA team locations
- AI-powered chatbot for WNBA-related queries
- Data filtering and sorting capabilities
- Responsive design for various screen sizes
We'll be using the following technologies:
- Python
- Streamlit for the web application framework
- LangChain for natural language processing
- Cloudflare Workers AI for machine learning capabilities
- Pandas for data manipulation
- Plotly for interactive charts
- Folium for map visualizations
Step 1: Setting Up the Environment
First, let's set up our development environment:
- Create a new Python virtual environment
python3 -m venv venv
source venv/bin/activate
- Install the required packages. You can download the requirements.txt file from the GitHub repo and run
pip install -r requirements.txt
to install them. Then import the packages at the top of your Python file.
import base64
from dotenv import load_dotenv
import json
import os
import requests
import time
import webcolors
import streamlit as st
import folium
from streamlit_folium import st_folium
from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut, GeocoderServiceError
from langchain.memory import ConversationBufferMemory
from langchain_community.llms.cloudflare_workersai import CloudflareWorkersAI
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.runnables import RunnablePassthrough
import numpy as np
import pandas as pd
from pathlib import Path
import plotly.graph_objects as go
- Set up a Cloudflare account and obtain the necessary credentials for Workers AI. When you log in to the dashboard, your account ID is the string of characters that follows https://dash.cloudflare.com/ in the URL. To get a Workers AI auth token, click AI in the left-hand sidebar, then the blue Use REST API button, then Create a Workers AI API Token.
Add them to a .env file as CF_ACCOUNT_ID and CF_AUTH_TOKEN, and load them by adding these lines beneath the import statements.
load_dotenv()
# Cloudflare Workers AI setup
ACCOUNT_ID = os.getenv('CF_ACCOUNT_ID')
AUTH_TOKEN = os.getenv('CF_AUTH_TOKEN')
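os.getenv returns None when a variable is missing, so a typo in the .env file would only surface later as a failed API call. A minimal sketch of failing early instead is below; the helper name require_env is hypothetical and not part of the original project.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Missing environment variable {name}; add it to your .env file."
        )
    return value


# Demo with a placeholder value so the sketch runs on its own;
# in the app, load_dotenv() would populate this from .env instead.
os.environ.setdefault("CF_ACCOUNT_ID", "example-account-id")
print(require_env("CF_ACCOUNT_ID"))
```

In the app, ACCOUNT_ID = require_env('CF_ACCOUNT_ID') could stand in for the plain os.getenv call.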
Configuring the Streamlit Page
Let's set up the basic Streamlit page configuration and add some custom CSS for styling:
st.set_page_config(page_title="WNBA Player Analytics Dashboard, AI Insights, && AI Assistant", page_icon="🏀", layout="wide")
# Custom CSS (truncated for brevity)
st.markdown("""
<style>
.hover-link {
color: #1E90FF;
text-decoration: none;
transition: color 0.3s ease;
}
.hover-link:hover {
color: #FF4500;
text-decoration: underline;
}
/* ... more custom CSS -> see https://github.com/elizabethsiegle/wnba-analytics-dash-ai-insights ... */
</style>
""", unsafe_allow_html=True)
Data Collection and Preprocessing
Create a function to fetch and preprocess WNBA player data:
def fetch_player_data(season):
    url = f"https://www.basketball-reference.com/wnba/years/{season}_per_game.html"
    dataframes = pd.read_html(url, header=0)
    df = dataframes[0]
    df = df[df.G != 'G'].fillna(0)  # Remove header rows and fill NaNs
    df = df.drop(['G'], axis=1)
    # Convert percentage columns to float
    percentage_columns = ['FG%', '3P%', '2P%', 'eFG%', 'FT%']
    for col in percentage_columns:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col].astype(str).str.rstrip('%'), errors='coerce') / 100
    # Convert other numeric columns to float
    numeric_columns = ['Age', 'GS', 'MP', 'FG', 'FGA', '3P', '3PA', '2P', '2PA', 'FT', 'FTA', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS']
    for col in numeric_columns:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors='coerce')
    # Ensure 'Player' and 'Team' columns are strings
    df['Player'] = df['Player'].astype(str)
    df['Team'] = df['Team'].astype(str)
    # Handle any remaining problematic columns
    for col in df.columns:
        if df[col].dtype == object:
            df[col] = df[col].astype(str)
    return df
# Usage
selected_season = st.sidebar.selectbox('Season', list(reversed(range(1997, 2025))))
player_data = fetch_player_data(selected_season)
This function fetches per-game data from basketball-reference.com for a given season, removes the repeated header rows, fills missing values, and converts the stat columns to numeric types.
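To see why the header-row filter matters, here is a small sketch on made-up rows shaped like a scraped table, where the header line repeats partway through. The player names and numbers are placeholders, and this version converts to numbers before filling gaps, which is a slightly different order from the function above.

```python
import pandas as pd

# Hypothetical rows shaped like a scraped table: the header repeats mid-table
raw = pd.DataFrame(
    {
        "Player": ["Player A", "Player", "Player B"],
        "G": ["30", "G", "28"],
        "PTS": ["10.5", "PTS", None],
    }
)

# Keep only real data rows (a repeated header row has the literal value "G")
cleaned = raw[raw["G"] != "G"].copy()
# Strings such as "10.5" become floats; missing values are then filled with 0
cleaned["PTS"] = pd.to_numeric(cleaned["PTS"], errors="coerce").fillna(0)

print(cleaned)
```

Without the filter, the repeated header would survive as a fake player named "Player" whose stats cannot be parsed.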
Building the User Interface
Now, let's set up the basic structure of our Streamlit app and sidebar controls:
# st.set_page_config(...) was already called above, so it is not repeated here
st.title("WNBA Player Analytics Dashboard")
# Sidebar for filters
st.sidebar.header('Filter Options')
st.sidebar.success("Filter players by season, team, and position to explore the data.")
# selected_season and player_data were defined with fetch_player_data() above;
# a second identical selectbox without a unique key would raise a duplicate-widget error
# Team and position selection
teams = sorted([team for team in player_data.Team.unique() if team != 'TOT'])
selected_teams = st.sidebar.multiselect('Team', teams, teams)
positions = ['C', 'F', 'G', 'F-G', 'C-F']
selected_positions = st.sidebar.multiselect('Position', positions, positions)
# Display the data
st.write(player_data)
This creates a simple dashboard with a season selector and displays the raw data.
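The team and position selections only take effect once they are applied to the table, and the chatbot code later expects a filtered_data variable. A minimal sketch of that filtering step follows, using placeholder rows; the position column name "Pos" is an assumption about the scraped table, so check it against your DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the scraped player table
player_data = pd.DataFrame(
    {
        "Player": ["Player A", "Player B", "Player C"],
        "Team": ["NYL", "LVA", "SEA"],
        "Pos": ["G", "C", "F"],  # column name "Pos" is assumed
        "PTS": [12.0, 18.5, 9.0],
    }
)

# In the app these lists would come from the sidebar multiselect widgets
selected_teams = ["NYL", "LVA"]
selected_positions = ["G", "F"]

# Keep rows whose team AND position are both among the selections
filtered_data = player_data[
    player_data["Team"].isin(selected_teams)
    & player_data["Pos"].isin(selected_positions)
]
print(filtered_data)
```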
Step 5: Adding Data Visualization
This step relies on a helper function that cleans percentage data, converting string percentages to floats and handling various data types, and on a dictionary that maps user-friendly names to column names in the dataset.
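The helper and the dictionary are not reproduced in this post, so the following is only a sketch of what they might look like. The names clean_percentage and STAT_COLUMNS and the choice to fall back to 0.0 are assumptions; the repo's versions may differ.

```python
import math


def clean_percentage(value):
    """Convert values such as '45.2%', '45.2', or 45.2 to a float; return 0.0 on failure."""
    if value is None:
        return 0.0
    if isinstance(value, str):
        value = value.strip().rstrip("%")
    try:
        number = float(value)
    except (TypeError, ValueError):
        return 0.0
    # Treat NaN like any other unusable value
    return 0.0 if math.isnan(number) else number


# Display names mapped to dataset column names (illustrative subset)
STAT_COLUMNS = {
    "Points": "PTS",
    "Assists": "AST",
    "Total Rebounds": "TRB",
    "Field Goal %": "FG%",
}

print(clean_percentage("45.2%"), STAT_COLUMNS["Points"])
```

The dictionary lets the dropdown show readable labels while the code keeps working with the short column names from the scraped table.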
We create a 2x2 grid layout for organizing dashboard components.
The Chart Section (col1):
- allows users to select a statistic and chart type (Pie or Bar).
- filters data based on a minimum value slider.
- creates and displays a chart of the top 5 players for the selected statistic.
The Stats Section (col2):
- displays quick stats like average, highest player, and number of players shown.
- shows a styled table of top players for the selected statistic.
Key features include interactive elements (dropdowns, radio buttons, slider) for user customization and dynamic chart creation based on user selection.
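The filtering and ranking behind the chart and quick stats can be sketched with pandas alone. The data below is made up, and min_value stands in for the slider; the widget wiring and chart drawing are left out.

```python
import pandas as pd

# Hypothetical data; the real app would use the filtered player table
players = pd.DataFrame(
    {
        "Player": [f"Player {c}" for c in "ABCDEFG"],
        "PTS": [21.0, 4.5, 17.2, 9.9, 12.3, 15.0, 2.1],
    }
)

stat = "PTS"
min_value = 5.0  # would come from the slider

# Apply the minimum-value filter, then take the five largest values
eligible = players[players[stat] >= min_value]
top_players = eligible.nlargest(5, stat)

# Quick stats of the kind shown beside the chart
average = top_players[stat].mean()
leader = top_players.iloc[0]["Player"]
print(f"Average {stat}: {average:.1f}, leader: {leader}, shown: {len(top_players)}")
```

Because nlargest returns rows sorted from highest to lowest, the first row is the leader for the selected statistic.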
The AI Insights Section (col3) adds an AI-powered feature to the dashboard:
- It creates a button that, when clicked, generates AI insights about the chart displayed above.
- When the button is clicked, it checks whether the necessary data is available in the session state.
- If data is available, it creates a DataFrame of the top 5 players and their stats.
- It then calls the generate_insights() function (which you'd need to implement using a language model like GPT-3 or GPT-4) to analyze this data.
- The generated insights are displayed to the user.
- A warning is shown to remind users that the insights are AI-generated and should be verified.
This feature demonstrates how to integrate AI capabilities into a data dashboard, providing users with quick, automated analysis of the visualized data.
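Since generate_insights() is left for you to implement, here is one possible shape for it, as a sketch rather than the project's actual code. It takes the model as a plain callable so it runs without credentials; the prompt wording, the helper build_insights_prompt, and the stand-in fake_llm are all assumptions.

```python
from typing import Callable


def build_insights_prompt(top_players: list[dict], stat: str) -> str:
    """Format the chart data as a plain-text prompt for a language model."""
    lines = [f"- {row['Player']}: {row[stat]} {stat}" for row in top_players]
    return (
        f"Here are the top players by {stat}:\n"
        + "\n".join(lines)
        + "\nSummarize the most notable patterns in two or three sentences."
    )


def generate_insights(top_players: list[dict], stat: str, llm: Callable[[str], str]) -> str:
    """Send the prompt to any callable that maps a prompt string to a reply string."""
    if not top_players:
        return "No data available to analyze."
    return llm(build_insights_prompt(top_players, stat)).strip()


# Stand-in model so the sketch runs without credentials
def fake_llm(prompt: str) -> str:
    return f"(model reply to a {len(prompt)}-character prompt)"


rows = [{"Player": "Player A", "PTS": 21.0}, {"Player": "Player B", "PTS": 17.2}]
print(generate_insights(rows, "PTS", fake_llm))
```

In the dashboard, the llm argument could be the invoke method of a LangChain LLM object, and the rows could come from the top-5 DataFrame via to_dict("records").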
Implementing Player Comparison
Add a section for comparing two players:
st.markdown('<div class="player-comparison-section">', unsafe_allow_html=True)
st.subheader("Player Comparison (players must have played in the same season)")
# Allow users to select players to compare
players = player_data['Player'].unique()
# Find the indices of Caitlin Clark and Angel Reese
caitlin_index = players.tolist().index('Caitlin Clark') if 'Caitlin Clark' in players else 0
angel_index = players.tolist().index('Angel Reese') if 'Angel Reese' in players else 0
player1 = st.selectbox("Select first player", players, index=caitlin_index, key='player1')
player2 = st.selectbox("Select second player", players, index=angel_index, key='player2')

def normalize(value, min_value, max_value):
    try:
        value = float(value)
        return 100 * (value - min_value) / (max_value - min_value) if max_value > min_value else 50
    except (ValueError, TypeError):
        return 0  # or some default value for non-numeric entries
if player1 and player2:
    # Get data for selected players
    stats1 = player_data[player_data['Player'] == player1].iloc[0]
    stats2 = player_data[player_data['Player'] == player2].iloc[0]
    # Select stats to compare
    stats_to_compare = ['PTS', 'AST', 'TRB', 'STL', 'BLK', 'FG%', '3P%', 'FT%']
    # Convert columns to numeric, replacing non-numeric values with NaN
    for stat in stats_to_compare:
        player_data[stat] = pd.to_numeric(player_data[stat], errors='coerce')
    normalized_stats = {}
    for stat in stats_to_compare:
        min_val = player_data[stat].min()
        max_val = player_data[stat].max()
        normalized_stats[stat] = [
            normalize(stats1[stat], min_val, max_val),
            normalize(stats2[stat], min_val, max_val)
        ]
    # Create a radar chart
    fig = go.Figure()
    fig.add_trace(go.Scatterpolar(
        r=[normalized_stats[stat][0] for stat in stats_to_compare],
        theta=stats_to_compare,
        fill='toself',
        name=player1
    ))
    fig.add_trace(go.Scatterpolar(
        r=[normalized_stats[stat][1] for stat in stats_to_compare],
        theta=stats_to_compare,
        fill='toself',
        name=player2
    ))
    fig.update_layout(
        polar=dict(radialaxis=dict(visible=True, range=[0, 100])),
        showlegend=True,
        legend=dict(
            font=dict(size=16),  # Increase font size
            itemsizing='constant',  # Make legend items a constant size
            itemwidth=30,  # Adjust item width
            yanchor="top",  # Anchor to the top
            y=0.99,  # Position at the top
            xanchor="right",  # Anchor to the right
            x=0.99,  # Position at the right
            bgcolor="rgba(255, 255, 255, 0.5)",  # Semi-transparent background
            bordercolor="Black",  # Border color
            borderwidth=2,  # Border width
        ),
        title=dict(
            text=f"{player1} vs {player2} Comparison",
            font=dict(size=24)  # Increase title font size
        ),
        width=700,  # Adjust as needed
        height=700  # Adjust as needed
    )
    # Render the figure in the app
    st.plotly_chart(fig)
This creates a radar chart comparing two selected players across multiple statistics.
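The radar chart works because every statistic is rescaled to the same 0 to 100 range using the league-wide minimum and maximum, so points per game and shooting percentages can share one axis. The sketch below is a simplified version of normalize without the error handling, applied to made-up league ranges.

```python
def normalize(value, min_value, max_value):
    """Scale value to 0-100 relative to the league-wide min and max."""
    if max_value <= min_value:
        return 50  # no spread in the data, so place everyone mid-scale
    return 100 * (float(value) - min_value) / (max_value - min_value)


# Hypothetical league range for points per game: 0 to 25
print(normalize(20, 0, 25))  # 80.0
# A league-best value lands at the top of the scale
print(normalize(0.9, 0.5, 0.9))
```

One consequence of this design is that the chart shows each player relative to the rest of the selected season, not absolute values.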
Adding an Interactive Map
Let's add a map showing WNBA team locations:
@st.cache_data
def create_wnba_map():
    # WNBA teams, their locations, and home page URLs
    wnba_teams = {
        'Atlanta Dream': ('Atlanta, GA', 'https://dream.wnba.com/'),
        'Chicago Sky': ('Chicago, IL', 'https://sky.wnba.com/'),
        'Connecticut Sun': ('Uncasville, CT', 'https://sun.wnba.com/'),
        'Dallas Wings': ('Arlington, TX', 'https://wings.wnba.com/'),
        'Indiana Fever': ('Indianapolis, IN', 'https://fever.wnba.com/'),
        'Las Vegas Aces': ('Las Vegas, NV', 'https://aces.wnba.com/'),
        'Los Angeles Sparks': ('Los Angeles, CA', 'https://sparks.wnba.com/'),
        'Minnesota Lynx': ('Minneapolis, MN', 'https://lynx.wnba.com/'),
        'New York Liberty': ('Brooklyn, NY', 'https://liberty.wnba.com/'),
        'Phoenix Mercury': ('Phoenix, AZ', 'https://mercury.wnba.com/'),
        'Seattle Storm': ('Seattle, WA', 'https://storm.wnba.com/'),
        'Washington Mystics': ('Washington, D.C.', 'https://mystics.wnba.com/')
    }
    # Create a map centered on the United States
    m = folium.Map(location=[39.8283, -98.5795], zoom_start=4)
    # Geocoding to get coordinates
    geolocator = Nominatim(user_agent="wnba_app")
    # Team name to abbreviation mapping
    team_abbr = {
        'Atlanta Dream': 'ATL', 'Chicago Sky': 'CHI', 'Connecticut Sun': 'CON',
        'Dallas Wings': 'DAL', 'Indiana Fever': 'IND', 'Las Vegas Aces': 'LVA',
        'Los Angeles Sparks': 'LAS', 'Minnesota Lynx': 'MIN', 'New York Liberty': 'NYL',
        'Phoenix Mercury': 'PHO', 'Seattle Storm': 'SEA', 'Washington Mystics': 'WAS'
    }
    # Add markers for each team
    for team, (city, url) in wnba_teams.items():
        try:
            location = geocode_with_retry(geolocator, city)
            if location is None:
                # Use fallback coordinates if geocoding fails
                lat, lon = fallback_coordinates[team]
            else:
                lat, lon = location.latitude, location.longitude
            # Get team abbreviation and color
            abbr = team_abbr.get(team, 'ATL')  # Default to ATL if not found
            hex_color = team_colors.get(abbr, '#000000')  # Default to black if color not found
            rgb = webcolors.hex_to_rgb(hex_color)
            closest_folium_color = closest_color(rgb)
            # Create popup HTML with team info and link
            popup_html = f"""
            <b>{team}</b><br>
            {city}<br>
            <a href="{url}" target="_blank">Visit Team Website</a>
            """
            folium.Marker(
                [lat, lon],
                popup=folium.Popup(popup_html, max_width=300),
                tooltip=team,
                icon=folium.Icon(color=closest_folium_color, icon='basketball', prefix='fa')
            ).add_to(m)
        except Exception as e:
            st.warning(f"Couldn't add marker for {team}: {str(e)}")
    return m
This creates an interactive map with markers for each WNBA team.
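The map function calls two helpers, geocode_with_retry and closest_color, that are not shown in this post. The sketch below is one guess at how they could work, not the repo's implementation. The retry parameters, the retry_on default, and the RGB values in FOLIUM_COLORS are assumptions (the palette values are approximations, not official Folium constants).

```python
import time


def geocode_with_retry(geolocator, city, max_retries=3, delay=1.0,
                       retry_on=(TimeoutError,)):
    """Call geolocator.geocode(city), retrying on transient errors; None if all fail."""
    for attempt in range(max_retries):
        try:
            return geolocator.geocode(city)
        except retry_on:
            # Pause before the next attempt, but not after the last one
            if attempt < max_retries - 1:
                time.sleep(delay)
    return None


# Approximate RGB values for some Folium marker colors (assumed, not official)
FOLIUM_COLORS = {
    "red": (214, 62, 42),
    "orange": (246, 151, 48),
    "green": (114, 176, 38),
    "blue": (56, 170, 221),
    "purple": (209, 82, 184),
    "darkred": (162, 51, 54),
    "darkblue": (0, 103, 163),
    "gray": (87, 87, 87),
    "black": (48, 48, 48),
}


def closest_color(rgb):
    """Return the palette name with the smallest squared RGB distance to rgb."""
    return min(
        FOLIUM_COLORS,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, FOLIUM_COLORS[name])),
    )


print(closest_color((200, 16, 46)))
```

With geopy, you would likely pass retry_on=(GeocoderTimedOut, GeocoderServiceError), the two exceptions imported at the top of the file. The nearest-color step is needed because folium.Icon accepts only a fixed set of named colors rather than arbitrary hex values.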
Step 8: Adding a Chatbot Interface
Finally, let's add a chatbot for answering WNBA-related questions:
st.markdown('<div class="chatbot-section">', unsafe_allow_html=True)
st.subheader("Chat💬 w/ WNBA AI Assistant powered by LangChain && Cloudflare Workers AI🤖")
# Add a loading message
chat_loading = st.empty()
chat_loading.info("Chat is initializing... This may take a few moments.")
# Initialize the LLM and conversation chain
@st.cache_resource
def initialize_chat(filtered_data: pd.DataFrame):
    llm = CloudflareWorkersAI(
        account_id=ACCOUNT_ID,
        api_token=AUTH_TOKEN,
        model="@cf/meta/llama-2-7b-chat-int8"
    )
    # Convert filtered_data to a string representation
    data_context = filtered_data.to_string()
    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a knowledgeable assistant specializing in WNBA statistics, players, and teams.
        Provide accurate and helpful information about the WNBA.
        Here's the current WNBA data you have access to:
        {data_context}
        Use this data to answer questions, but don't mention the data directly unless asked."""),
        ("human", "{input}"),
        ("ai", "{agent_scratchpad}")
    ])
    memory = ConversationBufferMemory(return_messages=True, output_key="agent_scratchpad")

    def get_chat_history(inputs):
        return memory.chat_memory.messages

    chain = (
        RunnablePassthrough.assign(
            agent_scratchpad=get_chat_history,
            data_context=lambda _: data_context[:100] + "..."  # Truncate for brevity
        )
        | prompt
        | llm
    )
    return chain, memory, data_context
# Initialize the chat
chain, memory, data_context = initialize_chat(filtered_data)
# Remove the loading message
chat_loading.empty()
# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []
# Display chat messages from history on rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
# React to user input
if user_input := st.chat_input("Ask me anything about WNBA stats, players, or teams!"):
    # Display user message in chat message container
    st.chat_message("user").markdown(user_input)
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": user_input})
    # Get AI response
    with st.spinner("Thinking..."):
        try:
            response = chain.invoke({
                "input": user_input,
                "data_context": data_context
            })
            if not response or response.strip() == "":
                response = "I apologize, but I couldn't generate a response. This could be due to an issue with the AI model or the input. Please try asking your question in a different way or try again later."
        except Exception as e:
            response = f"An error occurred: {str(e)}"
            st.error(f"Debug: Error details: {e}")
    # After getting the response from the model
    if isinstance(response, list) and len(response) > 0 and hasattr(response[0], 'content'):
        response_text = response[0].content
    elif isinstance(response, dict) and 'content' in response:
        response_text = response['content']
    elif isinstance(response, str):
        response_text = response
    else:
        response_text = str(response)
    # Display assistant response in chat message container
    with st.chat_message("assistant"):
        st.markdown(response_text)
    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": response_text})
    # Update memory
    memory.chat_memory.add_user_message(user_input)
    memory.chat_memory.add_ai_message(response_text)
# Add some styling to make the chat interface look better
st.markdown("""
<style>
.stChatFloatingInputContainer {
bottom: 20px;
background-color: #f0f2f6;
padding: 10px;
border-radius: 10px;
box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
}
</style>
""", unsafe_allow_html=True)
This creates a chatbot interface that can answer questions about WNBA statistics using the provided data.
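Note that the chain above passes only the first 100 characters of the data string to the model, which is usually less than one table row. If you want the assistant to see more, one option is to budget by whole lines instead of cutting mid-row. The helper below is a hypothetical sketch; the name build_data_context and the 2000-character default are assumptions, and the right budget depends on the model's context window.

```python
def build_data_context(lines: list[str], max_chars: int = 2000) -> str:
    """Join whole lines until adding another would exceed max_chars."""
    kept, used = [], 0
    for line in lines:
        cost = len(line) + 1  # +1 for the newline that joins lines
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)


# Placeholder rows standing in for filtered_data.to_string().splitlines()
table_lines = ["Player    PTS", "Player A  21.0", "Player B  17.2"]
print(build_data_context(table_lines, max_chars=30))
```

Keeping rows intact means the model never sees half a player's stat line, at the cost of sometimes using a little less than the full budget.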
Conclusion
We've now built a comprehensive WNBA analytics dashboard with data visualization, AI insights, and a chatbot interface. This project demonstrates the power of combining data analysis with machine learning to create interactive and informative tools for sports analytics.
The complete code can be found on GitHub at https://github.com/elizabethsiegle/wnba-analytics-dash-ai-insights.
Happy coding, and enjoy exploring WNBA statistics with your new analytics dashboard!