Sports analytics has become a key tool for coaches, managers, and even fans who want to understand the game beyond the scoreboard. With R, we can transform raw sports data into actionable insights using statistical models, visualization, and machine learning.
In this post, we'll walk through an example of analyzing basketball player statistics using R. If you're looking for more resources to learn R, you can check out rprogrammingbooks.com.
Dataset
For simplicity, we'll simulate a dataset of basketball players, including points, assists, rebounds, and minutes played.
# Load libraries
library(dplyr)
library(ggplot2)
# Simulated dataset
set.seed(123)
players <- data.frame(
  player   = paste("Player", 1:20),
  points   = round(rnorm(20, mean = 15, sd = 5)),
  assists  = round(rnorm(20, mean = 5, sd = 2)),
  rebounds = round(rnorm(20, mean = 7, sd = 3)),
  minutes  = round(runif(20, min = 20, max = 40))
)
head(players)
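One caveat with this simulation: rnorm() draws from a continuous normal distribution, so nothing stops it from returning negative values, which a real box score can't contain. A minimal sketch of one guard is to clamp each simulated count at zero with pmax(); drawing counts from rpois() would be another reasonable choice. The summary() call is just a quick sanity check on the ranges.

```r
# Same seed and distributions as the simulation above, but each count
# is clamped at zero because rnorm() can produce negative values
set.seed(123)
players <- data.frame(
  player   = paste("Player", 1:20),
  points   = pmax(0, round(rnorm(20, mean = 15, sd = 5))),
  assists  = pmax(0, round(rnorm(20, mean = 5, sd = 2))),
  rebounds = pmax(0, round(rnorm(20, mean = 7, sd = 3))),
  minutes  = round(runif(20, min = 20, max = 40))
)

# Quick check of the minimum, maximum, and quartiles of every column
summary(players)
```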
⚡ Efficiency Metrics
One of the key aspects of sports analytics is efficiency. Instead of just looking at total points, we can measure how productive a player is per minute on the court.
players <- players %>%
  mutate(
    points_per_min   = points / minutes,
    assists_per_min  = assists / minutes,
    rebounds_per_min = rebounds / minutes
  )
head(players)
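Once per-minute rates exist, ranking players by them is a one-liner with arrange(). The sketch below re-creates the simulated data so it runs on its own, then lists the five most efficient scorers. It also rescales the rate to points per 36 minutes, a common basketball normalization; the 36-minute baseline and the name top_scorers are illustrative choices, not part of the original analysis.

```r
library(dplyr)

# Re-create the simulated data and the points-per-minute metric
set.seed(123)
players <- data.frame(
  player   = paste("Player", 1:20),
  points   = round(rnorm(20, mean = 15, sd = 5)),
  assists  = round(rnorm(20, mean = 5, sd = 2)),
  rebounds = round(rnorm(20, mean = 7, sd = 3)),
  minutes  = round(runif(20, min = 20, max = 40))
) %>%
  mutate(points_per_min = points / minutes)

# Rank by scoring efficiency and express the rate per 36 minutes
# (36 is a convention, roughly a starter's workload)
top_scorers <- players %>%
  mutate(points_per_36 = points_per_min * 36) %>%
  arrange(desc(points_per_min)) %>%
  select(player, points, minutes, points_per_min, points_per_36) %>%
  head(5)

top_scorers
```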
Visualizing Performance
Let's compare scoring efficiency (points_per_min) against playing time (minutes) to identify under- or over-performing players.
ggplot(players, aes(x = minutes, y = points_per_min, label = player)) +
  geom_point(color = "blue", size = 3) +
  geom_text(vjust = -0.8, size = 3) +
  labs(
    title = "Scoring Efficiency vs. Minutes Played",
    x = "Minutes Played",
    y = "Points per Minute"
  ) +
  theme_minimal()
This scatterplot quickly shows which players score efficiently even with limited playing time.
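Eyeballing a scatterplot works for 20 players, but "under-" and "over-performing" can also be made explicit. A minimal sketch, assuming a simple rule of our own choosing: standardize points_per_min into a z-score and flag anyone more than one standard deviation from the group mean. The 1-SD cutoff and the column names eff_z and eff_flag are hypothetical; a different threshold would flag different players.

```r
library(dplyr)
library(ggplot2)

# Re-create the simulated data and the points-per-minute metric
set.seed(123)
players <- data.frame(
  player   = paste("Player", 1:20),
  points   = round(rnorm(20, mean = 15, sd = 5)),
  assists  = round(rnorm(20, mean = 5, sd = 2)),
  rebounds = round(rnorm(20, mean = 7, sd = 3)),
  minutes  = round(runif(20, min = 20, max = 40))
) %>%
  mutate(points_per_min = points / minutes)

# Z-score of scoring efficiency, then a three-way flag using an
# arbitrary cutoff of one standard deviation
players <- players %>%
  mutate(
    eff_z = (points_per_min - mean(points_per_min)) / sd(points_per_min),
    eff_flag = case_when(
      eff_z > 1  ~ "over",
      eff_z < -1 ~ "under",
      TRUE       ~ "typical"
    )
  )

# Color points by flag and draw a dashed line at the mean rate
ggplot(players, aes(x = minutes, y = points_per_min, color = eff_flag)) +
  geom_point(size = 3) +
  geom_hline(yintercept = mean(players$points_per_min), linetype = "dashed") +
  labs(x = "Minutes Played", y = "Points per Minute", color = "Efficiency") +
  theme_minimal()
```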
🔮 Predicting Performance with Regression
We can also use a simple linear regression to see how well minutes explain total points scored.
model <- lm(points ~ minutes, data = players)
summary(model)
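summary(model) prints a lot at once. The numbers that answer "how well do minutes explain points?" can be pulled out directly: the slope on minutes, its p-value, and R-squared. The sketch below also uses predict() to get fitted totals for two hypothetical workloads (25 and 35 minutes, chosen only for illustration). It re-creates the simulated data so it runs on its own.

```r
# Re-create the simulated data and the regression from the post
set.seed(123)
players <- data.frame(
  player   = paste("Player", 1:20),
  points   = round(rnorm(20, mean = 15, sd = 5)),
  assists  = round(rnorm(20, mean = 5, sd = 2)),
  rebounds = round(rnorm(20, mean = 7, sd = 3)),
  minutes  = round(runif(20, min = 20, max = 40))
)
model <- lm(points ~ minutes, data = players)

# Extract the key quantities from the summary's coefficient matrix
coefs     <- summary(model)$coefficients
slope     <- coefs["minutes", "Estimate"]
slope_p   <- coefs["minutes", "Pr(>|t|)"]
r_squared <- summary(model)$r.squared

cat(sprintf("Slope: %.3f points per minute (p = %.3f), R-squared = %.3f\n",
            slope, slope_p, r_squared))

# Fitted total points for hypothetical workloads of 25 and 35 minutes
predict(model, newdata = data.frame(minutes = c(25, 35)))
```

Because the model is linear, the two predictions differ by exactly ten times the slope.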
# Plot regression line
ggplot(players, aes(x = minutes, y = points)) +
  geom_point(color = "darkred") +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(
    title = "Relationship Between Minutes and Points",
    x = "Minutes Played",
    y = "Total Points"
  ) +
  theme_minimal()
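The regression line shows whether playing more minutes is associated with scoring more points; the slope estimate and its p-value in summary(model) indicate whether that association is statistically significant. Because points and minutes were simulated independently here, we should not expect a meaningful relationship in this example.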
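The same simple regression can be repeated for assists and rebounds to compare how strongly each stat tracks playing time. A minimal sketch, assuming we only want the slope on minutes, its p-value, and R-squared for each stat: build each formula with reformulate() and stack the results into one data frame. The names stats and results are illustrative.

```r
# Re-create the simulated data from the post
set.seed(123)
players <- data.frame(
  player   = paste("Player", 1:20),
  points   = round(rnorm(20, mean = 15, sd = 5)),
  assists  = round(rnorm(20, mean = 5, sd = 2)),
  rebounds = round(rnorm(20, mean = 7, sd = 3)),
  minutes  = round(runif(20, min = 20, max = 40))
)

# Fit stat ~ minutes for each stat and collect one row of results per fit
stats <- c("points", "assists", "rebounds")
results <- do.call(rbind, lapply(stats, function(stat) {
  fit <- lm(reformulate("minutes", response = stat), data = players)
  s <- summary(fit)
  data.frame(
    stat      = stat,
    slope     = s$coefficients["minutes", "Estimate"],
    p_value   = s$coefficients["minutes", "Pr(>|t|)"],
    r_squared = s$r.squared
  )
}))

results
```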
Takeaways
Efficiency metrics often reveal hidden gems: players who contribute a lot in limited time.
Visualization helps coaches and analysts identify outliers and patterns quickly.
Simple regression can uncover relationships between workload (minutes) and performance (points, assists, rebounds).
Sports analytics in R empowers us to go beyond raw numbers and uncover stories hidden in the data. Whether you're analyzing your favorite team, managing a fantasy league, or working in professional sports, R offers powerful tools to gain a competitive edge.
If you'd like to dive deeper into learning R with books and tutorials, visit rprogrammingbooks.com.