DEV Community

R Programming

🏀 Sports Analytics in R: From Raw Data to Insights

Sports analytics has become a key tool for coaches, managers, and even fans who want to understand the game beyond the scoreboard. With R, we can transform raw sports data into actionable insights using statistical models, visualization, and machine learning.

In this post, we’ll walk through an example of analyzing basketball player statistics using R. If you’re looking for more resources to learn R, you can check out rprogrammingbooks.com.

📊 Dataset

For simplicity, we’ll simulate a dataset of basketball players, including points, assists, rebounds, and minutes played.

# Load libraries
library(dplyr)
library(ggplot2)

# Simulated dataset
set.seed(123)
players <- data.frame(
  player = paste("Player", 1:20),
  points = round(rnorm(20, mean = 15, sd = 5)),
  assists = round(rnorm(20, mean = 5, sd = 2)),
  rebounds = round(rnorm(20, mean = 7, sd = 3)),
  minutes = round(runif(20, min = 20, max = 40))
)

head(players)

⚡ Efficiency Metrics

One of the key aspects of sports analytics is efficiency. Instead of looking only at total points, we can measure how productive a player is per minute on the court.

players <- players %>%
  mutate(
    points_per_min = points / minutes,
    assists_per_min = assists / minutes,
    rebounds_per_min = rebounds / minutes
  )

head(players)
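
Once the per-minute columns exist, a natural next step is to rank players by them. The following is a minimal sketch, assuming the same simulated data as above; the `top_scorers` name and the choice of five rows are illustrative, not part of the original analysis.

```r
library(dplyr)

# Rebuild the simulated data so this block runs on its own
# (values are random, not real player statistics)
set.seed(123)
players <- data.frame(
  player = paste("Player", 1:20),
  points = round(rnorm(20, mean = 15, sd = 5)),
  assists = round(rnorm(20, mean = 5, sd = 2)),
  rebounds = round(rnorm(20, mean = 7, sd = 3)),
  minutes = round(runif(20, min = 20, max = 40))
)

# Rank players by points per minute and keep the five most efficient.
# `top_scorers` and the cutoff of 5 are hypothetical choices.
top_scorers <- players %>%
  mutate(points_per_min = points / minutes) %>%
  arrange(desc(points_per_min)) %>%
  select(player, points, minutes, points_per_min) %>%
  head(5)

print(top_scorers)
```

Sorting by a rate rather than a total is what lets a low-minute player surface near the top of the table.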

📈 Visualizing Performance

Let’s compare scoring efficiency (points_per_min) against playing time (minutes) to identify under- or over-performing players.

ggplot(players, aes(x = minutes, y = points_per_min, label = player)) +
  geom_point(color = "blue", size = 3) +
  geom_text(vjust = -0.8, size = 3) +
  labs(
    title = "Scoring Efficiency vs. Minutes Played",
    x = "Minutes Played",
    y = "Points per Minute"
  ) +
  theme_minimal()

This scatterplot quickly shows which players score efficiently even with limited playing time.
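
The "efficient with limited playing time" reading of the plot can also be made explicit in code. This sketch assumes one possible definition: below-median minutes combined with above-median points per minute. The median cutoffs and the `hidden_gems` name are assumptions for illustration; other thresholds would be just as defensible.

```r
library(dplyr)

# Rebuild the simulated data so this block runs on its own
set.seed(123)
players <- data.frame(
  player = paste("Player", 1:20),
  points = round(rnorm(20, mean = 15, sd = 5)),
  assists = round(rnorm(20, mean = 5, sd = 2)),
  rebounds = round(rnorm(20, mean = 7, sd = 3)),
  minutes = round(runif(20, min = 20, max = 40))
) %>%
  mutate(points_per_min = points / minutes)

# Players in the upper-left region of the scatterplot:
# fewer minutes than the median, but more points per minute than the median.
# The median thresholds are an assumed, illustrative cutoff.
hidden_gems <- players %>%
  filter(
    minutes < median(minutes),
    points_per_min > median(points_per_min)
  ) %>%
  arrange(desc(points_per_min))

print(hidden_gems)
```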

🔮 Predicting Performance with Regression

We can also use a simple linear regression to see how well minutes explain total points scored.

model <- lm(points ~ minutes, data = players)
summary(model)

# Plot regression line
ggplot(players, aes(x = minutes, y = points)) +
  geom_point(color = "darkred") +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(
    title = "Relationship Between Minutes and Points",
    x = "Minutes Played",
    y = "Total Points"
  ) +
  theme_minimal()

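The regression line shows the direction and steepness of the trend, while the p-value for minutes in summary(model) tells us whether playing more minutes significantly contributes to scoring output. Because this dataset simulates points and minutes independently, expect a slope near zero here.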
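
To read the fit numerically instead of scanning the printed summary, the key quantities can be pulled out of the model object. This is a minimal sketch using base R only; the variable names `slope`, `p_value`, and `r_squared` are illustrative.

```r
# Rebuild the simulated data so this block runs on its own
set.seed(123)
players <- data.frame(
  player = paste("Player", 1:20),
  points = round(rnorm(20, mean = 15, sd = 5)),
  assists = round(rnorm(20, mean = 5, sd = 2)),
  rebounds = round(rnorm(20, mean = 7, sd = 3)),
  minutes = round(runif(20, min = 20, max = 40))
)

model <- lm(points ~ minutes, data = players)

# coef(summary(model)) is a matrix with one row per term and the columns
# "Estimate", "Std. Error", "t value", and "Pr(>|t|)"
coefs <- coef(summary(model))
slope <- coefs["minutes", "Estimate"]
p_value <- coefs["minutes", "Pr(>|t|)"]
r_squared <- summary(model)$r.squared

cat(sprintf("Slope: %.3f points per extra minute\n", slope))
cat(sprintf("p-value for minutes: %.3f\n", p_value))
cat(sprintf("R-squared: %.3f\n", r_squared))
```

The slope answers "how many more points per extra minute", the p-value answers "could this slope plausibly be zero", and R-squared answers "how much of the variation in points do minutes account for".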

🏆 Takeaways

Efficiency metrics often reveal hidden gems: players who contribute a lot in limited time.

Visualization helps coaches and analysts identify outliers and patterns quickly.

Simple regression can uncover relationships between workload (minutes) and performance (points, assists, rebounds).
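
The post fits a regression for points only; the same idea extends to assists and rebounds. The sketch below is one way to do that, assuming the same simulated data, with `outcomes` and `slopes` as hypothetical names. It uses `reformulate()` from base R to build each formula from a column name.

```r
# Rebuild the simulated data so this block runs on its own
set.seed(123)
players <- data.frame(
  player = paste("Player", 1:20),
  points = round(rnorm(20, mean = 15, sd = 5)),
  assists = round(rnorm(20, mean = 5, sd = 2)),
  rebounds = round(rnorm(20, mean = 7, sd = 3)),
  minutes = round(runif(20, min = 20, max = 40))
)

# Fit one simple regression per performance measure and
# collect the estimated slope on minutes from each model
outcomes <- c("points", "assists", "rebounds")
slopes <- sapply(outcomes, function(outcome) {
  fit <- lm(reformulate("minutes", response = outcome), data = players)
  coef(fit)[["minutes"]]
})

print(round(slopes, 3))
```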

Sports analytics in R empowers us to go beyond raw numbers and uncover stories hidden in the data. Whether you’re analyzing your favorite team, managing a fantasy league, or working in professional sports, R offers powerful tools to gain a competitive edge.

If you’d like to dive deeper into learning R with books and tutorials, visit rprogrammingbooks.com.
