Histograms are one of the most fundamental and useful tools in data analysis. They help us understand the distribution of data and provide visual insights into patterns, trends, and outliers.
Whether you are analyzing survey data, stock prices, or scientific measurements, histograms provide a quick and intuitive overview of how values are spread.
In this article, we will cover:
Basics of histograms
Plotting histograms in R
Customizing histograms with titles, colors, breaks, and axes
Probabilistic histograms and density options
Examples using real datasets (AirPassengers and iris)
Basics of Histograms
A histogram is a graphical representation of the distribution of a single variable. It divides the variable’s range into intervals (bins) and shows the frequency of values in each interval with vertical bars.
Histograms are useful because they can:
Show central tendency (median, mode)
Reveal gaps or outliers in data
Help identify skewness and clusters
Handle large amounts of data efficiently
Key point: Histograms work for numeric variables only. Non-numeric variables require bar plots or frequency tables.
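To make the idea of binning concrete, here is a minimal sketch that counts values per interval by hand and compares the result with hist(). The vector x and the bin edges are made-up illustrative values, not data from this article.

```r
# Toy data: ten made-up measurements (illustrative values only)
x <- c(2.1, 2.4, 3.0, 3.3, 3.8, 4.2, 4.9, 5.5, 6.7, 9.6)

# Bin edges chosen by hand: four bins of width 2
edges <- seq(2, 10, by = 2)

# cut() assigns each value to an interval; table() counts values per interval.
# right = TRUE and include.lowest = TRUE mirror the defaults of hist().
bins <- cut(x, breaks = edges, right = TRUE, include.lowest = TRUE)
manual_counts <- as.vector(table(bins))

# hist() with the same edges, computed without drawing anything
h <- hist(x, breaks = edges, plot = FALSE)

print(table(bins))   # counts per interval, computed by hand
print(h$counts)      # counts per bin, as reported by hist()
```

The two sets of counts should agree, which shows that a histogram is essentially a frequency table of intervals drawn as bars.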
Plotting Basic Histograms in R
R provides the built-in function hist() to create histograms. Let’s explore this with the AirPassengers dataset, a time series of monthly international airline passenger totals, in thousands, for 1949–1960.
Example 1: AirPassengers Data
# Plot the AirPassengers time series
plot(AirPassengers)
# Plot histogram of AirPassengers
hist(AirPassengers)
The plot of the time series shows trends and seasonality.
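The histogram shows how often each range of monthly passenger counts occurs. AirPassengers is recorded in thousands, so the tallest bars, at roughly 100–200, correspond to months with about 100,000–200,000 passengers. These come from the early years of the series, while the higher counts from later years form a long right tail.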
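To read the bar heights as numbers rather than estimating them from the plot, one option is to compute the histogram without drawing it and tabulate the bins. This is a small sketch; the names x and bin_table are introduced here for illustration.

```r
# AirPassengers ships with base R; values are monthly totals in thousands
x <- as.numeric(AirPassengers)

# Compute the histogram with default settings, without drawing it
h <- hist(x, plot = FALSE)

# One row per bin: lower edge, upper edge, number of months in that bin
bin_table <- data.frame(
  from  = head(h$breaks, -1),
  to    = tail(h$breaks, -1),
  count = h$counts
)
print(bin_table)

# Share of months with at most 200 (thousand) passengers
print(mean(x <= 200))
```

Printing the share of months at or below 200 is a quick check on how much of the data the low-end bars actually represent.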
Example 2: Iris Dataset
Next, we’ll use the iris dataset, a classic dataset with numeric features like Petal.Length and Petal.Width.
# Plot petal length against row index for a first look at the values
plot(iris$Petal.Length)
# Plot histogram of petal length
hist(iris$Petal.Length)
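The histogram reveals two clearly separated groups: a tight cluster of short petals between about 1 and 2 cm, and a broader group from about 3 to 7 cm. The broader group contains two overlapping species, so the three species do not appear as three distinct clusters.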
Gaps and overlaps in the histogram help understand variability and distribution.
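One way to see which species sit behind the groups in the histogram is to summarise petal length per species. The sketch below uses only base R; the 1 cm bin edges are an illustrative choice, not the defaults used by hist().

```r
# Smallest and largest petal length within each species
ranges <- tapply(iris$Petal.Length, iris$Species, range)
print(ranges)

# Cross-tabulate species against 1 cm bins (illustrative bin edges)
bins <- cut(iris$Petal.Length, breaks = 1:7, include.lowest = TRUE)
species_by_bin <- table(iris$Species, bins)
print(species_by_bin)
```

Rows of the cross-table that share bins indicate overlapping species, and empty columns correspond to gaps in the histogram.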
Important: The Species column is categorical (a factor). Calling hist() on it results in an error:
hist(iris$Species) # Error: 'x' must be numeric
Use plot() instead for categorical variables:
plot(iris$Species)
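An explicit alternative for categorical data is to build the frequency table first and pass it to barplot(). This is a short sketch; the title and color are arbitrary choices.

```r
# Frequency table of the categorical variable
species_counts <- table(iris$Species)
print(species_counts)

# Bar plot of the counts, one bar per category
barplot(species_counts,
        main = "Number of Flowers per Species",
        xlab = "Species",
        ylab = "Count",
        col = "blue")
```

Building the table separately also gives you the counts as data, which is handy if you want to sort or filter categories before plotting.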
Customizing Histograms
The hist() function is very flexible and allows customization of titles, axis labels, colors, breaks, density, and more.
Adding Titles and Axis Labels
hist(iris$Petal.Length,
main = "Histogram of Petal Length",
xlab = "Petal Length (cm)",
ylab = "Count")
Adding Colors and Borders
hist(iris$Petal.Length,
main = "Histogram of Petal Length",
xlab = "Petal Length (cm)",
ylab = "Count",
col = "blue",
border = "red")
Rotating Axis Labels
The las parameter controls the orientation of axis labels:
las = 0: parallel to axis (default)
las = 1: horizontal
las = 2: perpendicular to axis
las = 3: always vertical (parallel to the y-axis, perpendicular to the x-axis)
hist(iris$Petal.Length,
main = "Histogram of Petal Length",
xlab = "Petal Length (cm)",
ylab = "Count",
col = "blue",
border = "red",
las = 1)
Adjusting Axis Limits and Bin Size
xlim and ylim set the axis limits
breaks sets either a suggested number of bins (a single number, which R rounds to convenient bin edges) or the exact bin edges (a numeric vector)
hist(iris$Petal.Length,
main = "Histogram of Petal Length",
xlab = "Petal Length (cm)",
ylab = "Count",
col = "blue",
border = "red",
las = 1,
xlim = c(1, 7),
ylim = c(0, 60),  # breaks = 6 gives 1 cm bins, and the first bin holds 50 values
breaks = 6)
You can also specify exact bin edges with a vector of breaks.
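The difference between a suggested bin count and exact bin edges can be checked by inspecting the breaks that hist() actually uses. In this sketch, the 0.5 cm spacing in edges is an illustrative choice.

```r
x <- iris$Petal.Length

# A single number is only a suggestion: hist() picks "pretty" edges near it
h_suggest <- hist(x, breaks = 6, plot = FALSE)
print(h_suggest$breaks)

# A numeric vector gives exact bin edges; it must span the full data range
edges <- seq(1, 7, by = 0.5)
h_exact <- hist(x, breaks = edges, plot = FALSE)
print(h_exact$breaks)
print(h_exact$counts)

# Draw the histogram with the exact edges
hist(x,
     breaks = edges,
     main = "Histogram of Petal Length",
     xlab = "Petal Length (cm)",
     col = "blue",
     border = "red")
```

If the vector of edges does not cover every value in the data, hist() stops with an error instead of silently dropping observations.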
Probability (Density) Histogram
Set freq = FALSE or probability = TRUE to plot density instead of frequency:
hist(iris$Petal.Length,
main = "Petal Length Density Histogram",
xlab = "Petal Length (cm)",
ylab = "Density",
col = "blue",
border = "red",
freq = FALSE)
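On the density scale, the area of each bar (height times bin width) is the share of observations in that bin, so the areas sum to 1. The sketch below checks this and overlays a kernel density estimate from density(), which is an addition for illustration rather than part of hist() itself.

```r
x <- iris$Petal.Length

# Draw the density histogram and keep the returned histogram object
h <- hist(x,
          freq = FALSE,
          main = "Petal Length Density Histogram",
          xlab = "Petal Length (cm)",
          col = "lightblue",
          border = "white")

# Bar area = density * bin width; the areas should sum to 1
total_area <- sum(h$density * diff(h$breaks))
print(total_area)

# Overlay a smooth kernel density estimate on the same scale
lines(density(x), lwd = 2)
```

Because both the bars and the curve are on the density scale, they can be compared directly, which is not possible with a count histogram.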
Adding Shading with Density and Angle
density sets the number of shading lines per inch
angle sets the angle of shading lines
hist(iris$Petal.Length,
main = "Histogram with Shading",
xlab = "Petal Length (cm)",
ylab = "Count",
col = "blue",
border = "red",
density = 50,
angle = 60)
Drawing Values on Top of Bars
hist(iris$Petal.Length,
main = "Histogram of Petal Length",
xlab = "Petal Length (cm)",
ylab = "Count",
col = "blue",
border = "red",
labels = TRUE,
xlim = c(1, 7),
ylim = c(0, 40),
las = 1)
This adds exact frequency values on top of each bar for better readability.
Getting Histogram Data Without Plot
Set plot = FALSE to compute the histogram without drawing it. hist() then returns an object of class "histogram" that you can store and inspect:
hist_data <- hist(iris$Petal.Length, plot = FALSE)
str(hist_data)
Output includes:
$breaks: Bin edges
$counts: Frequency per bin
$density: Estimated density per bin (relative frequency divided by bin width)
$mids: Midpoints of bins
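The stored components can be used like any other R data. As a small sketch, the code below finds the fullest bin and then draws the saved object with plot(); the names i and modal_bin are introduced here for illustration.

```r
hist_data <- hist(iris$Petal.Length, plot = FALSE)

# Locate the bin with the highest count and report its edges
i <- which.max(hist_data$counts)
modal_bin <- c(lower = hist_data$breaks[i], upper = hist_data$breaks[i + 1])
print(modal_bin)
print(hist_data$counts[i])

# A stored histogram object can be drawn later with plot()
plot(hist_data,
     main = "Histogram of Petal Length",
     xlab = "Petal Length (cm)",
     col = "blue",
     border = "red")
```

Separating computation from drawing is useful when the same bins feed both a chart and a summary table.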
Summary
Histograms are essential for univariate data exploration.
R’s hist() function provides flexible customization for titles, labels, colors, breaks, axes, and density.
Use plot() for categorical variables, as hist() only works for numeric data.
Histograms reveal clusters, outliers, gaps, and distribution patterns quickly.
With these techniques, you can visualize and analyze data efficiently in R, whether for basic exploratory analysis or for more advanced visualizations in tools like Tableau.