DEV Community

Dipti Moryani

Detecting Anomalies in R: A Practical Guide with Real-World Examples

Introduction

Imagine you run a credit card company and one of your customers spends $25 every week on groceries. Suddenly, they make a $700 purchase in a single transaction. This unusual behavior would immediately raise red flags, prompting you to verify the transaction before approving it. Such unexpected events are called anomalies—data points that deviate significantly from established patterns.

Anomaly detection plays a critical role in banking, financial services, and insurance (BFSI), where it is used to detect fraud, prevent theft, and monitor suspicious transactions. Tools such as Power BI and AI-driven models let businesses identify these irregularities in real time.

But not all anomalies are fraudulent. Sometimes, they reflect genuine life changes, such as starting a family or hosting a big celebration. The challenge lies in distinguishing fraudulent activities from authentic behavioral shifts—and that’s where anomaly detection models in R prove valuable.

Understanding Anomalies

In data science, anomalies are essentially outliers—values that differ drastically from typical patterns. They can occur due to fraud, system errors, or even legitimate events. For example:

Multiple small unauthorized withdrawals from a single credit card

A sudden large purchase that doesn’t fit the customer’s past behavior

Transactions from unfamiliar geographical locations

For BFSI companies, identifying such cases quickly can prevent losses and safeguard customer trust. AI consulting firms often deploy anomaly detection algorithms to help businesses uncover fraud, inefficiencies, or shifts in customer behavior.
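To make the idea concrete, here is a minimal base R sketch of the grocery example from the introduction. The spending figures are hypothetical, and the robust z-score with a 3.5 cutoff is one common rule of thumb, not the method any particular bank uses.

```r
# Hypothetical weekly grocery spend (USD) for one customer; the last value
# mirrors the $700 purchase from the introduction
weekly_spend <- c(25, 27, 24, 26, 25, 23, 28, 25, 26, 700)

# Robust z-score: distance from the median, scaled by the MAD.
# mad() already applies the 1.4826 constant that makes it comparable to a
# standard deviation for normally distributed data.
robust_z <- (weekly_spend - median(weekly_spend)) / mad(weekly_spend)

# The 3.5 cutoff is a rule of thumb (an assumption), not a universal constant
is_anomaly <- abs(robust_z) > 3.5

flagged <- weekly_spend[is_anomaly]
print(flagged)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the latter two and can hide itself.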

Detecting Patterns with R

R offers several packages for anomaly detection, each based on different statistical techniques. In this guide, we’ll explore two widely used approaches:

Twitter’s AnomalyDetection Package – based on the Seasonal Hybrid ESD (SH-ESD) algorithm.

Anomalize Package – built for time series data analysis with decomposition methods.

These packages allow businesses to spot anomalies in time series data, where patterns like seasonality and trends can otherwise mask irregularities.
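A small base R sketch illustrates the masking problem. It uses synthetic daily data with a strong weekly cycle (all values are made up), injects a spike on a quiet weekday, and compares a plain z-score on the raw series with a score computed on the remainder from stl(). This illustrates the general idea only; it is not what either package does internally.

```r
# Synthetic daily page views with a strong weekly cycle (hypothetical data)
set.seed(42)
weekly_pattern <- c(100, 105, 110, 120, 150, 300, 280)
n_weeks <- 20
views <- rep(weekly_pattern, times = n_weeks) + rnorm(7 * n_weeks, sd = 3)

# Inject a spike on a quiet day: large for that weekday, but still far
# below the usual weekend peak
spike_day <- 71
views[spike_day] <- views[spike_day] + 80

# A plain z-score on the raw series barely notices the spike
raw_z <- (views - mean(views)) / sd(views)

# Remove the weekly cycle and trend with STL, then score the remainder
# with a robust z-score (median and MAD)
fit <- stl(ts(views, frequency = 7), s.window = "periodic", robust = TRUE)
remainder <- as.numeric(fit$time.series[, "remainder"])
remainder_z <- (remainder - median(remainder)) / mad(remainder)

print(round(raw_z[spike_day], 2))    # z-score of the spike in the raw series
print(which.max(abs(remainder_z)))   # position of the most extreme remainder
```

Once the weekly pattern is removed, the spike stands out in the remainder even though its raw value looks unremarkable next to weekend traffic.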

Example 1: Wikipedia Page Views with SH-ESD

We start with the AnomalyDetection package from GitHub, which identifies unusual spikes and dips in time series data. Let’s take page views from the Wikipedia page of FIFA (from March 2013 onward).

Install dependencies

install.packages("devtools")
install.packages("Rcpp")
library(devtools)
install_github("petermeissner/wikipediatrend")
install_github("twitter/AnomalyDetection")

Load libraries

library(Rcpp)
library(wikipediatrend)
library(AnomalyDetection)

Fetch FIFA page data

fifa_data_wikipedia <- wp_trend("fifa", from="2013-03-18", lang = "en")

Keep only relevant columns

fifa_data_wikipedia <- fifa_data_wikipedia[, c("date","views")]

Apply anomaly detection

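# AnomalyDetectionTs expects a two-column data frame: timestamps first, then numeric values.
# Converting the date column to POSIXct avoids timestamp-format errors.
fifa_data_wikipedia$date <- as.POSIXct(fifa_data_wikipedia$date)
anomalies <- AnomalyDetectionTs(fifa_data_wikipedia, direction = "pos", plot = TRUE)
anomalies$plot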

With direction = "pos", this call flags only dates with unusually high page views. Such anomalies often align with FIFA matches, player news, or major announcements. If no event explains a spike, it may be a true anomaly worth investigating further.

Example 2: Bitcoin Price Fluctuations with Anomalize

The anomalize package is another useful tool, well suited to financial time series such as cryptocurrency prices. It decomposes a time series into trend, seasonal, and remainder components, then flags outliers in the remainder.

Install packages

install.packages("anomalize")  # CRAN release
# Alternatively, install the development version from GitHub:
# devtools::install_github("business-science/anomalize")

Load supporting libraries

library(tidyverse)
library(anomalize)
library(coindeskr)

Get Bitcoin price data

bitcoin_data <- get_historic_price(start = "2017-01-01")
# Dates arrive as row names; move them into a proper Date column
bitcoin_data_ts <- bitcoin_data %>%
  rownames_to_column() %>%
  as_tibble() %>%
  mutate(date = as.Date(rowname)) %>%
  select(-rowname)

Decompose and detect anomalies

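bitcoin_data_ts %>%
  time_decompose(Price, method = "stl") %>%   # split into season, trend, remainder
  anomalize(remainder, method = "gesd") %>%   # flag outliers in the remainder
  time_recompose() %>%                        # rebuild bounds on the original scale
  plot_anomalies(time_recomposed = TRUE)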

This analysis reveals sudden jumps in Bitcoin’s price, such as its dramatic rise in late 2017, and marks anomalies as red points on the chart.
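The method = "gesd" option refers to the generalized extreme Studentized deviate (ESD) test, which is also the test that SH-ESD builds on. The sketch below is a plain base R version of the textbook test using the mean and standard deviation, written for illustration. It is not the implementation used by anomalize or AnomalyDetection, and the function name gesd_outliers and the sample data are made up for this example.

```r
# Textbook generalized ESD test (illustrative sketch, hypothetical helper).
# Returns the positions of the values judged to be outliers.
gesd_outliers <- function(x, max_outliers, alpha = 0.05) {
  n <- length(x)
  idx <- seq_len(n)        # original positions of the values still in play
  removed <- integer(0)    # positions removed so far, most extreme first
  n_out <- 0               # largest i for which R_i exceeds lambda_i

  for (i in seq_len(max_outliers)) {
    # Test statistic R_i: largest absolute deviation in standard-deviation units
    dev <- abs(x[idx] - mean(x[idx]))
    j <- which.max(dev)
    r_i <- dev[j] / sd(x[idx])

    # Critical value lambda_i from the t distribution
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda_i <- (n - i) * t_crit / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))

    # Remove the most extreme value and remember whether it was significant
    removed <- c(removed, idx[j])
    idx <- idx[-j]
    if (r_i > lambda_i) n_out <- i
  }

  sort(removed[seq_len(n_out)])
}

# Hypothetical readings with two injected extremes at positions 11 and 15
readings <- c(10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 9.9, 10.0,
              25, 10.2, 9.8, 10.1, -5)
outlier_positions <- gesd_outliers(readings, max_outliers = 3)
print(outlier_positions)
```

Testing up to max_outliers candidates and keeping the largest significant count is what lets the test cope with several outliers that would otherwise mask one another.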

Different Approaches to Anomaly Detection in R

R provides multiple packages to handle anomalies, including:

Twitter’s AnomalyDetection – SH-ESD algorithm

Anomalize – decomposition-based detection

tsoutliers – focuses on time series outliers

Factor Analysis & PCA-based methods – useful for multivariate data

Each method has its strengths, but the underlying principle is the same: flagging data points that do not follow the expected pattern.
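For the PCA-based idea, a minimal base R sketch might look like this. It builds two correlated, hypothetical transaction features, fits prcomp(), keeps only the first principal component, and scores each row by how poorly that single component reconstructs it. The data, the column names, and the choice of one retained component are assumptions for illustration.

```r
# Hypothetical transactions: item count normally tracks the amount spent
set.seed(1)
n <- 200
amount <- rnorm(n, mean = 50, sd = 10)
items <- amount / 5 + rnorm(n, sd = 0.5)
tx <- data.frame(amount = amount, items = items)

# Append one row whose values are ordinary on their own but break the
# usual amount-to-items relationship
tx <- rbind(tx, data.frame(amount = 65, items = 7))

# Fit PCA on centered, scaled data
pca <- prcomp(tx, center = TRUE, scale. = TRUE)

# Reconstruct each row from the first k components only
k <- 1
scaled <- scale(tx, center = pca$center, scale = pca$scale)
reconstructed <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])

# Rows that the retained component explains badly get a large error
recon_error <- rowSums((scaled - reconstructed)^2)

suspect <- which.max(recon_error)
print(tx[suspect, ])
```

Neither value in the appended row is extreme by itself, which is why a per-column check would miss it while the reconstruction error does not.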

Conclusion

Anomaly detection is more than just identifying unusual numbers—it’s about understanding their context. In BFSI, anomalies could mean fraud, while in e-commerce they may indicate customer behavior shifts, and in healthcare they could highlight abnormal patient activity.

By leveraging R’s anomaly detection packages, businesses can not only spot irregularities but also take timely action to prevent risks, optimize processes, and improve decision-making.

This article was originally published on Perceptive Analytics.
