When researchers or analysts want to measure the effect of a treatment, the gold standard is to run a randomized controlled trial (RCT). But in many real-world scenarios, randomization is not possible. People differ in ways that can bias results—age, income, lifestyle, or preferences all influence outcomes. This is where Propensity Score Matching (PSM) comes in.
First introduced by Rosenbaum and Rubin (1983) in "The Central Role of the Propensity Score in Observational Studies for Causal Effects," PSM provides a statistical method to mimic the benefits of randomization in observational data. In simpler terms, it helps us compare apples to apples by matching treated and untreated units with similar characteristics.
In this article, we’ll cover:
What Propensity Score Matching is (in both statistical and plain English terms)
Why it matters for causal inference
How to implement PSM in R step by step using the MatchIt and tableone packages
A marketing campaign example to make it practical
Key challenges and takeaways
Why Random Assignment Isn’t Always Possible
Imagine testing the effect of a new drug. In a controlled lab, researchers can assign rats randomly into treatment and control groups. Everything else—genetics, environment, diet—is kept constant. Any difference in health outcomes can be attributed to the drug.
But with people, it’s not that simple. We can’t randomize age, income, education, or health conditions. Participants who take a treatment often differ systematically from those who don’t. If we just compare averages, we risk confusing correlation with causation.
Propensity Score Matching offers a solution: it creates balanced groups that are statistically similar, letting us estimate causal effects more reliably.
What is Propensity Score Matching (PSM) in Simple Terms?
At its core, propensity score matching answers this question: What would the outcome have been for the treated group if they hadn’t received treatment?
Here’s a plain-language breakdown:
Step 1: Estimate the probability (propensity score) that each individual receives treatment, based on observed characteristics.
Step 2: Match treated individuals with untreated individuals who have similar propensity scores.
Step 3: Compare outcomes between matched groups to estimate the treatment effect.
By matching on propensity scores instead of raw characteristics, we simplify multidimensional comparisons into a single balancing score.
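The three steps can be sketched end to end in a few lines of base R. This is a minimal illustration with made-up numbers, not a substitute for the MatchIt workflow shown later; the variable names and data are invented for the example.

```r
# Toy data: treatment is more likely at higher ages, and the outcome
# rises with age plus a true treatment effect of 5
df <- data.frame(
  age   = c(25, 30, 35, 40, 45, 50, 55, 60),
  treat = c(0,  0,  1,  0,  1,  0,  1,  1),
  y     = c(5,  6,  12, 8,  14, 10, 16, 17)
)
# Step 1: propensity scores from a logistic regression
ps <- fitted(glm(treat ~ age, family = binomial, data = df))
# Step 2: match each treated unit to the control with the closest score
treated  <- which(df$treat == 1)
controls <- which(df$treat == 0)
match_ix <- sapply(treated, function(i) {
  controls[which.min(abs(ps[controls] - ps[i]))]
})
# Step 3: average the within-pair outcome differences
mean(df$y[treated] - df$y[match_ix])
```

The naive difference in group means here is 7.5, inflated because treated units are older; the matched estimate lands much closer to the true effect of 5.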
Real-World Example: Marketing Campaign
Suppose a company runs a new ad campaign and wants to know whether it increased purchases.
Treatment group: Customers exposed to the campaign
Control group: Customers not exposed
If we just compare purchase rates, we risk bias. Maybe higher-income customers were more likely to see the campaign. Their purchasing power, not the campaign, could explain the difference.
By applying PSM, we can match campaign responders with similar non-responders (same age, same income) and isolate the campaign’s true effect.
Implementing Propensity Score Matching in R
We’ll use a simulated dataset of 1,000 customers with age, income, campaign response, and purchase behavior.
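If you don't have the Campaign_Data.csv file, a comparable dataset can be simulated so the rest of the code runs. The column names match those described below; the coefficients are invented for illustration and deliberately make campaign exposure depend on age and income, which is exactly the confounding PSM is meant to correct.

```r
set.seed(42)
n <- 1000
Age    <- round(runif(n, 18, 70))
Income <- round(rnorm(n, 50000, 15000))
# Exposure probability depends on age and income (confounding)
p_resp <- plogis(-6 + 0.03 * Age + 0.00007 * Income)
Ad_Campaign_Response <- rbinom(n, 1, p_resp)
# Purchases depend on income and on campaign exposure
p_buy  <- plogis(-4 + 0.00005 * Income + 1.5 * Ad_Campaign_Response)
Bought <- rbinom(n, 1, p_buy)
Data <- data.frame(Age, Income, Ad_Campaign_Response, Bought)
write.csv(Data, "Campaign_Data.csv", row.names = FALSE)
```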
Reading the dataset
Data <- read.csv("Campaign_Data.csv", header = TRUE)
dim(Data)
The dataset contains:
Age
Income
Ad_Campaign_Response (1 = responded, 0 = not responded)
Bought (1 = purchased, 0 = did not purchase)
Previewing the first few records:
head(Data)
Step 1: Treatment vs Control Groups
Let’s split the data.
Treats <- subset(Data, Ad_Campaign_Response == 1)
Control <- subset(Data, Ad_Campaign_Response == 0)
We now have:
404 treated records
596 control records
At first glance, these groups look different, which means raw comparisons may mislead us.
Step 2: Estimating Propensity Scores
We use a logistic regression model to estimate the probability of campaign response based on age and income.
pscores.model <- glm(Ad_Campaign_Response ~ Age + Income,
family = binomial("logit"), data = Data)
summary(pscores.model)
Data$PScores <- pscores.model$fitted.values
These fitted values are the propensity scores. A histogram of scores shows how similar (or different) the treatment and control groups are.
hist(Data$PScores[Data$Ad_Campaign_Response == 1],
main = "PScores of Responders")
hist(Data$PScores[Data$Ad_Campaign_Response == 0],
main = "PScores of Non-Responders")
Step 3: Pre-Matching Balance Check
We use the tableone package to check covariate balance before matching.
library(tableone)
xvars <- c("Age", "Income")
table1 <- CreateTableOne(vars = xvars,
strata = "Ad_Campaign_Response",
data = Data, test = FALSE)
print(table1, smd = TRUE)
The Standardized Mean Difference (SMD) measures imbalance between the groups. Ideally, SMD values should be below 0.1; larger values indicate imbalance that matching should correct.
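To see what tableone is reporting, the SMD for a continuous covariate can be computed by hand: it is the absolute difference in group means divided by the pooled standard deviation. A minimal sketch (the helper function name is ours, not part of tableone):

```r
# Standardized mean difference: |mean difference| / pooled SD
smd <- function(x, treat) {
  m1 <- mean(x[treat == 1]); m0 <- mean(x[treat == 0])
  v1 <- var(x[treat == 1]);  v0 <- var(x[treat == 0])
  abs(m1 - m0) / sqrt((v1 + v0) / 2)
}
smd(c(1, 3, 5, 7), c(0, 0, 1, 1))  # 4 / sqrt(2), about 2.83
```

Applied to the campaign data as smd(Data$Age, Data$Ad_Campaign_Response), this should reproduce the Age row of the tableone output.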
Step 4: Matching Algorithms
The MatchIt package provides multiple algorithms.
4.1 Exact Matching
library(MatchIt)
match1 <- matchit(Ad_Campaign_Response ~ Age + Income,
                  method = "exact", data = Data)
summary(match1)
Exact matching pairs individuals with identical covariate values. Note that matchit() takes a matching formula (the same one used for the propensity model in Step 2), not the fitted glm object. In practice, exact matching is often too strict and discards many unmatched observations.
4.2 Nearest Neighbor Matching
A more flexible method is nearest neighbor matching, which matches each treated unit with the closest control unit in terms of propensity score.
match2 <- matchit(Ad_Campaign_Response ~ Age + Income,
                  method = "nearest", ratio = 1, data = Data)
match2.data <- match.data(match2)
Plots can visualize the quality of matching:
plot(match2, type="jitter")
plot(match2, type="hist")
Step 5: Post-Matching Balance Check
After matching, we check balance again.
table_match2 <- CreateTableOne(vars = xvars,
strata = "Ad_Campaign_Response",
data = match2.data, test = FALSE)
print(table_match2, smd = TRUE)
This time, SMD values drop close to zero, confirming that matching succeeded.
Step 6: Outcome Analysis
Finally, we test whether the campaign influenced purchases.
# Sort by matched pair (subclass) so treated and control rows line up
matched <- match2.data[order(match2.data$subclass), ]
y_trt <- matched$Bought[matched$Ad_Campaign_Response == 1]
y_con <- matched$Bought[matched$Ad_Campaign_Response == 0]
t.test(y_trt - y_con)
Subtracting the two vectors only makes sense once the rows are ordered by matched pair, which is why we sort by the subclass column that match.data() adds. The paired t-test returns a p-value below 0.001, so the difference is statistically significant: on average, campaign responders were 73% more likely to purchase than their matched non-responders.
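An alternative to the paired t-test is a weighted regression of the outcome on the treatment indicator over the matched sample. This sketch assumes match2.data from Step 4; the weights column is the one added by match.data(), and the coefficient on Ad_Campaign_Response is the estimated effect with a standard error attached.

```r
# Difference in purchase rates on the matched sample, with a standard error
fit <- lm(Bought ~ Ad_Campaign_Response, data = match2.data,
          weights = weights)
summary(fit)$coefficients["Ad_Campaign_Response", ]
```

With 1:1 matching and unit weights, this coefficient equals the simple difference in matched purchase rates.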
Business Applications
Propensity Score Matching isn’t limited to marketing. It’s widely used in:
Healthcare: Comparing patients receiving different treatments
Policy Analysis: Evaluating the impact of government programs
Finance: Measuring effects of risk interventions
HR Analytics: Understanding training program effectiveness
In each case, PSM helps approximate causal effects in observational studies, reducing bias from confounding variables.
Challenges and Best Practices
While PSM is powerful, it comes with caveats:
Unobserved Confounders: Matching only accounts for observed variables. Hidden biases remain a risk.
Discarded Data: Some observations may be unmatched and removed, reducing sample size.
Choice of Algorithm: Results can vary depending on whether you use exact, nearest neighbor, or caliper matching.
Best practices include:
Always check covariate balance before and after matching
Use multiple matching methods and compare results
Interpret results cautiously, acknowledging limitations
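Caliper matching, mentioned above, is nearest-neighbor matching that additionally refuses any pair whose propensity scores differ by more than a set tolerance. In MatchIt this is requested via the caliper argument, expressed by default in standard deviations of the distance measure. A sketch, reusing the formula from Step 2:

```r
library(MatchIt)
# Nearest-neighbor matching, discarding pairs whose propensity scores
# differ by more than 0.2 standard deviations
match3 <- matchit(Ad_Campaign_Response ~ Age + Income,
                  method = "nearest", caliper = 0.2, data = Data)
summary(match3)
```

Tightening the caliper improves balance at the cost of discarding more treated units, so re-check both the SMDs and the matched sample size after running it.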
Conclusion
Propensity Score Matching in R bridges the gap between randomized trials and observational data. By balancing treatment and control groups on observed characteristics, PSM enables more credible causal inference.
In our marketing campaign example, we found that matching revealed a 73% higher likelihood of purchase among campaign responders compared to similar non-responders. This insight can guide future campaign strategies and budget allocation.
As businesses increasingly rely on data-driven decisions, techniques like PSM will become essential for drawing valid conclusions from real-world, non-randomized data.