Dipti M

Posted on Jan 6

Propensity Score Matching (PSM): A Practical Guide with R


In many real-world business and policy problems, random assignment is simply not possible. Customers choose whether to respond to a campaign, patients choose treatments, and employees opt into training programs. This makes it difficult to estimate true causal effects using traditional experimental methods.
Propensity Score Matching (PSM) is a widely used statistical technique that helps address this challenge in observational studies.

What Are Propensity Scores?
Propensity scores were introduced by Rosenbaum and Rubin (1983) in their seminal paper “The Central Role of the Propensity Score in Observational Studies for Causal Effects.”
Formally, a propensity score is the probability that a unit (person, customer, patient) receives a treatment, given a set of observed characteristics (covariates).
In practice:
We estimate this probability using a model (typically logistic regression)
We then match treated and untreated individuals with similar propensity scores
Unmatched units are discarded
This process creates treatment and control groups that are comparable, approximating a randomized experiment.
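In symbols, the definition above and the logistic model typically used to estimate it can be written as follows. The notation is an assumption made here for illustration (T for the treatment indicator, X for the covariates, β for the regression coefficients); it does not come from the code later in this post.

```latex
e(x) = \Pr(T = 1 \mid X = x), \qquad
\log\frac{e(x)}{1 - e(x)} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
```

Two units with the same value of e(x) are equally likely to be treated given their observed covariates, which is what makes them reasonable comparisons for each other.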

Propensity Score Matching in Simple Terms
PSM is used in observational studies to reduce selection bias.
Why Not Just Compare Groups Directly?
In a controlled experiment (say, testing a drug on lab rats), randomization makes the two groups comparable on average in everything except the treatment, so a difference in outcomes can be attributed to the drug.
With humans (or customers), this is rarely possible:
People differ in age, income, behavior, motivation, and context
Those who receive a treatment may already be different from those who do not
PSM helps by matching individuals who look similar on observable characteristics, differing mainly in whether they received the treatment.
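A small simulation can make the problem concrete. This is a sketch under made-up assumptions: the variable names, the true effect of 0.10, and the way income drives both treatment and outcome are all hypothetical and chosen only to show how a naive comparison can overstate an effect.

```r
# Hypothetical simulation: income drives both treatment uptake and the outcome.
set.seed(42)
n <- 5000
income <- rnorm(n, mean = 50, sd = 10)

# Higher-income people are more likely to take the treatment (selection).
treated <- rbinom(n, 1, plogis((income - 50) / 5))

# The true treatment effect is fixed at 0.10; income also raises the outcome.
true_effect <- 0.10
outcome <- 0.02 * income + true_effect * treated + rnorm(n, sd = 0.1)

# Naive comparison of group means mixes the treatment effect with income.
naive_diff <- mean(outcome[treated == 1]) - mean(outcome[treated == 0])

# Adjusting for income gives an estimate close to the true effect.
adjusted <- unname(coef(lm(outcome ~ treated + income))["treated"])

print(c(true = true_effect, naive = naive_diff, adjusted = adjusted))
```

In this setup the naive difference is much larger than the true effect because treated units also have higher incomes. Matching attacks the same problem by comparing units with similar covariates.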

Business Example: Measuring Campaign Effectiveness
Suppose a marketer wants to measure the impact of an advertising campaign on purchases.
Some customers respond to the campaign
Others do not
Purchases may be influenced by age, income, or other factors—not just the campaign
Simply comparing responders with non-responders risks attributing purchases to the campaign when they may be driven by pre-existing differences.
PSM allows us to compare “like with like” customers, isolating the campaign’s effect more credibly.

Implementing Propensity Score Matching in R
We’ll walk through a simple example using simulated campaign data and two R packages:
MatchIt – for matching
tableone – for balance diagnostics

The Dataset
Data <- read.csv("Campaign_Data.csv", header = TRUE)
dim(Data)

[1] 1000 4

The dataset contains:
Age
Income
Ad_Campaign_Response (1 = Responded, 0 = Did not respond)
Bought (1 = Purchased, 0 = Did not purchase)
head(Data)
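If you do not have Campaign_Data.csv, a stand-in with the same four columns can be simulated to follow along. This is only a sketch: the distributions and coefficients below are invented for illustration, so the numbers it produces will not match the results reported in this post.

```r
# Hypothetical stand-in for Campaign_Data.csv; all coefficients are made up.
set.seed(123)
n <- 1000
Age <- round(runif(n, 20, 65))
Income <- round(rnorm(n, mean = 60000, sd = 15000))

# Response probability depends weakly on age and income (assumed form).
p_response <- plogis(-0.4 + 0.01 * (Age - 40) + 0.00001 * (Income - 60000))
Ad_Campaign_Response <- rbinom(n, 1, p_response)

# Purchase probability rises sharply for responders (assumed effect size).
p_buy <- plogis(-2 + 3.5 * Ad_Campaign_Response + 0.01 * (Age - 40))
Bought <- rbinom(n, 1, p_buy)

Data <- data.frame(Age, Income, Ad_Campaign_Response, Bought)
dim(Data)
head(Data)
```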

Treatment and Control Groups
Treats <- subset(Data, Ad_Campaign_Response == 1)
Control <- subset(Data, Ad_Campaign_Response == 0)

colMeans(Treats)
colMeans(Control)

dim(Treats)
dim(Control)

We have:
404 treated individuals
596 control individuals
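Unequal group sizes are not a problem in themselves. What matters is whether responders and non-responders differ in age and income, which is why the column means above are worth comparing before estimating any treatment effect.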

Naïve Outcome Model (Before Matching)
model_1 <- lm(Bought ~ Ad_Campaign_Response + Age + Income, data = Data)
model_1$coefficients["Ad_Campaign_Response"]

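Because the outcome is binary, this regression is a linear probability model, and its coefficient suggests the campaign increases purchase probability by about 73 percentage points.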
However, this estimate may be biased due to selection effects.

Estimating Propensity Scores
We now model the probability of responding to the campaign:
pscores.model <- glm(
Ad_Campaign_Response ~ Age + Income,
family = binomial("logit"),
data = Data
)

Data$PScores <- pscores.model$fitted.values

This step answers:
“Given age and income, how likely is this person to respond to the campaign?”
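A quick sanity check on the estimated scores is to compare their distribution in the two groups, since matching only works where the ranges overlap. The sketch below uses simulated data with the same column names as this walkthrough (the data-generating numbers are assumptions) and also shows that predict(type = "response") returns the same values as fitted.values.

```r
# Hypothetical data with the same column names as the walkthrough.
set.seed(1)
n <- 1000
Data <- data.frame(
  Age = round(runif(n, 20, 65)),
  Income = round(rnorm(n, 60000, 15000))
)
Data$Ad_Campaign_Response <- rbinom(
  n, 1, plogis(-0.4 + 0.02 * (Data$Age - 40))
)

pscores.model <- glm(
  Ad_Campaign_Response ~ Age + Income,
  family = binomial("logit"),
  data = Data
)

# predict(type = "response") returns the same probabilities as fitted.values.
Data$PScores <- predict(pscores.model, type = "response")

# Summarize the scores in each group; the two ranges should overlap.
overlap <- tapply(Data$PScores, Data$Ad_Campaign_Response, summary)
print(overlap)
```

If one group has scores that the other never reaches, treated units in that region have no comparable controls and any match for them will be poor.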

Checking Balance Before Matching
library(tableone)
xvars <- c("Age", "Income")

table1 <- CreateTableOne(
vars = xvars,
strata = "Ad_Campaign_Response",
data = Data,
test = FALSE
)

print(table1, smd = TRUE)

Why SMD Matters
Standardized Mean Difference (SMD) measures covariate imbalance as the difference in group means divided by a pooled standard deviation
Values above 0.1 are conventionally read as meaningful imbalance
In this simulated dataset, imbalance is low—but we proceed to demonstrate matching.
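To see what the SMD column reports, here is a minimal sketch of one common definition for a continuous covariate, computed by hand. The helper name smd and the toy numbers are made up for illustration; tableone's output may differ in detail for other variable types.

```r
# Standardized mean difference for a continuous covariate, using the
# pooled standard deviation sqrt((var_treated + var_control) / 2).
smd <- function(x, treat) {
  x_t <- x[treat == 1]
  x_c <- x[treat == 0]
  abs(mean(x_t) - mean(x_c)) / sqrt((var(x_t) + var(x_c)) / 2)
}

# Toy example (made-up numbers): treated customers are a little older.
age   <- c(30, 35, 40, 45, 50, 28, 33, 38, 43, 48)
treat <- c( 1,  1,  1,  1,  1,  0,  0,  0,  0,  0)
smd(age, treat)  # roughly 0.25, above the 0.1 rule of thumb
```

Because the difference is scaled by the standard deviation, the SMD does not depend on the units of the covariate or on the sample size, which is why it is preferred to p-values for balance checks.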

Matching Algorithms
Matching pairs treated units with comparable controls, either on the covariates themselves or on the propensity score.

Exact Matching
library(MatchIt)

match1 <- matchit(
Ad_Campaign_Response ~ Age + Income,
method = "exact",
data = Data
)

match1.data <- match.data(match1)

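Exact matching pairs units only when their covariate values are identical (it does not use the propensity score), so it is very strict and often discards large portions of the data, especially with continuous variables.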

Nearest Neighbor Matching
match2 <- matchit(
Ad_Campaign_Response ~ Age + Income,
method = "nearest",
ratio = 1,
data = Data
)

match2.data <- match.data(match2)

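Nearest neighbor matching pairs each treated individual with the closest control in terms of propensity score. By default, matchit() estimates that score itself with a logistic regression on the covariates in the formula, so the PScores column computed earlier is not used here.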
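The idea behind nearest neighbor matching can be sketched in a few lines of base R. This is an illustration only, not a reproduction of matchit(): the function name greedy_match and the toy scores are hypothetical, and MatchIt offers further options (matching order, calipers, matching with replacement) that are ignored here.

```r
# Greedy 1:1 nearest neighbor matching on a propensity score, without
# replacement. Returns the index of the control matched to each treated unit.
greedy_match <- function(ps, treat) {
  treated_idx <- which(treat == 1)
  control_idx <- which(treat == 0)
  matches <- integer(0)
  for (i in treated_idx) {
    if (length(control_idx) == 0) break
    # Pick the remaining control whose score is closest to unit i.
    j <- control_idx[which.min(abs(ps[control_idx] - ps[i]))]
    matches[as.character(i)] <- j
    control_idx <- setdiff(control_idx, j)  # each control is used once
  }
  matches
}

# Toy scores (made up): three treated units, five controls.
ps    <- c(0.80, 0.55, 0.30, 0.78, 0.52, 0.33, 0.10, 0.60)
treat <- c(   1,    1,    1,    0,    0,    0,    0,    0)
greedy_match(ps, treat)
```

Here units 1, 2, and 3 are paired with controls 4, 5, and 6, and the two leftover controls are discarded. Because the procedure is greedy, the result can depend on the order in which treated units are processed.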

Balance After Matching
table_match2 <- CreateTableOne(
vars = xvars,
strata = "Ad_Campaign_Response",
data = match2.data,
test = FALSE
)

print(table_match2, smd = TRUE)

Key results:
404 treated matched with 404 controls
SMD values near zero
Excellent balance achieved
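This indicates that, on the observed covariates, the matched dataset resembles what a randomized experiment would produce. Matching cannot balance characteristics that were never measured.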

Outcome Analysis: Testing the Campaign Effect
We now test whether responding to the campaign increases purchase probability.
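# Sort by matched pair (the subclass column added by match.data) so that
# position k in each vector refers to the same pair.
matched <- match2.data[order(match2.data$subclass), ]
y_trt <- matched$Bought[matched$Ad_Campaign_Response == 1]
y_con <- matched$Bought[matched$Ad_Campaign_Response == 0]

difference <- y_trt - y_con
t.test(difference)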

Interpretation
p-value < 2.2e-16 → highly significant
Estimated treatment effect ≈ 0.73
Conclusion:
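After matching on age and income, responding to the campaign is associated with an increase in the probability of purchase of approximately 73 percentage points among responders. This reading holds only to the extent that age and income capture the reasons customers responded.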
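The one-sample t-test on within-pair differences used above is the same test as a paired t-test on the two outcome vectors. The sketch below checks this on made-up outcomes for six hypothetical pairs; it only works as intended when position k in both vectors belongs to the same matched pair.

```r
# Made-up outcomes for six matched pairs; position k in each vector
# belongs to the same pair.
y_trt <- c(1, 1, 0, 1, 1, 1)
y_con <- c(0, 1, 0, 0, 0, 1)

difference <- y_trt - y_con

# A one-sample test on the differences ...
res_diff <- t.test(difference)
# ... is the same test as a paired t-test on the two vectors.
res_paired <- t.test(y_trt, y_con, paired = TRUE)

# The point estimate is the mean within-pair difference.
c(estimate = mean(difference),
  p_diff = res_diff$p.value,
  p_paired = res_paired$p.value)
```

The estimate is the average difference in purchase rates within pairs, which is how the 0.73 figure above should be read.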

Why PSM Matters in Practice
Propensity Score Matching is widely used in:
Marketing attribution and uplift modeling
Healthcare outcomes research
Policy evaluation
AI strategy consulting and experimentation design
It enables credible causal inference when randomized experiments are infeasible—making it a powerful tool for real-world decision-making.

References
Rosenbaum, P. R., & Rubin, D. B. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, 70(1), 41-55.
Statisticshowto – Propensity Score Matching
Jason Roy – Inferring Causal Effects from Observational Data
RStudio Pubs – PSM Examples
