Debiasing Graph Neural Networks for Recommendation with Causal RL

Tasfin Mahmud — Sat, 23 May 2026 05:24:37 +0000

As part of my undergraduate research in Graph Neural Networks (GNNs) and Causal Inference, I've been exploring a major flaw in modern recommender systems: observational bias.

Standard recommendation algorithms, even state-of-the-art GNNs like LightGCN and NGCF, learn from biased observational data. Popular items are shown more often, so they collect more clicks; a model trained on those clicks then recommends them even more, creating a feedback loop that reinforces popularity bias and buries niche items.

To address this, I built an open-source framework that combines GNNs with Causal Reinforcement Learning to debias recommendations.

Here is how I approached it.

👋 Hi, I'm Tasfin Mahmud! I'm a CS Researcher at BRAC University and an open-source contributor. You can learn more about my work on my portfolio website or my GitHub.

🏗️ The Baseline GNNs

I started by implementing three solid baseline architectures in PyTorch Geometric (PyG):

LightGCN: The minimalist approach that drops feature transformation matrices and non-linear activations, keeping only normalized neighborhood aggregation.
NGCF (Neural Graph Collaborative Filtering): Explicitly models high-order connectivities with feature interaction terms.
GAT-CF: Graph Attention Networks adapted for collaborative filtering.
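
The following is a minimal, dependency-free Python sketch of LightGCN's propagation rule on a toy user-item graph. It assumes the formulation from the LightGCN paper: symmetric degree normalization and uniform averaging over layers. It is not the framework's PyG code, and `lightgcn_propagate` and `score` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def lightgcn_propagate(
    interactions: List[Tuple[int, int]],
    user_emb: Dict[int, Vector],
    item_emb: Dict[int, Vector],
    num_layers: int = 2,
) -> Tuple[Dict[int, Vector], Dict[int, Vector]]:
    """LightGCN-style propagation on a bipartite user-item graph.

    No weight matrices and no non-linearities: each layer is a
    degree-normalized sum of neighbor embeddings, and the final
    embedding is the uniform mean over layers 0..num_layers.
    """
    # Degree of every node (number of observed interactions).
    user_deg = {u: 0 for u in user_emb}
    item_deg = {i: 0 for i in item_emb}
    for u, i in interactions:
        user_deg[u] += 1
        item_deg[i] += 1

    dim = len(next(iter(user_emb.values())))
    layer_u, layer_i = user_emb, item_emb
    sum_u = {u: list(v) for u, v in user_emb.items()}  # layer-0 term
    sum_i = {i: list(v) for i, v in item_emb.items()}

    for _ in range(num_layers):
        next_u = {u: [0.0] * dim for u in user_emb}
        next_i = {i: [0.0] * dim for i in item_emb}
        for u, i in interactions:
            # Symmetric normalization 1 / sqrt(|N_u| * |N_i|).
            norm = 1.0 / math.sqrt(user_deg[u] * item_deg[i])
            for d in range(dim):
                next_u[u][d] += norm * layer_i[i][d]
                next_i[i][d] += norm * layer_u[u][d]
        layer_u, layer_i = next_u, next_i
        for u in sum_u:
            sum_u[u] = [a + b for a, b in zip(sum_u[u], layer_u[u])]
        for i in sum_i:
            sum_i[i] = [a + b for a, b in zip(sum_i[i], layer_i[i])]

    scale = 1.0 / (num_layers + 1)  # uniform layer weights
    final_u = {u: [scale * x for x in v] for u, v in sum_u.items()}
    final_i = {i: [scale * x for x in v] for i, v in sum_i.items()}
    return final_u, final_i


def score(user_vec: Vector, item_vec: Vector) -> float:
    """Predicted preference: inner product of the final embeddings."""
    return sum(a * b for a, b in zip(user_vec, item_vec))
```

Ranking a user's candidate items by `score(final_u[u], final_i[i])` gives the recommendation list. Dropping the weight matrices is what keeps LightGCN cheap: the only trainable parameters are the layer-0 embeddings.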

These models score well on standard ranking metrics, but there is a catch: when you evaluate them on observational data, the metrics look good partly because the test split shares the same exposure bias as the training data.

🧪 Injecting Causal RL

To break the popularity loop, I implemented four complementary causal debiasing techniques.

1. Inverse Propensity Scoring (IPS)

One of the simplest ways to correct exposure bias is to reweight the Bayesian Personalized Ranking (BPR) training loss. IPS divides each item's loss term by its propensity, the estimated probability that the item was exposed to the user. Rarely shown items therefore receive a larger gradient signal, while very popular items are scaled down.
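
Here is a minimal sketch of an IPS-weighted BPR loss, in plain Python for readability. The propensity model is an assumption. A common heuristic estimates exposure probability from item popularity as (n_i / max_j n_j)^eta, and the framework may estimate it differently. The function names and the `eta` and `clip` parameters are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def popularity_propensity(
    interactions: List[Tuple[int, int]], eta: float = 0.5
) -> Dict[int, float]:
    """Assumed exposure model: p_i = (n_i / max_j n_j) ** eta.

    n_i is the number of observed interactions with item i; eta in (0, 1]
    controls how strongly exposure is assumed to track popularity.
    """
    counts = Counter(item for _, item in interactions)
    max_count = max(counts.values())
    return {item: (n / max_count) ** eta for item, n in counts.items()}


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def ips_bpr_loss(
    triples: List[Tuple[float, float, int]],
    propensity: Dict[int, float],
    clip: float = 0.05,
) -> float:
    """Mean IPS-weighted BPR loss over (pos_score, neg_score, pos_item) triples.

    Each BPR term -log(sigmoid(pos_score - neg_score)) is divided by the
    positive item's propensity. Propensities are clipped from below so a
    handful of very rare items cannot dominate the gradient.
    """
    total = 0.0
    for pos_score, neg_score, pos_item in triples:
        weight = 1.0 / max(propensity[pos_item], clip)
        total += weight * neg_log_sigmoid(pos_score - neg_score)
    return total / len(triples)
```

With `eta=0.5`, an item with a quarter of the top item's interactions gets propensity 0.5, so its loss terms count twice as much as the top item's. Clipping the propensity trades a little bias for lower variance, which is the usual compromise with IPS.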

2. Causal Embeddings (CausE)

Here, the model maintains two separate embedding spaces:

A factual space (learned from the biased data)
A counterfactual space (representing uniform exposure)

A discrepancy regularizer pulls the factual representations toward the unbiased counterfactual ones, preventing the model from overfitting to the exposure distribution.
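
The discrepancy regularizer can be sketched as below. This assumes a squared L2 distance between each item's factual and counterfactual embeddings; other distances are possible. `lam`, `lr` and the function names are hypothetical. In a real model, the regularizer is added to the training objective and optimized with autograd rather than by hand.

```python
from typing import Dict, List

Vector = List[float]


def discrepancy(factual: Dict[int, Vector], counterfactual: Dict[int, Vector]) -> float:
    """Sum over items of the squared L2 distance ||f_i - c_i||^2."""
    return sum(
        sum((f - c) ** 2 for f, c in zip(factual[i], counterfactual[i]))
        for i in factual
    )


def cause_objective(
    factual_loss: float,
    counterfactual_loss: float,
    factual: Dict[int, Vector],
    counterfactual: Dict[int, Vector],
    lam: float = 0.1,
) -> float:
    """Both spaces' own task losses plus the weighted discrepancy regularizer."""
    return factual_loss + counterfactual_loss + lam * discrepancy(factual, counterfactual)


def regularizer_step(
    factual: Dict[int, Vector],
    counterfactual: Dict[int, Vector],
    lam: float = 0.1,
    lr: float = 0.5,
) -> Dict[int, Vector]:
    """One gradient-descent step on lam * discrepancy w.r.t. the factual side.

    d/df ||f - c||^2 = 2 (f - c), so each factual vector moves a fraction
    2 * lr * lam of the way toward its counterfactual counterpart. (In full
    training both spaces receive gradients; only the factual side is
    updated here to show the "pull".)
    """
    return {
        i: [f - lr * lam * 2.0 * (f - c) for f, c in zip(vec, counterfactual[i])]
        for i, vec in factual.items()
    }
```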

3. Causal Policy Gradient

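Treating recommendation as a sequential decision-making problem, I used the REINFORCE policy gradient algorithm. The core innovation here is Causal Reward Shaping: decomposing observed rewards into the "true preference" (causal component) and the "popularity bias" (confounding component). Doubly Robust (DR) estimation combines a learned reward model with an IPS correction, which makes learning from logged data more stable. The estimate stays unbiased if either the propensities or the reward model is accurate, and it typically has lower variance than IPS alone.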
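
Below is a sketch of the DR-plus-REINFORCE recipe for a single logged impression with a softmax policy over items. Several pieces are assumptions rather than the framework's code:

- `reward_model` stands in for a learned reward predictor.
- `logging_prob` is the logging policy's probability of showing the logged item.
- `shaped_reward` is a deliberately simple stand-in for Causal Reward Shaping that subtracts `beta` times an item popularity score.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax policy over items (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shaped_reward(observed: float, popularity: float, beta: float) -> float:
    """Hypothetical reward shaping: subtract a popularity-driven component."""
    return observed - beta * popularity


def dr_rewards(reward_model: Sequence[float], logged_item: int,
               logging_prob: float, reward: float) -> List[float]:
    """Doubly robust per-item reward estimates for one logged impression.

    Every item starts from the reward model's prediction; the logged item
    also gets the IPS-weighted residual (reward - prediction) / logging_prob.
    Averaging these under a target policy pi gives the standard DR value:
    sum_a pi(a) r_hat(a) + pi(a_log) / mu(a_log) * residual.
    """
    estimates = list(reward_model)
    estimates[logged_item] += (reward - reward_model[logged_item]) / logging_prob
    return estimates


def reinforce_grad(logits: Sequence[float], action: int, reward: float,
                   baseline: float = 0.0) -> List[float]:
    """(reward - baseline) * d log pi(action) / d logits for a softmax policy.

    d log pi(a) / d logit_k = 1[k == a] - pi(k).
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [advantage * ((1.0 if k == action else 0.0) - p)
            for k, p in enumerate(probs)]


def reinforce_step(logits: List[float], estimates: Sequence[float],
                   lr: float, rng: random.Random) -> List[float]:
    """Sample an item from the current policy and take one gradient-ascent step."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs)[0]
    baseline = sum(p * r for p, r in zip(probs, estimates))  # DR value of pi
    grad = reinforce_grad(logits, action, estimates[action], baseline)
    return [z + lr * g for z, g in zip(logits, grad)]
```

For example, take `dr_rewards([0.6, 0.3, 0.1], logged_item=2, logging_prob=0.2, reward=shaped_reward(1.0, 0.1, 0.5))`. It leaves items 0 and 1 at the model's predictions and raises item 2 to 0.1 + (0.95 - 0.1) / 0.2 = 4.35.

That factor-of-5 blow-up is the variance that IPS introduces. The reward-model term keeps the other items' estimates stable. Using the policy's DR value as the baseline does not bias the gradient, because it does not depend on the sampled item.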

4. Causal Discovery

How do we know what the confounders are if they aren't explicitly measured? I implemented a causal discovery module using Truncated SVD on the exposure matrix to automatically identify latent confounding factors, which are then integrated into the reward shaping process.
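
The truncated-SVD step can be sketched without dependencies, using power iteration with deflation on a binary user-item exposure matrix. In practice, a library routine such as scikit-learn's `TruncatedSVD` or `torch.svd_lowrank` does the same job faster.

Two caveats apply. Treating the item-side singular vectors as latent confounder loadings is an assumption about how such a module could work, and `truncated_svd` is a hypothetical name. How the factors feed into reward shaping is not shown.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def _matvec(a: Matrix, v: Vector) -> Vector:
    """Compute A v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def _rmatvec(a: Matrix, u: Vector) -> Vector:
    """Compute A^T u."""
    return [sum(a[r][c] * u[r] for r in range(len(a))) for c in range(len(a[0]))]


def _normalize(v: Vector) -> Tuple[Vector, float]:
    """Return (v / ||v||, ||v||), leaving a zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return ([x / norm for x in v] if norm > 0 else v), norm


def truncated_svd(a: Matrix, k: int, iters: int = 200, seed: int = 0
                  ) -> List[Tuple[float, Vector, Vector]]:
    """Top-k (sigma, u, v) triples of A via power iteration plus deflation.

    Rows of A are users, columns are items, and A[r][c] = 1 if item c was
    shown to user r. Each v is an item-side loading vector.
    """
    rng = random.Random(seed)
    residual = [list(row) for row in a]
    components = []
    for _ in range(k):
        v, _ = _normalize([rng.random() + 0.1 for _ in range(len(a[0]))])
        for _ in range(iters):
            # Power step on A^T A: converges to the top right singular vector.
            v, norm = _normalize(_rmatvec(residual, _matvec(residual, v)))
            if norm == 0:
                break
        u, sigma = _normalize(_matvec(residual, v))
        if sigma < 1e-12:
            break  # residual is numerically zero: rank(A) < k
        components.append((sigma, u, v))
        # Deflate: subtract the rank-1 component sigma * u v^T just found.
        residual = [
            [residual[r][c] - sigma * u[r] * v[c] for c in range(len(v))]
            for r in range(len(u))
        ]
    return components
```

Consider a block-structured exposure matrix, such as two user groups that were each shown a different item set. The leading components recover those blocks, and the item loadings indicate which items share an exposure pattern.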

📊 The Results

I benchmarked these approaches using LightGCN on the MovieLens 100k dataset:

Training Mode         Recall@20   NDCG@20   Notes
Standard (Baseline)   0.1676      0.1624    Standard biased observational learning
IPS Debiasing         0.1453      0.1543    Re-weights rare items; expected to drop on biased test data
CausE                 0.1675      0.1625    Regularized against uniform exposure
Causal PG (DR)        0.1593      0.1602    Doubly robust policy gradient

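(Note: evaluating debiased models on a standard, biased test set tends to lower their raw scores, because that test set shares the training data's exposure bias and rewards popularity-following rankings. Unbiased test data, such as interactions logged under randomized exposure, is required to measure any true lift.)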
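
For reference, this is how Recall@K and NDCG@K are commonly computed for one user with binary relevance. Per-user values are averaged to get the scores in the table. Conventions vary; for example, some implementations divide recall by min(|relevant|, K). This sketch may therefore not match the framework's evaluation code exactly.

```python
import math
from typing import List, Set


def recall_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    """Fraction of the user's held-out items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    """Binary-relevance NDCG: DCG of the top k divided by the best achievable DCG."""
    # A hit at 0-based position pos is discounted by log2(pos + 2).
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    # Ideal ranking puts all relevant items first (at most k of them count).
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0
```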

💡 The Takeaway

GNNs are powerful tools for recommendation, but without causal inference they tend to learn and amplify the biases already present in your dataset. By using techniques like IPS and Causal Policy Gradients, we can build recommendation systems that separate genuine user preference from mere popularity.

🔗 Check out the full framework on my GitHub:
gnn-collaborative-filtering

🔗 Learn more about my research and open-source work:
tasfinmahmud.github.io

Let me know in the comments if you've worked with Causal Inference for recommendation systems!

DEV Community: Tasfin Mahmud