DEV Community

Manas Sharma
Manas Sharma

Posted on • Originally published at github.com

Causal AI vs. Correlation in Telecom Fraud Detection: Why It Matters

Telecom fraud costs the industry $3.8B/year. Most systems predict it — but can't explain what causes it. Here's why causality changes everything.

The Problem with Correlation

If you've ever built a fraud detection model, you've seen SHAP plots showing that Call Data Record (CDR) volume "correlates" with SIM box fraud. So what? Do you throttle CDR intake? Restrict call volumes? Neither makes intuitive sense — but correlation-based ML will flag it as the most important feature.

This is Simpson's Paradox in production: a model tells you the right association but the wrong intervention. In telecom, acting on the wrong lever doesn't just waste budget — it actively worsens revenue leakage.

Enter Causal AI

Causal inference flips the question from "what predicts fraud?" to "what causes fraud, and how much does fixing it help?"

Our system — the Causal Decision Intelligence Engine (CDIE) — combines two approaches:

1. Causal Structure Discovery (GFCI)

Instead of assuming relationships, we learn them from data using the Greedy Fast Causal Inference algorithm, which handles hidden confounders that standard ML ignores. We then cross-validate with PCMCI+ for temporal causality — because fraud patterns evolve over time, not instantaneously.

2. Doubly-Robust Effect Estimation (LinearDML)

Once we know the structure, we measure how much each intervention moves the needle. Using Double Machine Learning (EconML), we get ATE estimates with 95% confidence intervals, not just binary scores. The kicker? It's "doubly robust" — it stays consistent even if one of your nuisance models is misspecified.

Validation That Matters Before Trust

We don't just estimate effects — we test whether they're real. Every causal edge runs through a 3-test refutation suite (DoWhy):

  • Placebo test: Replace the treatment with a random variable — effect should vanish.
  • Random confounder test: Add a fake common cause — estimate should be stable.
  • Data subset test: Re-estimate on random subsets — results should be consistent.

Edges that fail get quarantined. Only validated edges make it into the Safety Map.

The Architecture

┌─────────────────────────────────────────────┐
│  OPEA GenAI Layer (Intel-Optimized)          │
│  • LLM TextGen (neural-chat-7b)              │
│  • TEI Embedding (bge-base-en-v1.5)          │
│  • TEI Reranking (bge-reranker-base)         │
└──────────────────┬──────────────────────────┘
                   │
┌──────────────────▼──────────────────────────┐
│  CDIE Causal Engine                         │
│                                              │
│  OFFLINE          GFCI → DoWhy → dML → MAPIE│
│  ONLINE           Safety Map + RAG + LLM     │
│                                              │
│  RESULT           "Tighten fraud policy →    │
│                   18% ↓ leakage [14, 22%] CI"│
└─────────────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

Why OPEA?

We built on Intel's OPEA (Open Platform for Enterprise AI) because:

  • Composable: Each GenAI component is a standalone microservice — we can swap LLMs, embedding models, or rerankers without rebuilding the pipeline.
  • Intel-optimized: AMX/AVX-512 acceleration gives us ~18x throughput vs. baseline on CPU-only infrastructure. No GPU required.
  • Enterprise-ready: Built for scale from line one.

We've also proposed a Causal Inference GenAIComp to the OPEA project (GitHub Issue #2063) — because causal reasoning deserves to be a first-class component in any enterprise AI stack.

What We Built for ITU AI4Good

CDIE v5 is our submission to the ITU AI4Good OPEA Innovation Challenge. It demonstrates:

  • Causal discovery from telecom observational data (GFCI + PCMCI+)
  • Refutation-validated effect estimates (not just point estimates)
  • Conformal prediction intervals (MAPIE) for distribution-free uncertainty
  • OPEA integration: RAG pipeline generating executive-ready causal intelligence reports
  • Human-in-the-loop: Expert corrections, knowledge adjudication, federated causal learning

Results So Far

Benchmarked live on a 12-node synthetic telecom SCM (Windows 11, Python 3.13, Intel Alder Lake):

Causal Discovery (own SCM ground truth)

Metric Value
Precision 93.8%
Recall 88.2%
F1 Score 90.9%
SHD 2 (15 TP, 1 FP, 2 FN)

Refutation Validation (DoWhy 3-test suite)

  • Pass rate: 93.8% (15 validated, 1 quarantined)

Online Query Performance (Safety Map Lookup)

Metric Value
Mean Latency 0.49 ms
Max 0.64 ms
Memory (RSS) 96.0 MB
Pre-computed Scenarios 640

Training Data & Artifacts

  • 2,000 rows × 14 variables (424 KB CSV)
  • safety_map.db: 776 KB (SQLite)
  • safety_map.json: 806 KB

Hardware: Intel64 Family 6 Model 198, 12 threads, OMP_NUM_THREADS=12. AMX not active in this run (DNNL_MAX_CPU_ISA not set).

The next step: real-world validation against operator data.

Code & Resources


This is built for the ITU AI4Good OPEA Innovation Challenge by Team CDIE. Questions? Comments? Drop them below — I'll answer every one.

Top comments (0)