TildAlice

Posted on • Originally published at tildalice.io

DoWhy Internals: Building a Causal Inference Engine from Scratch

Most "Causal" ML Is Just Correlation with Extra Steps

Here's a take that'll ruffle some feathers: 90% of production "causal inference" I've seen is regression with a fancier name. Teams slap on DoWhy, run estimate_effect(), and ship whatever number falls out—without understanding what the library actually does under the hood.

The result? Causal claims built on sand.

I'm not saying DoWhy is bad. It's genuinely excellent. But treating it as a black box defeats the entire purpose. The power of causal inference comes from making your assumptions explicit—and you can't do that if you don't understand what assumptions DoWhy is making for you.

So let's build a minimal causal inference engine from scratch, then reverse-engineer DoWhy's internals to see how the real thing works.

Algebra equations with symbols on a chalkboard in a brightly lit classroom.

Photo by Bernice Chan on Pexels

The Four-Step Pipeline That DoWhy Actually Runs

DoWhy follows a workflow that looks deceptively simple:

Model → Identify → Estimate → Refute

But each arrow hides substantial complexity. Here's what's actually happening.
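To make the four steps concrete, here is a minimal from-scratch sketch in plain Python. It is not DoWhy's code and does not call DoWhy. The toy graph (Z confounds T and Y), the simulated data, and every function name (simulate, identify, estimate, refute_placebo) are hypothetical choices made for illustration. Identification is simplified to "adjust for the treatment's observed parents", estimation is stratified backdoor adjustment, and refutation is a shuffled placebo treatment.

```python
import random
from collections import defaultdict

# Step 1 (Model): a causal graph stored as node -> list of parents.
# Hypothetical toy graph: Z confounds both T (treatment) and Y (outcome).
GRAPH = {"Z": [], "T": ["Z"], "Y": ["T", "Z"]}


def simulate(n=20000, true_effect=2.0, seed=0):
    """Draw rows from a toy structural model consistent with GRAPH."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0
        y = true_effect * t + 3.0 * z + rng.gauss(0, 1)
        rows.append({"Z": z, "T": t, "Y": y})
    return rows


def identify(graph, treatment):
    """Step 2 (Identify): choose a backdoor adjustment set.

    Simplifying assumption: all parents of the treatment are observed,
    so adjusting for them blocks every backdoor path.
    """
    return list(graph[treatment])


def mean(xs):
    return sum(xs) / len(xs)


def naive_difference(rows, treatment, outcome):
    """Unadjusted difference in means: correlation, not causation."""
    treated = [r[outcome] for r in rows if r[treatment] == 1]
    control = [r[outcome] for r in rows if r[treatment] == 0]
    return mean(treated) - mean(control)


def estimate(rows, treatment, outcome, adjustment_set):
    """Step 3 (Estimate): backdoor adjustment by stratification."""
    strata = defaultdict(list)
    for r in rows:
        strata[tuple(r[v] for v in adjustment_set)].append(r)
    ate = 0.0
    for group in strata.values():
        treated = [r[outcome] for r in group if r[treatment] == 1]
        control = [r[outcome] for r in group if r[treatment] == 0]
        if not treated or not control:
            continue  # no overlap in this stratum, so it is skipped
        # Weight each stratum's effect by its share of the data.
        ate += (mean(treated) - mean(control)) * len(group) / len(rows)
    return ate


def refute_placebo(rows, treatment, outcome, adjustment_set, seed=1):
    """Step 4 (Refute): re-estimate with a randomly shuffled treatment."""
    rng = random.Random(seed)
    shuffled = [r[treatment] for r in rows]
    rng.shuffle(shuffled)
    placebo_rows = [dict(r, **{treatment: t}) for r, t in zip(rows, shuffled)]
    return estimate(placebo_rows, treatment, outcome, adjustment_set)


rows = simulate()
adjustment = identify(GRAPH, "T")
naive = naive_difference(rows, "T", "Y")
ate = estimate(rows, "T", "Y", adjustment)
placebo = refute_placebo(rows, "T", "Y", adjustment)
print(f"naive={naive:.2f} adjusted={ate:.2f} placebo={placebo:.2f}")
```

In this simulation the true effect is set to 2.0. The naive difference should land well above that because Z pushes both T and Y upward, the adjusted estimate should sit close to 2.0, and the placebo estimate should sit close to zero. If the placebo run returned a large effect, that would be a sign the estimator is picking up something other than the treatment.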


Continue reading the full article on TildAlice
