DEV Community

BrainWire

I Built CausalLens — A Free, Open-Source Causal Impact Calculator for Time Series (5 Methods, Zero Setup)

I want to show you a tool I just open-sourced. It's called CausalLens, and it answers one specific question that many analytics setups get wrong: did this intervention actually cause the change in my metric?

The problem with standard before/after analysis

Before/after comparisons are everywhere. They're also almost always misleading.

When you compare a metric before and after an intervention, you're implicitly assuming that the only thing that changed was your intervention. In practice, seasonal patterns move, external trends shift, and unrelated events happen. The "improvement" you're seeing might have occurred anyway.

The right answer is to build a counterfactual: a statistical estimate of what would have happened if you had never intervened. The gap between that counterfactual and your observed data is your causal estimate.
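
Here's a minimal sketch of that idea on made-up data. It is not CausalLens's actual code: the numbers, the straight-line pre-period model, and the variable names are all assumptions chosen for illustration. The metric already trends upward, so the naive before/after difference badly overstates a true lift of 5.

```python
import numpy as np

# Synthetic daily metric: steady upward trend plus a true lift of 5 from day 60.
# Every number here is invented for illustration.
days = np.arange(90)
intervention_day = 60
true_lift = 5.0
metric = 100.0 + 0.5 * days
metric[intervention_day:] += true_lift

pre = days < intervention_day
post = ~pre

# Naive before/after: difference of period means (absorbs the trend).
naive_effect = metric[post].mean() - metric[pre].mean()

# Counterfactual: fit a line to the pre-period only, then project it forward.
slope, intercept = np.polyfit(days[pre], metric[pre], deg=1)
counterfactual = intercept + slope * days[post]
causal_effect = (metric[post] - counterfactual).mean()

print(f"naive before/after: {naive_effect:.1f}")   # 27.5
print(f"vs. counterfactual: {causal_effect:.1f}")  # 5.0
```

The naive comparison credits the intervention with the whole trend. The counterfactual gap recovers the lift that was actually injected.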

What CausalLens does

You provide a CSV with a time series and an intervention date. The app fits a model to the pre-intervention data, projects it forward as the counterfactual, and reports:

Estimated effect size (absolute and percentage)
p-value for statistical significance
95% confidence interval
Plain-English interpretation
Downloadable PDF and interactive HTML reports
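
To make the first three outputs concrete, here's a rough sketch of how they could be computed once you have a counterfactual. The function name `summarize_effect` is hypothetical, and the statistics are a crude normal approximation that assumes independent residuals and ignores uncertainty in the fitted model. The app's real methods each produce their own, more careful intervals.

```python
import math
import numpy as np

def summarize_effect(observed_post, counterfactual_post, pre_residuals):
    """Crude effect summary (assumption: independent, roughly normal residuals)."""
    observed_post = np.asarray(observed_post, dtype=float)
    counterfactual_post = np.asarray(counterfactual_post, dtype=float)
    gaps = observed_post - counterfactual_post

    abs_effect = float(gaps.mean())
    pct_effect = 100.0 * abs_effect / float(counterfactual_post.mean())

    # Standard error of the mean gap, using pre-period residual noise.
    sigma = float(np.std(pre_residuals, ddof=1))
    se = sigma / math.sqrt(len(gaps))

    z = abs_effect / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    ci95 = (abs_effect - 1.96 * se, abs_effect + 1.96 * se)
    return {"abs_effect": abs_effect, "pct_effect": pct_effect,
            "p_value": p_value, "ci95": ci95}

summary = summarize_effect(
    observed_post=[110, 112, 108, 110],
    counterfactual_post=[100, 100, 100, 100],
    pre_residuals=[1, -1, 1, -1],
)
print(summary)
```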

The 5 methods and when to use each

ARIMA ITS (Interrupted Time Series)
Best for: single series, no obvious seasonality, straightforward before/after structure. The ITS framework is well-validated in public health and economics literature for exactly this use case.

SARIMAX
Best for: data with strong seasonal patterns (weekly cycles, monthly cycles, etc.). Ignoring seasonality can badly inflate or deflate your effect estimate, so this matters more than people expect.
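
A toy illustration of that bias, using a simple day-of-week baseline rather than SARIMAX itself (the data and the baseline are my assumptions for the example). The intervention here has zero true effect, but the post window happens to be a weekend:

```python
import numpy as np

# Five weeks of data with a weekly cycle: the last two days of each week run 20 higher.
# The "intervention" on day 33 changes nothing; days 33 and 34 are a weekend.
weekday_pattern = np.array([100, 100, 100, 100, 100, 120, 120], dtype=float)
series = np.tile(weekday_pattern, 5)
intervention_day = 33

day_index = np.arange(series.size)
day_of_week = day_index % 7
pre_mask = day_index < intervention_day

# Naive: post mean minus pre mean.
naive_effect = series[~pre_mask].mean() - series[pre_mask].mean()

# Seasonal baseline: compare each post day with pre-period days of the same weekday.
baseline = np.array([
    series[pre_mask & (day_of_week == d)].mean() for d in day_of_week[~pre_mask]
])
seasonal_effect = (series[~pre_mask] - baseline).mean()

print(f"naive: {naive_effect:.1f}")                 # 15.2
print(f"seasonally adjusted: {seasonal_effect:.1f}")  # 0.0
```

The naive estimate reports a lift of about 15 from an intervention that did nothing. That is the kind of error a seasonal model is there to prevent.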

Bayesian Structural Time Series
Best for: when you want probabilistic output and explicit uncertainty quantification rather than a point estimate. The Bayesian approach also handles structural changes in the pre-period more gracefully.

Difference-in-Differences
Best for: when you have a natural control group that didn't receive the intervention. Classic econometrics approach, still one of the most credible methods when the parallel trends assumption holds.
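
The core arithmetic is small enough to show directly. This is the textbook 2x2 estimator on group means, with invented numbers and a hypothetical function name; it returns a point estimate only and does not check parallel trends.

```python
from statistics import fmean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences on group means."""
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    # The control group's change stands in for what the treated group
    # would have done without the intervention (parallel trends).
    return treated_change - control_change

effect = diff_in_diff(
    treated_pre=[50, 52, 51], treated_post=[60, 61, 62],
    control_pre=[40, 41, 42], control_post=[44, 45, 46],
)
print(effect)  # treated rose by 10, control by 4, so the estimate is 6
```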

Synthetic Control
Best for: when you have multiple potential control units but no single clean control group. The method finds the optimal weighted combination of control units to build your counterfactual. It is computationally the most expensive method here, and the trickiest to implement correctly on messy data.

Technical stack and deployment constraints

Everything runs on Streamlit. The whole app is designed to fit within Streamlit Community Cloud's free tier: CPU-only, 1 GB RAM, no external services.

The main packages:

statsmodels for ARIMA and SARIMAX
pymc for Bayesian STS
scipy.optimize for the Synthetic Control weight solver
reportlab for PDF generation
plotly for the interactive HTML reports

One non-obvious decision: I avoided causalimpact (the Python port of the R package) because it has dependency issues in resource-constrained environments. Building the Bayesian STS from scratch with PyMC gave me more control and better stability.

The hardest part: Synthetic Control on real data

The Synthetic Control weight optimization is a quadratic program subject to simplex constraints: the donor weights must be non-negative and sum to one. In theory, clean. In practice, donor pool data is often collinear, the objective surface is flat in places, and solvers behave inconsistently.
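
For reference, here's a minimal sketch of that optimization with scipy.optimize. The choice of SLSQP, the uniform starting point, and the name `fit_synthetic_weights` are my assumptions for the example; it shows the clean textbook case, without the fallbacks real data needs.

```python
import numpy as np
from scipy.optimize import minimize

def fit_synthetic_weights(treated_pre, donors_pre):
    """Least squares over the simplex: w >= 0 and sum(w) == 1.

    treated_pre has shape (T,); donors_pre has shape (T, J), one column per donor.
    """
    treated_pre = np.asarray(treated_pre, dtype=float)
    donors_pre = np.asarray(donors_pre, dtype=float)
    n_donors = donors_pre.shape[1]

    def loss(w):
        resid = treated_pre - donors_pre @ w
        return resid @ resid  # squared pre-period fit error

    result = minimize(
        loss,
        x0=np.full(n_donors, 1.0 / n_donors),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_donors,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return result.x, result.success

# Toy check: the treated unit is exactly 70% donor 0 plus 30% donor 1.
donors = np.array([[1, 2, 5], [2, 1, 3], [3, 4, 1],
                   [4, 3, 6], [5, 6, 2], [6, 5, 4]], dtype=float)
treated = 0.7 * donors[:, 0] + 0.3 * donors[:, 1]
weights, converged = fit_synthetic_weights(treated, donors)
print(np.round(weights, 3), converged)
```

With well-conditioned donors like these the solver recovers the mixing weights. Collinear donors make many weight vectors fit almost equally well, which is where the trouble starts.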

I ended up wrapping the optimizer with multiple fallback strategies and adding explicit diagnostics (pre-period fit quality, effective number of donors) so users can see when the method is straining.
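
As a sketch of what such diagnostics might look like, here are two common definitions: pre-period RMSE for fit quality, and the inverse sum of squared weights for the effective number of donors. Both definitions and the function name are my assumptions; the repo may compute them differently.

```python
import numpy as np

def synthetic_control_diagnostics(treated_pre, donors_pre, weights):
    """Pre-period fit error and a concentration measure for the weights."""
    treated_pre = np.asarray(treated_pre, dtype=float)
    donors_pre = np.asarray(donors_pre, dtype=float)
    weights = np.asarray(weights, dtype=float)

    resid = treated_pre - donors_pre @ weights
    pre_rmse = float(np.sqrt(np.mean(resid ** 2)))

    # Inverse Herfindahl index: 1.0 if one donor carries all the weight,
    # J if the weight is spread evenly across J donors.
    effective_donors = float(1.0 / np.sum(weights ** 2))
    return {"pre_rmse": pre_rmse, "effective_donors": effective_donors}

donors_demo = np.array([[1.0, 3.0, 9.0], [2.0, 4.0, 7.0], [3.0, 7.0, 8.0]])
treated_demo = donors_demo[:, :2].mean(axis=1)  # exact average of donors 0 and 1
diag = synthetic_control_diagnostics(treated_demo, donors_demo, [0.5, 0.5, 0.0])
print(diag)
```

A large pre-period RMSE, or an effective donor count close to 1, is a hint that the synthetic unit leans on a poor or very narrow match.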

What I'd build next

Regression Discontinuity Design is the obvious missing method. It handles the case where treatment assignment was determined by a threshold (e.g., everyone above a score threshold got the intervention). If you want to contribute that, the repo is ready for it.
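
For anyone considering that contribution, here's a bare-bones sketch of a sharp RDD estimate: fit a separate line on each side of the threshold within a bandwidth, then take the jump between the two fits at the threshold. The function name, the fixed bandwidth, and the noise-free toy data are assumptions for illustration; a real implementation would need bandwidth selection and standard errors.

```python
import numpy as np

def sharp_rdd_estimate(score, outcome, threshold, bandwidth):
    """Jump in the outcome at the threshold, from two local linear fits."""
    score = np.asarray(score, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    centered = score - threshold

    below = (centered < 0) & (centered >= -bandwidth)
    above = (centered >= 0) & (centered <= bandwidth)

    # np.polyfit(deg=1) returns [slope, intercept]; with a centered score,
    # each intercept is the fitted outcome right at the threshold.
    _, intercept_below = np.polyfit(centered[below], outcome[below], deg=1)
    _, intercept_above = np.polyfit(centered[above], outcome[above], deg=1)
    return float(intercept_above - intercept_below)

# Toy data: outcome rises smoothly with the score, plus a jump of 3 at score 50.
score = np.arange(100, dtype=float)
outcome = 10.0 + 0.2 * score + 3.0 * (score >= 50)
rdd_effect = sharp_rdd_estimate(score, outcome, threshold=50, bandwidth=20)
print(round(rdd_effect, 3))
```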

Longer term, I want to add automated method selection based on data characteristics, and better guidance for users who aren't sure which method fits their situation.

Try it

Live app: https://causallens-khg4uatpmnhustajhn8mdl.streamlit.app/
GitHub: https://github.com/AshayK003/CausalLens

Feedback, issues, and PRs all welcome. The goal is to make rigorous causal analysis accessible to people who need it but don't have time to become econometricians.
