# A/B Testing Statistical Framework
Complete A/B testing toolkit with sample size calculators, frequentist and Bayesian significance tests, sequential testing support, and automated results reporting. Built for analysts who need statistically rigorous experiment design without heavyweight platforms.
## Key Features
- Sample Size Calculator — compute required sample per variant given baseline rate, MDE, power, and significance level
- Frequentist Significance Tests — z-test and chi-squared tests for proportions and means with confidence intervals
- Bayesian A/B Analysis — Beta-Binomial posterior with credible intervals and probability-to-beat-control
- Sequential Testing — alpha-spending functions (O'Brien-Fleming, Pocock) for early stopping
- Segmented Analysis — break results by device, geo, or any custom dimension
- Power Analysis Charts — visualize trade-offs between sample size, MDE, and power
- Results Report Generator — export formatted summaries to Markdown or HTML
- Pre-Deployment Checklist — validate experiment setup before launch
## Quick Start

```python
from src.calculator import sample_size_calculator
from src.significance import run_ztest

# 1. Calculate required sample size
n = sample_size_calculator(
    baseline_rate=0.12,
    minimum_detectable_effect=0.02,  # absolute lift
    power=0.80,
    significance_level=0.05,
)
print(f"Required sample per variant: {n:,}")  # ~3,623

# 2. After collecting data, test for significance
result = run_ztest(
    control_visitors=4000, control_conversions=480,
    variant_visitors=4000, variant_conversions=552,
)
print(f"p-value: {result.p_value:.4f}")
print(f"Lift: {result.relative_lift:.1%}")
print(f"Significant: {result.is_significant}")
```
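For intuition, the sample-size step above can be approximated with a standard closed-form for a two-sided, two-proportion z-test. This is a sketch using only the standard library; `sample_size_calculator` may use a different approximation, so its result can differ from this formula's.

```python
import math
from statistics import NormalDist

def approx_sample_size(baseline_rate, mde, power=0.80, alpha=0.05):
    """Per-variant n for a two-sided two-proportion z-test (unpooled variances)."""
    p1, p2 = baseline_rate, baseline_rate + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-tailed
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

print(approx_sample_size(0.12, 0.02))  # 4435 with this particular formula
```

Halving the MDE roughly quadruples the required sample, which is why the calculator blows up for small effects.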
## Usage Examples

### Bayesian Analysis

```python
from src.bayesian import BayesianABTest

test = BayesianABTest(prior_alpha=1, prior_beta=1)
test.add_control(visitors=5000, conversions=600)
test.add_variant(visitors=5000, conversions=672)

summary = test.summarize()
print(f"P(variant > control): {summary.prob_variant_wins:.1%}")
print(f"Expected lift: {summary.expected_lift:.2%}")
print(f"95% credible interval: [{summary.ci_low:.2%}, {summary.ci_high:.2%}]")
```
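Under a Beta-Binomial model, the probability-to-beat-control reduces to a few lines of Monte Carlo over the two posteriors. A minimal standard-library sketch (the `BayesianABTest` class may use more draws or a closed form, so treat this as illustrative):

```python
import random

def prob_variant_wins(c_visitors, c_conv, v_visitors, v_conv,
                      prior_alpha=1, prior_beta=1, draws=50_000, seed=42):
    """Monte Carlo estimate of P(variant rate > control rate) under Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for a Beta(a, b) prior with k successes in n trials is
        # Beta(a + k, b + n - k); draw one rate from each posterior and compare.
        p_control = rng.betavariate(prior_alpha + c_conv,
                                    prior_beta + c_visitors - c_conv)
        p_variant = rng.betavariate(prior_alpha + v_conv,
                                    prior_beta + v_visitors - v_conv)
        wins += p_variant > p_control
    return wins / draws

print(prob_variant_wins(5000, 600, 5000, 672))  # roughly 0.98 for this data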
### Sequential Testing with Early Stopping

```python
from src.sequential import SequentialTest, SpendingFunction

seq = SequentialTest(
    max_looks=5,
    overall_alpha=0.05,
    spending_function=SpendingFunction.OBRIEN_FLEMING,
)

# At each interim analysis (e.g., 20%, 40%, 60%, 80%, 100% of data);
# interim_results holds one (control_data, variant_data) pair per look:
for look, (ctrl, var) in enumerate(interim_results, 1):
    decision = seq.analyze(look, ctrl, var)
    if decision.stop_early:
        print(f"Stop at look {look}: {decision.conclusion}")
        break
```
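The spending-function math behind O'Brien-Fleming boundaries is compact. Below is a sketch of the Lan-DeMets O'Brien-Fleming-type spending function; this is an assumption about the internals, and `SequentialTest` may implement the boundaries differently:

```python
from statistics import NormalDist

def obf_alpha_spent(info_fraction, overall_alpha=0.05):
    """Cumulative alpha spent at information fraction t in (0, 1],
    per the Lan-DeMets O'Brien-Fleming-type spending function:
    alpha*(t) = 2 * (1 - Phi(z_{1 - alpha/2} / sqrt(t)))."""
    z = NormalDist().inv_cdf(1 - overall_alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / info_fraction ** 0.5))

# Very little alpha is spent at early looks, preserving nearly the full
# 0.05 for the final analysis.
for t in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"t={t:.1f}: cumulative alpha spent = {obf_alpha_spent(t):.5f}")
```

This is why O'Brien-Fleming is conservative early: an overwhelming effect is needed to stop at the first look, but the final-look threshold stays close to a fixed-sample test.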
### Segment-Level Breakdown

```python
from src.segments import segmented_analysis

results = segmented_analysis(
    data=experiment_df,
    variant_col="variant",
    metric_col="converted",
    segment_col="device_type",
)
for seg in results:
    print(f"{seg.name}: lift={seg.lift:.2%}, p={seg.p_value:.4f}")
```
## Configuration

Edit `config.example.yaml` to set organization defaults:

```yaml
defaults:
  significance_level: 0.05        # two-tailed alpha
  power: 0.80                     # 1 - beta
  minimum_detectable_effect: 0.02
  test_type: "two-sided"          # or "one-sided"

bayesian:
  prior_alpha: 1                  # Beta prior parameter
  prior_beta: 1                   # uninformative prior
  simulations: 100000             # Monte Carlo draws

sequential:
  max_looks: 5
  spending_function: "obrien_fleming"

reporting:
  output_format: "markdown"       # "markdown" or "html"
  include_charts: true
```
## Best Practices
- Set MDE before the test starts — never peek at results and adjust your threshold
- Run tests to full sample size unless using sequential testing with alpha correction
- Use Bonferroni or Holm correction when testing multiple variants or metrics
- Log-transform revenue metrics — they are rarely normally distributed
- Check sample ratio mismatch (SRM) — if observed split deviates from expected, the experiment is compromised
- Document every experiment — use the results report generator for consistent records
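The SRM check above is a one-line chi-squared goodness-of-fit test against the planned split. A standard-library sketch for the two-variant case (`srm_p_value` is a hypothetical helper, not part of the toolkit's API):

```python
import math
from statistics import NormalDist

def srm_p_value(control_n, variant_n, expected_ratio=0.5):
    """Chi-squared (1 df) test of the observed split against the expected ratio."""
    total = control_n + variant_n
    exp_control = total * expected_ratio
    exp_variant = total * (1 - expected_ratio)
    chi2 = ((control_n - exp_control) ** 2 / exp_control
            + (variant_n - exp_variant) ** 2 / exp_variant)
    # With 1 degree of freedom, the chi-squared survival function at x
    # equals 2 * P(Z > sqrt(x)) for a standard normal Z.
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

print(f"{srm_p_value(5120, 4880):.4f}")  # well below 0.05 -> investigate before analyzing
```

A common convention is to flag SRM at a strict threshold (e.g., p < 0.001) so that routine checks rarely false-alarm; pick a threshold before launch.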
## Troubleshooting

| Issue | Cause | Fix |
|---|---|---|
| Sample size unreasonably large | MDE is too small relative to baseline | Increase MDE or accept lower power |
| p-value exactly 0.0 | Floating-point underflow in large samples | Use log-space computation in `significance.py` |
| Bayesian and frequentist disagree | Different priors or assumptions | Align prior with historical data; check test type matches |
| SRM detected | Traffic allocation bug or bot filtering | Investigate logging and assignment logic before trusting results |
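One way to implement the log-space fix from the table is an asymptotic (Mills-ratio) expansion of the normal tail; this is a sketch of the idea, not necessarily what `significance.py` does:

```python
import math

def log_sf_normal(z):
    """log P(Z > z) for a standard normal Z, stable where sf(z) underflows.

    For moderate z, math.erfc is still representable and accurate; for large z,
    use the asymptotic expansion sf(z) ~ phi(z)/z * (1 - 1/z^2).
    """
    if z < 8:
        return math.log(0.5 * math.erfc(z / math.sqrt(2)))
    return (-0.5 * z * z - math.log(z) - 0.5 * math.log(2 * math.pi)
            + math.log1p(-1 / (z * z)))

print(log_sf_normal(40))  # finite, even though sf(40) underflows to 0.0 in a float
```

Reporting `log10(p)` (i.e., `log_sf_normal(z) / math.log(10)` plus the two-sided factor) avoids ever printing a misleading "p = 0.0000".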
## Requirements

- Python 3.10+
- Standard library only (`math`, `statistics`, `collections`)
This is 1 of 11 resources in the Data Analyst Toolkit. Get the complete A/B Testing Statistical Framework with all files, templates, and documentation for $39.
Or grab the entire Data Analyst Toolkit bundle (11 products) for $129 — save 30%.