Introduction
This article is a continuation of my NPB Bayesian prediction series. Along the way, I reached a conclusion:
"Without tracking data like Statcast, we can't break through the next wall."
In my NPB project, I added Bayesian regression (Stan/Ridge) on top of Marcel projections. At the player level there was consistent improvement (p=0.06), but at the team level the gains disappeared. The reason: Marcel's 3-year weighted average is already accurate for high-PA regulars, leaving no margin for improvement using only aggregate stats like K%/BB%/BABIP.
MLB has Statcast. This article tests whether Statcast tracking features can beat Marcel.
GitHub: https://github.com/yasumorishima/baseball-mlops
Streamlit: https://baseball-mlops.streamlit.app/
What is Marcel?
Marcel is a deliberately simple projection system introduced by Tom Tango in the 2000s: a weighted average of the past 3 years (weights 5:4:3), plus regression to the mean and an age adjustment. Despite its simplicity, it is remarkably accurate, especially for regular players with large sample sizes.
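To make the three ingredients concrete, here is a minimal sketch of a Marcel-style projection for a single rate stat such as wOBA. Only the 5:4:3 weights come from this article. The 1200 PA of league-average regression and the age factors (+0.6% per year under 29, -0.3% per year over) are assumptions taken from the commonly cited Marcel description, and the function name `marcel_rate` is hypothetical, not the baseball-mlops implementation.

```python
def marcel_rate(seasons, league_rate, age, regression_pa=1200.0):
    """Project a rate stat from up to three past seasons.

    seasons: list of (rate, pa) tuples, most recent season first.
    league_rate: league-average value of the same stat.
    regression_pa: PA of league average blended in (assumed value).
    """
    weights = (5, 4, 3)  # most recent season counts most

    # PA-weighted sum of past performance
    num = sum(w * rate * pa for w, (rate, pa) in zip(weights, seasons))
    den = sum(w * pa for w, (_, pa) in zip(weights, seasons))

    # Regression to the mean: low-PA players are pulled harder toward league average
    regressed = (num + regression_pa * league_rate) / (den + regression_pa)

    # Age adjustment around an assumed peak age of 29
    if age < 29:
        age_factor = 1.0 + 0.006 * (29 - age)
    else:
        age_factor = 1.0 - 0.003 * (age - 29)
    return regressed * age_factor


# A .400 wOBA hitter over three full seasons, in a .320 league, at age 29
projection = marcel_rate([(0.400, 600)] * 3, league_rate=0.320, age=29)
```

With 600 PA in each season the weighted sample is large, so the projection stays much closer to .400 than to .320. That is the regime where Marcel is hard to beat.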
Data & Features
- Source: pybaseball (FanGraphs + Baseball Savant)
- Target: MLB batters (PA≥100) / pitchers (IP≥30)
- Period: 2015-2024 (training), 2025 (evaluation)
Batter Features (38)
| Category | Features |
|---|---|
| Statcast | EV, Barrel%, xwOBA, Sprint Speed, Launch Angle, EV95% |
| FanGraphs | HardHit%, Contact%, O-Swing%, SwStr% |
| 1-year lag delta | wOBA change, xwOBA change, K% change, BB% change, Barrel% change |
| 2-year trend (v7) | 2-year wOBA direction (rising/falling) |
| Engineered (v7) | age_from_peak (distance from peak age 29), park_factor, team_changed, pa_rate |
| Interaction | age × (xwOBA − wOBA) — luck sensitivity by age |
| Stacking | lgb_delta (LightGBM OOF residual) |
Pitcher Features (35)
| Category | Features |
|---|---|
| Statcast | K%, BB%, Whiff%, CSW%, SwStr%, Barrel%, EV |
| Stuff | Stuff+, Location+, Pitching+, Velo, Spin Rate |
| 1-year lag delta | xFIP change, K% change, BB% change, K-BB% change |
| 2-year trend (v7) | 2-year xFIP direction |
| Engineered (v7) | age_from_peak, park_factor, team_changed, ip_rate, FIP-ERA gap |
| Interaction | age × K-BB% |
| Stacking | lgb_delta |
The park factor work from the NPB series was carried over into baseball-mlops as a park_factor feature — the same methodology, now applied to MLB stadiums.
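As a rough illustration of the lag, trend, and interaction rows in the batter table, the sketch below derives a few of those features from per-season dictionaries. The dictionary keys, the function name, the signed definition of `age_from_peak`, and the "same direction two years running" rule for the 2-year trend are all assumptions made for this example. The repository may define them differently.

```python
PEAK_AGE = 29  # peak age stated in the feature table


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def engineer_batter_features(curr, prev, prev2=None):
    """Build illustrative features from one player's consecutive seasons.

    curr, prev, prev2: dicts with hypothetical keys
    'age', 'team', 'woba', 'xwoba' (prev2 is optional).
    """
    feats = {
        # 1-year lag deltas
        "woba_delta": curr["woba"] - prev["woba"],
        "xwoba_delta": curr["xwoba"] - prev["xwoba"],
        # engineered features
        "age_from_peak": curr["age"] - PEAK_AGE,  # assumed signed distance
        "team_changed": int(curr["team"] != prev["team"]),
        # interaction: age x (xwOBA - wOBA), luck sensitivity by age
        "age_x_luck": curr["age"] * (curr["xwoba"] - curr["woba"]),
    }
    if prev2 is not None:
        # 2-year trend: +1 rising twice, -1 falling twice, 0 if mixed
        d1 = sign(curr["woba"] - prev["woba"])
        d2 = sign(prev["woba"] - prev2["woba"])
        feats["woba_trend_2y"] = d1 if d1 == d2 else 0
    return feats


example = engineer_batter_features(
    {"age": 31, "team": "LAD", "woba": 0.350, "xwoba": 0.370},
    {"age": 30, "team": "NYM", "woba": 0.340, "xwoba": 0.345},
    {"age": 29, "team": "NYM", "woba": 0.330, "xwoba": 0.335},
)
```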
Model
Three models combined:
- Marcel (baseline): 3-year weighted avg + regression to mean + age adjustment
- LightGBM: Optuna 1000-trial hyperparameter optimization (time-series expanding-window CV)
- Bayes correction (ElasticNet): predicts Marcel residuals from Statcast features and attaches an 80% CI
  - Recency decay: samples weighted by 0.85 per year (recent seasons count more)
  - LightGBM OOF predictions used as a stacking feature
- Ensemble: Marcel×31% + LightGBM×33% + Bayes×36% (auto-weighted by inverse MAE)
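The recency decay and the inverse-MAE weighting above are both simple to sketch. In the example below the Marcel MAE is the holdout value reported in this article, while the LightGBM and Bayes MAEs are hypothetical numbers chosen only to show how weights close to 31/33/36 could arise. The function names are also illustrative rather than taken from the repository.

```python
def recency_weights(seasons, latest, decay=0.85):
    """Sample weight per season: 1.0 for the latest, times `decay` per year back."""
    return [decay ** (latest - s) for s in seasons]


def inverse_mae_weights(maes):
    """Normalize 1/MAE so that the more accurate model gets the larger weight."""
    inverse = {name: 1.0 / mae for name, mae in maes.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def blend(predictions, weights):
    """Weighted average of per-model predictions for one player."""
    return sum(weights[name] * predictions[name] for name in weights)


sample_w = recency_weights([2024, 2023, 2022], latest=2024)

# Marcel MAE from the article; the other two are hypothetical
ensemble_w = inverse_mae_weights(
    {"marcel": 0.0331, "lightgbm": 0.0311, "bayes": 0.0285}
)

woba = blend({"marcel": 0.330, "lightgbm": 0.340, "bayes": 0.345}, ensemble_w)
```

Because the weights depend only on validation error, they can be recomputed on every retraining run without any extra tuning.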
Backtest Design
2025 is a strict holdout — never seen by Optuna or CV:
- 2015-2019: Initial training
- 2020-2024: Time-series expanding-window CV (Optuna tuning)
- 2025: Strict holdout (no leakage)
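The split can be expressed as a small generator. This is a sketch of the expanding-window idea under the year ranges listed above, not the repository's actual CV code, and `expanding_window_splits` is a hypothetical name.

```python
def expanding_window_splits(first_year, first_val_year, last_val_year):
    """Yield (train_years, val_year) pairs where training always precedes validation."""
    for val_year in range(first_val_year, last_val_year + 1):
        # Train on every season strictly before the validation season
        train_years = list(range(first_year, val_year))
        yield train_years, val_year


HOLDOUT_YEAR = 2025  # never passed to the tuning loop

splits = list(expanding_window_splits(2015, 2020, 2024))
for train_years, val_year in splits:
    print(f"train {train_years[0]}-{train_years[-1]} -> validate {val_year}")
```

Each fold adds one more season to the training window, and the holdout year never appears in any fold, which is what keeps the 2025 numbers free of leakage.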
Results
2025 Strict Holdout
| Target | Marcel MAE | ML MAE | Improvement |
|---|---|---|---|
| Batter wOBA | 0.0331 | 0.0291 | +12.1% |
| Pitcher xFIP | 0.5038 | 0.4837 | +4.0% |
Cross-validation MAE (batter 0.0281 / pitcher 0.521) is in line with the holdout MAE (0.0291 / 0.4837), so there is no sign of overfitting.
Year-by-Year Backtest
| Year | Batter ML | Batter Marcel | Batter vs Marcel | Pitcher ML | Pitcher Marcel | Pitcher vs Marcel |
|---|---|---|---|---|---|---|
| 2020 | 0.0359 | 0.0371 | ✓ +3.2% | 0.595 | 0.618 | ✓ +3.7% |
| 2021 | 0.0293 | 0.0317 | ✓ +7.6% | 0.542 | 0.553 | ✓ +1.9% |
| 2022 | 0.0296 | 0.0330 | ✓ +10.3% | 0.578 | 0.569 | ✗ -1.5% |
| 2023 | 0.0277 | 0.0303 | ✓ +8.7% | 0.535 | 0.559 | ✓ +4.3% |
| 2024 | 0.0280 | 0.0333 | ✓ +16.0% | 0.509 | 0.522 | ✓ +2.5% |
| 2025 | 0.0291 | 0.0331 | ✓ +12.1% | 0.484 | 0.504 | ✓ +4.0% |
Batters: 6/6 wins. Pitchers: 5/6 wins. The 2022 loss is likely due to limited recent training data, since the seasons feeding that fold include the COVID-shortened 2020 season (60 games).
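For reference, the improvement percentages and win counts follow directly from the MAE values in the table. The sketch below recomputes them. The MAE figures are copied from the table above, and the helper names are illustrative.

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def improvement_pct(marcel_mae, ml_mae):
    """Relative MAE reduction versus Marcel, in percent (positive = ML wins)."""
    return (marcel_mae - ml_mae) / marcel_mae * 100.0


# year: (batter ML, batter Marcel, pitcher ML, pitcher Marcel), from the table
BACKTEST = {
    2020: (0.0359, 0.0371, 0.595, 0.618),
    2021: (0.0293, 0.0317, 0.542, 0.553),
    2022: (0.0296, 0.0330, 0.578, 0.569),
    2023: (0.0277, 0.0303, 0.535, 0.559),
    2024: (0.0280, 0.0333, 0.509, 0.522),
    2025: (0.0291, 0.0331, 0.484, 0.504),
}

batter_wins = sum(b_ml < b_mar for b_ml, b_mar, _, _ in BACKTEST.values())
pitcher_wins = sum(p_ml < p_mar for _, _, p_ml, p_mar in BACKTEST.values())

holdout_batter = improvement_pct(0.0331, 0.0291)
holdout_pitcher = improvement_pct(0.5038, 0.4837)
```

Percentages recomputed from the rounded table values can differ from the published ones in the last digit, because the article's figures were presumably computed from unrounded MAEs.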
Why Does Statcast Help?
The Bayes (ElasticNet) model predicts Marcel's residuals from Statcast features, so a coefficient with a larger magnitude marks information that Marcel is missing.
Batters
| Feature | Coef | Interpretation |
|---|---|---|
| Max EV | +0.0046 | Peak hitting power — Marcel can't see this |
| Contact% | +0.0040 | Finer skill signal than K% alone |
| BB% | +0.0038 | Additional plate discipline information |
| xwOBA | +0.0037 | Luck-removed true hitting ability |
Pitchers
| Feature | Coef | Interpretation |
|---|---|---|
| Pitching+ | -0.0892 | Overall stuff quality → lower future xFIP |
| K% | -0.0631 | High strikeout rate outperforms Marcel forecast |
| SwStr% | -0.0346 | Swing-and-miss ability |
| Stuff+ | -0.0279 | Velocity + movement + spin combined |
Marcel projects from past ERA/xFIP results, which carry luck components. Stuff metrics (Stuff+/Pitching+) measure skill with much of that luck stripped out, which is why they add predictive signal.
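The residual-correction idea can be sketched without any dependencies. The code below fits an elastic net by coordinate descent on the usual objective (squared error plus L1 and L2 penalties) and adds the fitted correction to a Marcel projection. It is a from-scratch illustration that assumes standardized features. The project itself presumably relies on a library implementation, and the toy data, `alpha`, and function names here are all made up for the example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the update)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.001, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for elastic net; X rows are standardized features."""
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = intercept + sum(
                    coefs[k] * X[i][k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            coefs[j] = soft_threshold(rho, alpha * l1_ratio) / (
                z + alpha * (1.0 - l1_ratio)
            )
        intercept = sum(
            y[i] - sum(coefs[k] * X[i][k] for k in range(p)) for i in range(n)
        ) / n
    return intercept, coefs


def corrected_projection(marcel_pred, features, intercept, coefs):
    """Marcel projection plus the predicted residual."""
    return marcel_pred + intercept + sum(c * f for c, f in zip(coefs, features))


# Toy data: residual (actual - Marcel) depends on feature 0 only
X = [[-1, 1], [0, -1], [1, 1], [-1, -1], [0, 1], [1, -1]]
residuals = [0.01 * row[0] for row in X]

intercept, coefs = fit_elastic_net(X, residuals)
adjusted = corrected_projection(0.330, [1.0, 1.0], intercept, coefs)
```

On this toy data the uninformative second feature is driven to exactly zero by the L1 penalty, while the informative one keeps a slightly shrunken coefficient. That is the same reading applied to the coefficient tables above.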
MLOps Pipeline
Every Monday JST 11:00 (GitHub Actions cron)
↓
fetch_statcast.py (pybaseball → Statcast CSV)
↓
train.py (LightGBM + Optuna 1000 trials + Bayes correction)
↓
W&B Model Registry (MAE comparison → auto-promote "production" tag)
↓
FastAPI (polls W&B every 6h → auto-loads latest model)
The FastAPI server polls W&B every 6 hours and automatically loads the new model when the production tag is updated — no container restart needed.
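A hot-reload loop of this kind can be sketched with the standard library alone. In the example below, `fetch_version` and `load_model` are hypothetical callables standing in for the W&B registry lookup and the artifact download. The real service would plug its own W&B calls into those two slots, and the class and function names are illustrative.

```python
import threading


class ModelHolder:
    """Keeps the currently served model and swaps it when the version changes."""

    def __init__(self, fetch_version, load_model):
        self._fetch_version = fetch_version  # hypothetical: returns the production version id
        self._load_model = load_model        # hypothetical: loads a model for that version
        self._lock = threading.Lock()
        self._version = None
        self._model = None

    def refresh(self):
        """Reload if the production version changed; return True when swapped."""
        latest = self._fetch_version()
        with self._lock:
            if latest == self._version:
                return False
        model = self._load_model(latest)  # slow step happens outside the lock
        with self._lock:
            self._model, self._version = model, latest
        return True

    def get(self):
        """Return the model currently in service."""
        with self._lock:
            return self._model


def start_polling(holder, interval_seconds=6 * 60 * 60):
    """Refresh in a daemon thread every interval; returns an Event that stops it."""
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            try:
                holder.refresh()
            except Exception:
                pass  # keep serving the current model if the registry is unreachable
            stop.wait(interval_seconds)

    threading.Thread(target=loop, daemon=True).start()
    return stop


# Usage with stand-in callables
current = ["v1"]
holder = ModelHolder(lambda: current[0], lambda version: f"model-{version}")
holder.refresh()
```

Request handlers only ever call `holder.get()`, so a new model takes effect on the next request after the swap, with no restart of the process.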
Looking Ahead: NPB Hawk-Eye
NPB installed Hawk-Eye tracking in all 12 stadiums in 2024. Once data becomes publicly available (expected 2026+), this pipeline can be transplanted directly.
| baseball-mlops | NPB Hawk-Eye version |
|---|---|
| pybaseball | NPB Hawk-Eye API |
| EV / Barrel% / xwOBA | Equivalent metrics |
| MLB Marcel | NPB Marcel |
| LightGBM + Bayes | Same architecture |
Summary
| | NPB Bayesian project | baseball-mlops (MLB) |
|---|---|---|
| Data | K%/BB%/BABIP (aggregate stats) | Statcast (tracking) |
| Marcel improvement | Marginal (p=0.06) | +12.1% (batters) / +4.0% (pitchers) |
| Year-by-year wins | — | Batters 6/6, Pitchers 5/6 |
The reason Statcast works: Marcel's 3-year weighted average can't see contact quality or pitch stuff. Exit velocity, barrel rate, and Stuff+ directly measure those dimensions that aggregate stats miss.
Data: Baseball Savant / FanGraphs via pybaseball