YMori

Posted on Feb 25 • Edited on Apr 6 • Originally published at zenn.dev

Why Marcel Beat LightGBM: Building an NPB Player Performance Prediction System

#python #baseball #machinelearning #datascience

What I Built

A system to answer: "How will this NPB player perform next season?"

→ GitHub: https://github.com/yasumorishima/npb-prediction

Two prediction methods:

Marcel projection: a simple statistical method introduced by Tom Tango in the 2000s
LightGBM / XGBoost — modern machine learning

The key finding: Marcel outperformed ML on ERA prediction, and the two were essentially tied on OPS prediction.

[March 2026 Correction] I re-evaluated using the same player set (PA≥100 / IP≥30) for both Marcel and ML. The original article overstated Marcel's OPS advantage. Corrected numbers are in the backtest section below.

Key Terms (for first-time readers)

Term	Meaning
Marcel method	A simple projection method using a 3-year weighted average of past stats (weights: 5:4:3, recent years weighted higher)
LightGBM / XGBoost	Gradient boosting tree models — modern machine learning algorithms that learn from features
OPS	On-base Plus Slugging — a composite batting metric (OBP + SLG)
ERA	Earned Run Average — earned runs allowed per 9 innings
MAE	Mean Absolute Error — average prediction miss. Lower is better
Backtest	Validating a prediction model using historical data

Data Sources

Pro Baseball Data Freak (baseball-data.com) — NPB batter/pitcher stats, 2015–2025
NPB Official Site (npb.jp) — detailed batting stats (2B/3B/SF, for wOBA calculation)

Collected via pandas.read_html().

Dataset	Rows
Batter stats	3,780
Pitcher stats	3,773
Standings	132 (12 teams × 11 years)
Player birthdays	2,479
Detailed batting stats	4,538

Data source: Pro Baseball Data Freak / NPB Official

Marcel Projection

Developed by Tom Tango. The implementation is straightforward.

Step 1: Weighted average of past 3 years

Recent seasons matter more — weighted 5/4/3.

weight_map = {0: 5, 1: 4, 2: 3}  # 0=most recent
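
A minimal sketch of how these weights could be applied to a rate stat such as OPS. It assumes each season is also scaled by its plate appearances, as in the usual Marcel formulation. The helper name `weighted_rate` and the sample numbers are hypothetical and not taken from the repository.

```python
# Hypothetical helper: 5/4/3 weighted average of a rate stat, scaled by PA.
weight_map = {0: 5, 1: 4, 2: 3}  # 0 = most recent season


def weighted_rate(seasons):
    """seasons: list of (rate, pa) tuples, most recent first (up to 3 used)."""
    num = 0.0
    den = 0.0
    for i, (rate, pa) in enumerate(seasons[:3]):
        w = weight_map[i] * pa  # assumption: playing time scales the weight
        num += w * rate
        den += w
    if den == 0:
        raise ValueError("no plate appearances in the last three seasons")
    return num / den, den  # (weighted rate, weighted PA)


# Illustrative numbers only, not a real player
rate, weighted_pa = weighted_rate([(0.800, 500), (0.750, 400), (0.700, 300)])
print(round(rate, 3), weighted_pa)
```

Returning the weighted PA alongside the rate is a design choice: the regression step needs a measure of how much data backs the average.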

Step 2: Regression to league mean

The fewer plate appearances a player has, the more strongly the projection is pulled toward league average.

regression = 1200 / (pa + 1200)
predicted = (1 - regression) * weighted + regression * league_avg
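
To see how strongly the blend reacts to playing time, here is a small sketch that wraps the two lines above in a function. The function name is hypothetical, and `pa` is assumed to be the weighted plate-appearance total from Step 1; the article does not state this explicitly.

```python
def regress_to_mean(weighted, pa, league_avg, k=1200):
    """Blend a player's weighted rate with the league average.

    k is the regression constant from the formula above (1200).
    pa is assumed to be the weighted PA total (an assumption of this sketch).
    """
    regression = k / (pa + k)  # share of the projection given to the league
    return (1 - regression) * weighted + regression * league_avg


# Illustrative inputs: a .900 weighted OPS in a .700 league
small_sample = regress_to_mean(0.900, pa=300, league_avg=0.700)   # 80% league
large_sample = regress_to_mean(0.900, pa=4800, league_avg=0.700)  # 20% league
print(round(small_sample, 3), round(large_sample, 3))
```

With 300 PA the league share is 1200 / 1500 = 0.8, so the projection lands at .740; with 4,800 PA the share drops to 0.2 and the projection stays at .860.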

Step 3: Age adjustment

Peak performance is assumed at age 27: the factor is above 1 for younger players and below 1 for older ones.

age_factor = 1.0 + (27 - age) * 0.003
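
A short sketch of what this factor does across ages. It assumes the factor is multiplied into the regressed projection of a higher-is-better stat such as OPS; the article does not show that step, and the function name is hypothetical.

```python
def age_factor(age, peak_age=27, rate=0.003):
    """Multiplier from the formula above: +0.3% per year under 27, -0.3% per year over."""
    return 1.0 + (peak_age - age) * rate


# Factor at a few ages
factors = {age: round(age_factor(age), 3) for age in (23, 27, 31, 35)}
print(factors)

# Assumed application: scale a regressed OPS projection of .750 for a 23-year-old
adjusted_ops = 0.750 * age_factor(23)
print(round(adjusted_ops, 3))
```

For a stat where lower is better, such as ERA, the adjustment would presumably need to run the other way; the article does not say how that case is handled.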

Custom wOBA / wRC+ for NPB

Unlike MLB (where Baseball Savant provides wOBA), NPB has no official published wOBA values. I calculated them from NPB official data using league-adjusted weights.

# wOBA (NPB league-adjusted weights)
woba = (
    0.69 * BB + 0.72 * HBP +
    0.89 * H1B + 1.27 * H2B +
    1.62 * H3B + 2.10 * HR
) / (AB + BB + HBP + SF)
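
The same formula as a self-contained function, with singles derived from total hits. This is a sketch: the function name, argument names, and the sample stat line are hypothetical, while the weights are the ones shown above.

```python
def woba(ab, h, h2b, h3b, hr, bb, hbp, sf):
    """wOBA using the league-adjusted weights from the formula above."""
    h1b = h - h2b - h3b - hr  # singles are hits that were not extra-base hits
    denom = ab + bb + hbp + sf
    if denom == 0:
        return 0.0  # no plate appearances to evaluate
    numerator = (
        0.69 * bb + 0.72 * hbp
        + 0.89 * h1b + 1.27 * h2b
        + 1.62 * h3b + 2.10 * hr
    )
    return numerator / denom


# Illustrative stat line, not a real player
example = woba(ab=500, h=150, h2b=30, h3b=2, hr=20, bb=60, hbp=5, sf=5)
print(round(example, 3))
```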

2024 wRC+ Top 3 (my calculation)

Player	Team	wOBA	wRC+
Kondo Kensuke	SoftBank	.479	249
Austin	DeNA	.478	248
Santana	Rakuten	.441	220

ML Approach (LightGBM / XGBoost)

Features include age, historical stats, and the wOBA/wRC+ calculated above.

features = [
    'age', 'OPS_prev1', 'OPS_prev2', 'OPS_prev3',
    'woba', 'wrc_plus',
    'PA_prev1', 'PA_prev2'
]
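
The article does not show how the lagged columns are built. One plausible sketch, assuming a per-player dictionary keyed by season (the data layout and function name are hypothetical), fills missing seasons with NaN, which both LightGBM and XGBoost can handle natively.

```python
import math


def build_lag_features(history, target_year):
    """history: {year: {"OPS": float, "PA": int}} for one player (hypothetical layout).

    Only seasons before target_year are read, so the target season
    never leaks into its own features.
    """
    nan = float("nan")
    row = {}
    for lag in (1, 2, 3):
        season = history.get(target_year - lag)
        row[f"OPS_prev{lag}"] = season["OPS"] if season else nan
        if lag <= 2:  # the feature list above only uses PA for two seasons
            row[f"PA_prev{lag}"] = season["PA"] if season else nan
    return row


# Illustrative player with two prior seasons
history = {2024: {"OPS": 0.800, "PA": 500}, 2023: {"OPS": 0.750, "PA": 400}}
row = build_lag_features(history, 2025)
print(row)
```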

Results: 2025 Backtest

Trained on 2015–2024, tested against 2025 actuals. Marcel and ML are evaluated on the same player set (batters PA≥100, pitchers IP≥30).

Batter OPS MAE (2025, n=172 players)

Method	OPS MAE
Marcel	.063
XGBoost	.063
LightGBM	.066

→ Marcel and XGBoost are essentially tied (difference < .001); LightGBM trails by about .003.

Pitcher ERA MAE (2025, n=145 players)

Method	ERA MAE
Marcel	0.78
XGBoost	0.93
LightGBM	0.92

→ Marcel wins, by roughly 0.14 to 0.15 in ERA MAE.

Marcel clearly outperformed ML on ERA prediction. For OPS prediction, the two are virtually identical — a simple method matching a modern ML model in this setting.
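
Since the correction hinged on scoring both methods on the same players, here is a sketch of that comparison. The function names and the three-player data are hypothetical; only the idea (restrict to the common set, then take the mean absolute error) comes from the article.

```python
def common_players(actual, *prediction_sets):
    """Players that have an actual value and a prediction from every method."""
    keys = set(actual)
    for preds in prediction_sets:
        keys &= set(preds)
    return sorted(keys)


def mae(actual, predicted, players):
    """Mean absolute error over the given players."""
    return sum(abs(actual[p] - predicted[p]) for p in players) / len(players)


# Illustrative values only
actual = {"A": 0.800, "B": 0.700, "C": 0.650}
marcel = {"A": 0.780, "B": 0.720, "C": 0.640}
ml = {"A": 0.790, "B": 0.740}  # no ML prediction for player C

players = common_players(actual, marcel, ml)
marcel_mae = mae(actual, marcel, players)
ml_mae = mae(actual, ml, players)
print(players, round(marcel_mae, 3), round(ml_mae, 3))
```

Scoring Marcel on all three players and ML on two would compare different populations, which is the kind of mismatch the corrected backtest avoids.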

Why does this happen?

Player true talent changes slowly (1–2 years)
ML tends to overfit with limited sample sizes
Simple weighted averages fit the actual distribution of year-to-year changes

Player Stories: Where Marcel Shines and Struggles

Austin (DeNA) — A tale of two predictions

Tracking Austin's comeback from 2022–2023 injuries reveals both Marcel's weakness and strength.

2024 prediction (based on 2021–2023 data):

	OPS	PA
Marcel prediction	.818	145
2024 actual	.983	445
Error	.165	—

With only 38 PA in 2022 and 54 PA in 2023 due to injuries, Marcel regressed heavily toward league average. It completely missed the comeback to a .983 OPS.

2025 prediction (based on 2022–2024 data):

	OPS	PA
Marcel prediction	.842	213
2025 actual	.834	246
Error	.008	—

The very next year, Marcel nailed it with an error of just .008. With the strong 2024 season now in the data, regression worked in the opposite direction — pulling an unsustainably high OPS back down to a realistic level.

Same player, same method, wildly different accuracy — a perfect illustration of how data availability shapes projection quality.

Tsutsugo Yoshitomo (DeNA) — A comeback Marcel couldn't see

A former NPB star (OPS ~.900 through 2019), Tsutsugo returned from MLB in 2024 and struggled to a .683 OPS in just 168 PA.

	OPS	PA
Marcel prediction	.656	168
2025 actual	.876	257
Error	.220	—

Marcel was anchored to his poor 2024 season and predicted continued decline. Instead, Tsutsugo hit 20 home runs and posted an .876 OPS, a full-blown resurgence that a weighted-average model simply cannot anticipate. This highlights Marcel's inherent limitation: it struggles with players whose recent performance doesn't reflect their true ability level.

Pythagorean Win Expectation

Predicts team win percentage from runs scored and allowed.

Win% ≈ RS^k / (RS^k + RA^k)

MLB uses k=1.83 as standard. I searched for the NPB-optimal value and found k=1.72.

Exponent	MAE	Sample
NPB optimal (k=1.72)	3.20 wins	All 12 teams, 2015–2025
MLB standard (k=1.83)	3.32 wins	Same
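
The article does not say how the exponent was searched. A simple grid search is one way to do it, sketched below with hypothetical function names and toy data. Because NPB games can end in ties, the sketch assumes expected wins are computed over decided games (wins plus losses).

```python
def pythag_win_pct(rs, ra, k):
    """Pythagorean win expectation for runs scored (rs) and runs allowed (ra)."""
    return rs**k / (rs**k + ra**k)


def wins_mae(team_seasons, k):
    """team_seasons: list of (rs, ra, wins, decided_games) tuples."""
    errors = [abs(pythag_win_pct(rs, ra, k) * g - w) for rs, ra, w, g in team_seasons]
    return sum(errors) / len(errors)


def best_exponent(team_seasons, lo=1.50, hi=2.10, step=0.01):
    """Return the exponent on a fixed grid that minimizes MAE in wins."""
    n = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 2) for i in range(n + 1)]
    return min(grid, key=lambda k: wins_mae(team_seasons, k))


# Toy data generated from k=1.72, so the search should recover that value
toy = [
    (600, 500, pythag_win_pct(600, 500, 1.72) * 140, 140),
    (450, 550, pythag_win_pct(450, 550, 1.72) * 140, 140),
]
print(best_exponent(toy))
```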

FastAPI Inference API

pip install -r requirements.txt
uvicorn api:app --reload
# Open http://localhost:8000/docs for Swagger UI

Endpoints

Path	Description
`GET /predict/hitter/{name}`	Batter projection (Marcel + ML)
`GET /predict/pitcher/{name}`	Pitcher projection (Marcel + ML)
`GET /predict/team/{name}`	Team Pythagorean win%
`GET /sabermetrics/{name}`	wOBA / wRC+ / wRAA
`GET /rankings/hitters`	Batter rankings
`GET /rankings/pitchers`	Pitcher rankings
`GET /pythagorean`	All teams' Pythagorean win%

Sample response (Maki Shugo, next season projection)

{
  "player": "牧 秀悟",
  "team": "DeNA",
  "marcel": { "OPS": 0.834, "AVG": 0.295, "HR": 22.9, "RBI": 81.4 },
  "ml": { "pred_OPS": 0.874 }
}

Docker support is included: run docker compose up --build to start it.

Deployed on Raspberry Pi 5 + Tailscale Funnel

Beyond running locally, the API is deployed on a Raspberry Pi 5 as a Docker container with restart: unless-stopped. To make it publicly accessible, I used Tailscale Funnel — one command to get a public HTTPS URL:

sudo tailscale funnel --bg 8000

This exposes the FastAPI at a fixed HTTPS URL with no router configuration needed. Tailscale handles certificates, port forwarding, and NAT traversal automatically.

Team Roster Simulation

v0.3.0 adds /simulate/team/{team} — swap players in/out and see how projected wins change.

GET /simulate/team/DeNA?year=2025&add=山川&remove=宮﨑

It adjusts team runs scored by each player's wRAA and recalculates Pythagorean win expectation.
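
A rough sketch of that calculation, not the endpoint's actual code. It assumes wRAA is added to or subtracted from team runs scored one-for-one, a 143-game season, and the k=1.72 exponent found above; the function name and inputs are hypothetical.

```python
def simulate_wins(rs, ra, add_wraa=(), remove_wraa=(), games=143, k=1.72):
    """Projected wins after swapping players, via Pythagorean expectation.

    add_wraa / remove_wraa: projected wRAA of incoming / outgoing players.
    games=143 and k=1.72 are assumptions of this sketch.
    """
    new_rs = rs + sum(add_wraa) - sum(remove_wraa)
    win_pct = new_rs**k / (new_rs**k + ra**k)
    return win_pct * games


baseline = simulate_wins(500, 500)
after_swap = simulate_wins(500, 500, add_wraa=[20.0], remove_wraa=[5.0])
print(round(baseline, 1), round(after_swap, 1))
```

Only runs scored move in this sketch, which mirrors the description above; a pitcher swap would need a similar adjustment on the runs-allowed side.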

Streamlit Dashboard

All API features are also available through an interactive Streamlit dashboard.

pip install -r requirements.txt
streamlit run streamlit_app.py

7 pages covering batter/pitcher projections, rankings, Pythagorean standings, and team win projections. Charts are built with Plotly, using NPB team colors for all 12 teams. The dashboard supports both Japanese and English.

Batter Rankings: wOBA / wRC+ Sort Options

In addition to OPS/AVG/HR/RBI, you can now sort by wOBA (run value per plate appearance) and wRC+ (batting strength with league average = 100).

Pitcher Rankings: FIP / K% / BB% and More

Beyond ERA/WHIP, pitchers can now be ranked by FIP (fielding-independent pitching), K% (strikeout rate), BB% (walk rate), K-BB% (strikeout minus walk rate), K/9, BB/9, and HR/9.

FIP = (13×HR + 3×(BB+HBP) - 2×SO) / IP + constant C

A pitcher with FIP lower than ERA may be performing better than their results suggest (bad defense behind them), while FIP higher than ERA may indicate defensive support inflating their stats.
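
The formula above as code, together with the conventional way of setting the constant so that league FIP equals league ERA. Whether the dashboard derives C this way is an assumption; the function names and sample numbers are hypothetical.

```python
def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_so, lg_ip):
    """Constant C chosen so that league-wide FIP matches league ERA."""
    return lg_era - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_so) / lg_ip


def fip(hr, bb, hbp, so, ip, constant):
    """FIP from the formula above. ip must be true innings (e.g. 100.333, not '100.1')."""
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Illustrative pitcher line with an assumed constant of 3.00
example_fip = fip(hr=10, bb=40, hbp=5, so=150, ip=160, constant=3.0)
print(round(example_fip, 2))
```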

Prediction Pages: Formula Explanations

Batter predictions show wOBA/wRC+/wRAA metric cards with a wRC+ trend chart. Pitcher predictions show FIP/K%/BB%/K-BB%/K9/BB9/HR9 cards — K/9, BB/9, and HR/9 now include league average deltas. Each metric includes an expandable formula explanation with benchmark values.

Radar charts are updated: batters use 6 axes (HR/AVG/OBP/SLG/wOBA/wRC+; OPS was removed as redundant), and pitchers now use 7 axes (ERA/WHIP/SO/K9/BB9/HR9/FIP).

Top Page: wRC+ / FIP-Sorted TOP3 + Starter vs. Reliever Split

The batter TOP3 on the top page is now sorted by wRC+ (previously OPS) and the pitcher TOP3 by FIP (previously ERA). FIP strips out the defense behind a pitcher, and wRC+ weights each offensive event by its run value relative to the league, so both better reflect true batting/pitching ability.

Pitchers are now split into starters (IP ≥ 100) and relievers (IP 20–99), each showing FIP-sorted TOP3. Since the split is based on innings pitched, some starters with injury-reduced workloads may appear in the reliever section.
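
The split described above can be sketched as follows. The thresholds come from the text; the function names and the dictionary keys (`name`, `ip`, `fip`) are hypothetical.

```python
def pitcher_role(ip):
    """Classify by innings pitched: starter (>= 100), reliever (20 to under 100)."""
    if ip >= 100:
        return "starter"
    if ip >= 20:
        return "reliever"
    return None  # too few innings to be ranked


def top3_by_fip(pitchers):
    """pitchers: list of dicts with hypothetical keys 'name', 'ip', 'fip'."""
    groups = {"starter": [], "reliever": []}
    for p in pitchers:
        role = pitcher_role(p["ip"])
        if role:
            groups[role].append(p)
    # Lower FIP is better, so sort ascending and keep the first three
    return {role: sorted(ps, key=lambda p: p["fip"])[:3] for role, ps in groups.items()}


sample = [
    {"name": "P1", "ip": 150.0, "fip": 2.90},
    {"name": "P2", "ip": 60.0, "fip": 2.10},
    {"name": "P3", "ip": 85.0, "fip": 3.40},  # e.g. a starter with a shortened season
    {"name": "P4", "ip": 10.0, "fip": 1.50},  # below the 20 IP cutoff
]
ranked = top3_by_fip(sample)
print([p["name"] for p in ranked["reliever"]])
```

P3 shows the caveat from the text: a starter with an injury-reduced workload ends up in the reliever group because only innings are checked.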

Card bar charts are also now aligned with radar charts: batter cards show wOBA/wRC+, pitcher cards show FIP/K9/BB9/HR9.

Current Limitations and Future Plans

Handling New Foreign Players and Rookies

Marcel builds on a player's previous NPB seasons (up to three), so players with no usable NPB history, such as new foreign players, rookies, and players returning from long-term injuries, are excluded from the calculation. Currently, these players are implicitly treated as league-average contributors (wRAA=0).

The dashboard visualizes uncounted players with orange badges and an expander listing each player. I've also implemented prediction ranges (confidence intervals) to show this uncertainty directly on the chart.

✅ Implemented: Prediction ranges (confidence intervals)

Uncounted players are treated as wRAA=0 (league-average contribution), but first-year performance for foreign players varies widely in practice.

The logic:

Historically, first-year NPB foreign players show wRAA ranging from roughly -15 to +25 runs
Baseball's rule of thumb: 10 runs ≈ 1 win (derived from Pythagorean win expectation)
→ Uncertainty per uncounted player ≈ ±1.5 wins

Prediction range = uncounted players × 1.5 wins
Example: 3 uncounted players, 70 projected wins → displayed as "67–74 wins"

The orange error bars on the chart show this range. Teams with more uncounted players have wider bars — a direct visual representation of "this team's actual finish could vary significantly depending on how their new players perform."
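
The range formula can be sketched as a symmetric band around the projection. This is an assumption: the "67–74" example above is not exactly symmetric, so the dashboard may round or shape the range differently. The function name is hypothetical.

```python
def prediction_range(projected_wins, n_uncounted, wins_per_player=1.5):
    """Symmetric range: projected wins +/- (uncounted players x 1.5 wins)."""
    half_width = n_uncounted * wins_per_player
    return projected_wins - half_width, projected_wins + half_width


low, high = prediction_range(70, 3)
print(low, high)
```

A team with no uncounted players gets a zero-width range under this formula, which reflects only the missing-player uncertainty and not the model's own error.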

✅ Implemented: Data-year badges for players with 1–2 years of NPB data

Separate from completely uncounted players, there's another issue: players with only 1–2 years of NPB data also carry high projection uncertainty. Marcel's regression toward league average gets stronger as the amount of data shrinks.

# Pitcher with 1 year (60 IP): weighted_ip = 60×5 = 300
# proj_ERA = (player_ERA×300 + league_avg_ERA×600) / 900
#           ≈ 1/3 player + 2/3 league average

1 year of data only → ~2/3 of the projection is pulled toward league average
2 years of data → ~half of the projection is pulled toward league average
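
The two shares above follow directly from the arithmetic in the comment block. A small sketch, assuming 60 IP in every available season and the 600 IP league-average ballast used there (the function name is hypothetical):

```python
def league_share(ip_by_season, league_ip=600):
    """Fraction of a pitcher's projection that comes from the league average.

    ip_by_season: innings pitched, most recent season first (up to 3 used).
    """
    weights = (5, 4, 3)
    weighted_ip = sum(w * ip for w, ip in zip(weights, ip_by_season))
    return league_ip / (weighted_ip + league_ip)


one_year = league_share([60])           # 600 / 900
two_years = league_share([60, 60])      # 600 / 1140
three_years = league_share([60, 60, 60])  # 600 / 1320
print(round(one_year, 2), round(two_years, 2), round(three_years, 2))
```

Even with three seasons at 60 IP the league share stays above 45% in this sketch, so low-inning relievers remain heavily regressed.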

A foreign pitcher who played one year for Team A and moved to Team B this season has data, but that data is thin. Their projection coming out near league average doesn't mean they're average — it means the model doesn't know enough yet.

To make this visible, the dashboard now automatically shows "1yr NPB" or "2yr NPB" badges next to player names (125 batters and 180 pitchers are flagged). Treat badged projections as rough estimates and cross-reference with prior season performance.

Remaining work

Approach	Description	Difficulty
Historical average	Use average first-year NPB stats for foreign players	★★☆
League translation factors	Apply MLB/KBO → NPB conversion rates	★★★
Draft position priors	Assign different expected values by draft round	★★☆

Teams with more uncounted players carry higher prediction uncertainty — a team ranked lower by the model may still have significant upside if their new additions outperform historical averages.

Marcel's other blind spot: young player breakouts

Marcel's age adjustment is only +0.3% per year below age 27 — small enough that it can't capture sudden growth. When a 23–26-year-old player is on the verge of a breakout, Marcel pulls their projection back toward their past three-year average, systematically underestimating what they're capable of.

Just like the uncounted-player problem, teams with several young players approaching a breakout are likely underrated by this model. This uncertainty is not currently reflected in the orange confidence interval bars, so the actual gap between prediction and reality may be wider than the chart suggests.

Summary

Item	Detail
Data	baseball-data.com + npb.jp (2015–2025, 5 datasets)
Marcel accuracy (2025)	Batter OPS MAE=.063 / Pitcher ERA MAE=0.78
ML accuracy (2025, best model)	Batter OPS MAE=.063 (XGBoost) / Pitcher ERA MAE=0.92 (LightGBM)
Pythagorean	NPB optimal k=1.72, MAE=3.20 wins
API	FastAPI 8 endpoints, Docker-ready
Dashboard	Streamlit 7 pages, Plotly charts, JA/EN bilingual

The biggest takeaway: newer doesn't always mean better. On pitcher ERA, Marcel outperformed ML. On batter OPS, the two were essentially tied. Marcel, a simple method from the 2000s, held its own against modern ML on NPB data. Player stories like Austin (error .165 in 2024, then .008 in 2025) and Tsutsugo (error .220) show both the power and limits of any projection system.