tomasz dobrowolski

Posted on May 21 • Originally published at flashalpha.com

VRP Short Put Spreads: An Honest 7-Year Backtest Across 5 Symbols

#backtesting #options #quant #fintech

Framing correction (read first). The thesis of this article ("the edge does not survive honest execution") is overstated. A controlled follow-up, VRP Backtest: The Fill Model Is the Edge, shows the result is bounded by the fill-model assumption: honest fills give breakeven to negative (these numbers); idealized mid-fills (the universal public-backtest default) give strongly positive. The correct conclusion is execution-fragility and a result-range straddling zero, not "VRP is dead." Read this piece as the pessimistic bound of that range.

Search "volatility risk premium strategy" and you will find a hundred posts with an equity curve going up and to the right and a win rate north of 80%. They are almost all built on three quiet lies: mid-price fills, clean profit-target exits, and parameters chosen on the same symbol they are advertised on. This study removes all three.

We took a single short-put-spread VRP harvest, tuned it once on SPY in prior work, froze every parameter, and ran it unchanged across QQQ, IWM, AMZN, and NVDA from 2019 to 2026. SPXW was added as a flagship out-of-sample check via a completely separate daily-resolution data path. Execution was modeled the way a real account experiences it: post-and-wait limit orders, exits that have to cross the spread, a stale-quote guard, and a signal that can only see the past.

TL;DR

Symbol	Trades	Win rate	Profit factor	Sharpe	CAGR %	MaxDD %	Fill rate
SPY (tuned)	335	71.6%	1.10	0.23	1.05	6.48	13.7%
NVDA	201	72.1%	1.12	0.22	0.84	6.87	6.5%
AMZN	139	71.2%	1.08	0.13	0.42	7.38	3.4%
QQQ	162	69.8%	0.96	−0.08	−0.23	7.76	7.9%
IWM	58	63.8%	0.72	−0.35	−0.11	1.24	3.1%
SPXW †	112	61.6%	0.37	−0.39	−0.66	5.81	100% †

Period: 2019-02-01 to 2026-04-02 (7.16 years). $100k notional, unlevered Half-Kelly.
† SPXW is sourced at daily-close resolution from the Historical API; fill rate is 100% by construction. Direction check only.

Win rate clusters at 64-72% across the five 1-minute symbols (SPXW, on daily data, is 61.6%). The VRP probability edge is real and travels. Sharpe ranges from +0.23 to −0.39. Two of four out-of-sample symbols lost money. SPXW lost money. The premium is real; the tradeable edge is not.

The strategy (frozen specification)

A short put credit spread on the underlying, gated by a market-regime VRP signal. Every number below was fixed before the cross-sectional run and never changed per symbol.

Structure: short put vertical spread (defined risk)
Short leg target: 0.10 delta (far-wing premium)
DTE target: 14 days (±2)
PT / SL: 50% / 100% of credit
Spread widths: 5 / 10 / 15 / 20 / 25 / 30 points (EV-ranked at entry from observable greeks only)
Sizing: Half-Kelly, default 0.05, cap 0.25 (unlevered)
Entry time: 10:05 ET
Regime gate: one SPY/VIX market-regime signal applied to all five symbols (this is what makes the four extra symbols a genuine OOS test rather than four fitted strategies)
Circuit breaker: 30% peak-to-trough
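
As an illustration of the last line of the specification, here is a minimal sketch of a peak-to-trough circuit breaker. The function name and the "report the first breach" behavior are assumptions made for this example; the article does not describe how the engine implements it.

```python
from typing import Iterable, Optional


def first_breach_index(equity: Iterable[float],
                       max_drawdown: float = 0.30) -> Optional[int]:
    """Return the index of the first equity point whose drawdown from the
    running peak reaches max_drawdown, or None if that never happens.

    Hypothetical helper: only the 30% threshold comes from the article.
    """
    peak = float("-inf")
    for i, value in enumerate(equity):
        peak = max(peak, value)          # running high-water mark
        if peak > 0 and (peak - value) / peak >= max_drawdown:
            return i
    return None


if __name__ == "__main__":
    curve = [100_000, 120_000, 90_000, 80_000, 130_000]
    print(first_breach_index(curve))     # index at which trading would halt
```

Drawdown is measured from the running peak rather than from starting capital, which is what "peak-to-trough" implies.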

The regime signal is built from VIX, VIX9D, VVIX, the VIX term structure, a realized-vs-implied VRP proxy and its 252-day percentile, HY spreads, and stress z-scores. It outputs risk_on / neutral / reduce / risk_off with a continuous Kelly multiplier. risk_off means no trade. Construction is leak-free; methodology in Historical VRP Percentiles Without Lookahead Bias.
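
To make "252-day percentile" and "leak-free" concrete, the sketch below ranks each observation only against the window of values strictly before it. This is one plausible construction, not the author's code: the function name, the exclusion of the current value from its own ranking window, and the "fraction strictly below" definition of percentile are all assumptions.

```python
from bisect import bisect_left
from typing import List, Optional


def trailing_percentile(values: List[float],
                        window: int = 252) -> List[Optional[float]]:
    """Percentile of values[t] within the `window` observations before t.

    Returns None until a full window of history exists, so early points
    are never ranked against a short sample. Hypothetical helper.
    """
    out: List[Optional[float]] = []
    for t, x in enumerate(values):
        if t < window:
            out.append(None)
            continue
        history = sorted(values[t - window:t])   # strictly past data only
        # Fraction of past observations strictly below the current value.
        out.append(bisect_left(history, x) / window)
    return out


if __name__ == "__main__":
    proxy = [1.0, 2.0, 3.0, 4.0, 0.0, 2.5]
    print(trailing_percentile(proxy, window=3))
```

Because the slice ends at `t`, changing any later value cannot alter an earlier percentile, which is the property a lookahead check should test for.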

The honesty controls

Each control below removes a specific way backtests lie. This is the entire point.

1. Leak-free signal. The signal CSV is shifted forward one calendar day with a 7-day walk-back lookup. Any trading day's decision uses only data fully observable before that day's open.

2. Honest fills (post-and-wait limits). We do not fill at mid. We post a limit at ask_edge and wait for someone else to cross our price. If nobody does, the order is cancelled. That is why fill rates are 3-14% and not 100%.

3. Stale-quote guard. When a cross is detected, we re-check that the mid has not moved through our limit by more than a floor. A one-tick bid blip during a vol spike does not get to manufacture a phantom fill.

4. Patient-then-cross exits. Profit-target and stop-loss exits are not free. We post a buy-to-close limit at the trigger and wait. If it does not fill, we cross the spread and pay the offer. Those exits are tagged pt_x / sl_x so the execution tax is auditable.

5. EV-blind tiebreak. When multiple candidate spreads cross on the same bar, the winner is chosen by a timestamp-seeded random shuffle, never by best EV. Any EV-aware tiebreak is a look-ahead oracle.

6. Frozen, never re-fit. Tuned once on SPY in prior work. Then frozen. QQQ, IWM, AMZN, NVDA, SPXW received zero per-symbol optimization.
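
Controls 1 and 5 are mechanical enough to sketch. The code below is an interpretation under stated assumptions, not the study's engine: `shift_forward`, `lookup_signal`, and `pick_candidate` are hypothetical names, the signal is reduced to a date-to-label mapping, and candidates are represented by string identifiers.

```python
import random
from datetime import date, timedelta
from typing import Dict, Optional, Sequence


def shift_forward(raw: Dict[date, str], days: int = 1) -> Dict[date, str]:
    """Re-stamp each signal row `days` calendar days later, so a value
    computed on day D first becomes visible on D + days."""
    return {d + timedelta(days=days): label for d, label in raw.items()}


def lookup_signal(shifted: Dict[date, str], trade_date: date,
                  walk_back: int = 7) -> Optional[str]:
    """Most recent shifted row dated on or before trade_date, searching
    at most `walk_back` calendar days back (covers weekends and holidays)."""
    for k in range(walk_back + 1):
        label = shifted.get(trade_date - timedelta(days=k))
        if label is not None:
            return label
    return None


def pick_candidate(candidates: Sequence[str], bar_ts: int) -> str:
    """EV-blind tiebreak: shuffle with a RNG seeded by the bar timestamp.
    Sorting first makes the result independent of input order."""
    order = sorted(candidates)
    random.Random(bar_ts).shuffle(order)
    return order[0]


if __name__ == "__main__":
    raw = {date(2024, 3, 8): "risk_on"}            # computed on a Friday
    shifted = shift_forward(raw)
    print(lookup_signal(shifted, date(2024, 3, 8)))    # same day: not visible
    print(lookup_signal(shifted, date(2024, 3, 11)))   # Monday: visible
    print(pick_candidate(["w5", "w10", "w15"], 1_700_000_000))
```

Seeding from the bar timestamp keeps reruns reproducible while ensuring the choice carries no information about which candidate had the best expected value.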

Why a 70% win rate nets to nothing

Symbol	Avg win	Avg loss	Win:loss
SPY	$369	−$850	1 : 2.3
QQQ	$311	−$751	1 : 2.4
AMZN	$407	−$933	1 : 2.3
NVDA	$385	−$888	1 : 2.3
IWM	$53	−$130	1 : 2.4

A 70% win rate at a 1:2.3 payoff has expected value 0.70 × 1 − 0.30 × 2.3 ≈ 0. By arithmetic, it is a coin flip. This is the defining feature of short-vol: you win small, often, and lose big, occasionally. The win rate is the wrong number to anchor on.
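
The arithmetic can be checked per symbol. The sketch below combines the win rates from the TL;DR table with the average win and loss figures above; the function names are illustrative, and the per-trade figure is a simple two-outcome approximation that ignores the shape of the loss distribution.

```python
def expected_value(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Per-trade expectancy; avg_loss is passed as a positive magnitude."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Win rate at which expectancy is exactly zero."""
    return avg_loss / (avg_win + avg_loss)


# (win rate, avg win $, avg loss $) copied from the article's tables.
ROWS = {
    "SPY":  (0.716, 369, 850),
    "QQQ":  (0.698, 311, 751),
    "AMZN": (0.712, 407, 933),
    "NVDA": (0.721, 385, 888),
    "IWM":  (0.638, 53, 130),
}

if __name__ == "__main__":
    for sym, (p, win, loss) in ROWS.items():
        ev = expected_value(p, win, loss)
        be = breakeven_win_rate(win, loss)
        print(f"{sym:5s} EV/trade ${ev:8.2f}   breakeven win rate {be:.1%}")
```

At a 1:2.3 payoff the breakeven win rate is 2.3 / 3.3 ≈ 69.7%, so every symbol here sits within a few points of zero expectancy, and the sign of each per-trade figure matches the sign of that symbol's CAGR in the TL;DR table.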

Only 3-14% of orders filled, and the fills were adversely selected

Symbol	Proposed	Filled	Fill rate	Avg edge captured
SPY	2,438	335	13.7%	−$0.037
QQQ	2,055	162	7.9%	−$0.042
NVDA	3,072	201	6.5%	−$0.044
AMZN	4,114	139	3.4%	−$0.040
IWM	1,896	58	3.1%	−$0.039

A mid-fill backtest books all of the proposed trades at a better price. Reality books a single-digit-to-low-teens percentage of them, and the ones that do fill cross at ~$0.04 worse than mid. The orders that fill are disproportionately the ones the market is running through (adverse selection). On 100-multiplier contracts that is a structural ~$4 headwind per contract filled, before the trade begins.
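
For readers who want to reproduce the table's derived columns, here is a small sketch that recomputes the fill rate and converts the average edge into dollars per one-lot spread. The data is copied from the table above; the helper names and the per-contract framing are assumptions for this example.

```python
CONTRACT_MULTIPLIER = 100  # standard US equity/ETF option multiplier

# symbol: (proposed orders, filled orders, avg edge captured in $ per share)
FILLS = {
    "SPY":  (2438, 335, -0.037),
    "QQQ":  (2055, 162, -0.042),
    "NVDA": (3072, 201, -0.044),
    "AMZN": (4114, 139, -0.040),
    "IWM":  (1896, 58, -0.039),
}


def fill_rate(proposed: int, filled: int) -> float:
    """Share of posted limit orders that were actually filled."""
    return filled / proposed


def headwind_per_contract(edge: float,
                          multiplier: int = CONTRACT_MULTIPLIER) -> float:
    """Dollar cost versus mid for one spread, given the per-share edge."""
    return edge * multiplier


if __name__ == "__main__":
    for sym, (proposed, filled, edge) in FILLS.items():
        print(f"{sym:5s} fill rate {fill_rate(proposed, filled):6.1%}   "
              f"headwind ${headwind_per_contract(edge):6.2f} per contract")
```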

The "profit target" is mostly a forced spread cross

pt = clean limit fill at target. pt_x = target hit but had to cross the spread. sl and sl_x are the same split for stop-loss exits.

Symbol	pt (clean)	pt_x (crossed)	sl	sl_x	expiry
SPY	103	133	63	32	4
QQQ	29	76	37	12	8
AMZN	44	47	25	14	9
NVDA	68	71	43	13	6
IWM	12	21	7	13	5

On every symbol, more profit-target exits required crossing the spread than filled cleanly at the limit. The idealized "close at 50% target" that naive backtests book is, in practice, the minority outcome. This single effect is a primary driver of the gap between the win rate and the Sharpe.
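
The claim is easy to audit from the exit counts. The sketch below computes the share of profit-target exits that had to cross the spread, and checks that each row sums to that symbol's trade count in the TL;DR table. The dictionary layout and function name are illustrative only.

```python
# Exit counts copied from the table above.
EXITS = {
    "SPY":  {"pt": 103, "pt_x": 133, "sl": 63, "sl_x": 32, "expiry": 4},
    "QQQ":  {"pt": 29,  "pt_x": 76,  "sl": 37, "sl_x": 12, "expiry": 8},
    "AMZN": {"pt": 44,  "pt_x": 47,  "sl": 25, "sl_x": 14, "expiry": 9},
    "NVDA": {"pt": 68,  "pt_x": 71,  "sl": 43, "sl_x": 13, "expiry": 6},
    "IWM":  {"pt": 12,  "pt_x": 21,  "sl": 7,  "sl_x": 13, "expiry": 5},
}

# Trade counts from the TL;DR table, used as a consistency check.
TRADES = {"SPY": 335, "QQQ": 162, "AMZN": 139, "NVDA": 201, "IWM": 58}


def crossed_share(row: dict, clean: str = "pt", crossed: str = "pt_x") -> float:
    """Fraction of exits of one type that had to cross the spread."""
    return row[crossed] / (row[clean] + row[crossed])


if __name__ == "__main__":
    for sym, row in EXITS.items():
        assert sum(row.values()) == TRADES[sym]
        print(f"{sym:5s} profit-target exits crossed: {crossed_share(row):.0%}")
```

Passing `clean="sl", crossed="sl_x"` gives the same split for stop-loss exits.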

The high-conviction regime did not earn its name

The signal's risk_on ("deploy full size") regime underperformed neutral on four of five 1-minute symbols:

Symbol	`neutral` P&L (win%)	`risk_on` P&L (win%)
SPY	+$5,604 (73%)	+$2,174 (70%)
QQQ	−$3,000 (69%)	+$1,364 (71%)
NVDA	+$5,918 (75%)	+$266 (69%)
AMZN	+$4,669 (70%)	−$1,644 (73%)
IWM	+$145 (72%)	−$910 (60%)

A signal that sizes up into its highest-conviction state and earns less there is a signal whose conviction is, at best, uncorrelated with forward edge. The same pattern reappears on SPXW from a completely separate data feed (95 risk_on trades, −$5,266; 17 neutral trades, +$625). This is the precise mechanism by which the leveraged VRP strategies promoted in those posts blow up.
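
A short tally of the regime table, as a sanity check on the "four of five" count. The figures are copied from the table above; the helper name is hypothetical.

```python
# symbol: (neutral P&L in $, risk_on P&L in $)
REGIME_PNL = {
    "SPY":  (5604, 2174),
    "QQQ":  (-3000, 1364),
    "NVDA": (5918, 266),
    "AMZN": (4669, -1644),
    "IWM":  (145, -910),
}


def risk_on_laggards(pnl: dict) -> list:
    """Symbols where the risk_on regime earned less than neutral."""
    return [sym for sym, (neutral, risk_on) in pnl.items() if risk_on < neutral]


if __name__ == "__main__":
    laggards = risk_on_laggards(REGIME_PNL)
    print(f"risk_on underperformed neutral on {len(laggards)} of "
          f"{len(REGIME_PNL)} symbols: {', '.join(laggards)}")
```

QQQ is the only symbol where risk_on beat neutral.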

SPXW: the flagship out-of-sample check

SPXW is the canonical home of the VRP trade. The strategy was never tuned on it, and it had to be sourced from a different data path entirely (Historical API, daily-close resolution).

Metric	SPXW (daily, OOS)
Trades	112 over 7.16 yr
Win rate	61.6%
Profit factor	0.37
Sharpe	−0.39
Avg win / avg loss	$39 / −$170 (1 : 4.4)
`risk_on` regime	95 trades, −$5,266, 60% win
`neutral` regime	17 trades, +$625, 71% win

It lost money too. The win/loss asymmetry is even worse (1:4.4) because narrow 5-point SPX spreads collect tiny credits against a wide max loss.

Caveat: SPXW transacts at daily-close mid minus the ~$0.04 haircut empirically measured on the five 1-minute symbols. Fill rate is 100% by construction and exits are all forced crosses, so its fill rate and Sharpe magnitude are not apples-to-apples with the 1-minute rows. It is cited as a directional confirmation, not a sixth execution-comparable Sharpe.

What this means

VRP is real; the free lunch is not. The premium shows up reliably as a high win rate. It does not reliably show up as money once you (a) pay realistic execution and (b) refuse to re-fit per symbol.
The win rate is the most misleading number in options content. A 70% win rate with a 1:2.3 payoff is structurally breakeven. Any post leading with win rate and not showing the loss distribution is selling you the setup, not the result.
Execution is the strategy. The single biggest gap between the hyped version and this one is fills, not signal. Mid-fill assumptions, clean-target exits, and 100% fill rates are where the fictional returns live.
Out-of-sample is brutal and necessary. SPY (tuned) had the best Sharpe and CAGR. Every symbol the strategy had never seen did worse on both, although NVDA came close and edged SPY on win rate and profit factor. Honest deployment is the non-tuned result.
Do not lever the confidence. The risk_on regime, the one a leveraged variant would size into hardest, was the weaker regime.

Reproduction

bash run_cross_symbol.sh                              # 5 symbols, 1-min local chains
python spxw_api_backtest.py --start 2019-02-01 \
       --end 2026-04-02 --label vrpfrozen             # SPXW, daily-close via API
python aggregate_cross_symbol.py                      # cross_symbol_summary.{csv,json}

Per-symbol engine invocation (identical except --symbol):

python intraday_bt_ev_rank.py --symbol <SYM> \
  --start 2019-02-01 --end 2026-04-02 \
  --delta 0.10 --dte 14 --pt 0.50 --sl 1.0 \
  --vrp-signal vrp_signal_v2.csv \
  --kelly-default 0.05 --kelly-mult 0.5 --vrp-on-mult 1.0 \
  --kelly-max 0.25 --max-drawdown 0.30

The SPXW row consumed the same Historical API surface a paying user would call. Full study, full appendix table, year-by-year breakdowns, and the limitations section live in the original article. The companion fill-model sensitivity study is VRP Backtest: The Fill Model Is the Edge.

DEV Community