The volatility risk premium (VRP) is the gap between what the options market prices in and what the underlying actually delivers. The formula fits in one line. The interesting part is in the inputs, the unit conversions, and a handful of mistakes that produce a plausible-looking but wrong number.
This post walks through the math, a worked SPY example, a self-contained Python implementation in 30 lines, and the five pitfalls I see in almost every homemade implementation.
## TL;DR
```
VRP     = sigma_IV - sigma_RV          # vol units
VRP_var = sigma_IV**2 - sigma_RV**2    # variance units
```
Where:
- `sigma_IV` is annualized implied vol (typically 30-day ATM IV)
- `sigma_RV` is annualized realized vol over a matched window
If 30-day ATM IV is 18% and 30-day realized vol is 12%, VRP is +6 vol points. Positive on ~70 to 75% of trading days across major US equity indices over the last 30 years. That spread is the structural edge premium sellers harvest.
The formula is the easy part. Keep reading.
## Vol vs variance: same idea, two units
| Term | Formula | Units |
|---|---|---|
| Volatility risk premium | σ_IV − σ_RV | Annualized vol points |
| Variance risk premium | σ_IV² − σ_RV² | Annualized variance |
| Variance swap rate | Model-free integral over the strip | Variance |
For most practitioner work, vol units are easier to reason about. For variance swaps and academic papers (Carr-Wu, Bollerslev), variance units are more rigorous. The rest of this post uses vol units. Square both inputs to translate.
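The translation is literally just squaring each leg. A two-line sketch (the function name is mine, not a standard library call):

```python
def vrp_variance_units(iv: float, rv: float) -> float:
    """Translate annualized vols (decimals) into a variance-unit VRP."""
    return iv**2 - rv**2

# The TL;DR example: 18% IV vs 12% RV is +6 vol points,
# or 0.0324 - 0.0144 = 0.0180 in variance units.
print(f"{vrp_variance_units(0.18, 0.12):.4f}")  # 0.0180
```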
## The four inputs that actually matter

### 1. Which IV
"Implied volatility" is not one number. Every strike at every expiration has its own IV. The standard choice for VRP work is 30-day ATM IV, interpolated from the listed expirations that bracket 30 days.
- 30 days matches the VIX horizon (comparability).
- ATM is the least skewed point on the surface (central expectation, not tail demand).
- Interpolation smooths weekend / earnings artifacts.
### 2. Which RV estimator
Three commonly used estimators, and they disagree by 1 to 3 vol points on the same day for the same window:
| Estimator | Inputs | Bias |
|---|---|---|
| Close-to-close | Daily closes | Underestimates (no overnight gaps, no range) |
| Parkinson | High, low | Captures range, misses gaps |
| Yang-Zhang | OHLC + prior close | Lowest bias, the right default |
This post uses close-to-close to keep the math readable. Production should use Yang-Zhang.
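For reference, a self-contained Yang-Zhang sketch. It follows the standard formulation (overnight variance, plus k-weighted open-to-close variance, plus a Rogers-Satchell term); treat it as a starting point and verify against a library implementation before trusting it in production:

```python
import numpy as np
import pandas as pd

def yang_zhang_vol(ohlc: pd.DataFrame, window: int = 20) -> float:
    """Yang-Zhang realized vol, annualized, as a decimal, over the trailing
    `window` trading days. Expects columns: Open, High, Low, Close."""
    df = ohlc.tail(window + 1)
    o, h, l, c = (np.log(df[col]) for col in ("Open", "High", "Low", "Close"))
    overnight = (o - c.shift(1)).dropna()            # close-to-open gap
    open_close = (c - o).iloc[1:]                    # open-to-close move
    rs = ((h - o) * (h - c) + (l - o) * (l - c)).iloc[1:]  # Rogers-Satchell
    n = len(open_close)
    k = 0.34 / (1.34 + (n + 1) / (n - 1))            # Yang-Zhang weighting
    var_daily = overnight.var() + k * open_close.var() + (1 - k) * rs.mean()
    return float(np.sqrt(var_daily * 252))
```

Swapping it into the pipeline also means downloading the full OHLC columns instead of closes only.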
### 3. Window length and annualization
- Match windows. 30-day IV pairs with 20 to 30 day RV.
- Annualize daily SD by multiplying by `sqrt(252)`. Forgetting this is the single most common error.
- Trading days, not calendar days. Annualizing with `sqrt(365)` instead of `sqrt(252)` inflates RV by ~20%.
### 4. Z-score lookback
A 4% VRP is wide in a low-vol year and narrow in a high-vol year. Normalize:
```
z = (vrp_today - vrp_252d_mean) / vrp_252d_std
```
z > 1 is the rich-premium tier. z < -0.5 is do-not-sell.
## Worked example: SPY, by hand
Lock the formula in on paper before touching code.
1. 30-day ATM IV interpolated from the surface: 0.174 (17.4%)
2. Last 21 SPY closes, log returns `r_t = ln(P_t / P_{t-1})`. Standard deviation of the 20 daily log returns: 0.00692
3. Annualize: `0.00692 * sqrt(252) = 0.00692 * 15.875 = 0.1099` = 10.99%
4. Subtract: `VRP = 17.4% - 10.99% = +6.41%`
Variance form: 0.174**2 - 0.1099**2 = 0.01820.
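The hand arithmetic can be double-checked in a few lines:

```python
import numpy as np

iv = 0.174                    # 30-day ATM IV from step 1
daily_sd = 0.00692            # SD of the 20 log returns from step 2
rv = daily_sd * np.sqrt(252)  # step 3: annualize, ~0.1099
print(f"VRP (vol): {iv - rv:+.4f}")       # step 4: ~+0.0641
print(f"VRP (var): {iv**2 - rv**2:.4f}")  # ~0.0182
```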
Four arithmetic steps. Now the code.
## Python in 30 lines
```python
import numpy as np
import pandas as pd
import yfinance as yf


def compute_vrp(symbol: str, atm_iv_30d: float, rv_window: int = 20) -> dict:
    """
    Compute volatility risk premium for `symbol`.

    Parameters
    ----------
    symbol : underlying ticker, e.g. "SPY"
    atm_iv_30d : 30-day ATM implied vol, annualized, decimal (0.18 = 18%)
    rv_window : trailing trading-day window for RV (default 20)
    """
    hist = yf.download(symbol, period="60d", progress=False, auto_adjust=False)
    # .squeeze() guards against the MultiIndex columns that newer yfinance
    # versions return even for a single ticker
    closes = hist["Close"].squeeze().dropna().tail(rv_window + 1)
    if len(closes) < rv_window + 1:
        raise ValueError(f"Not enough history for {symbol}")
    log_returns = np.log(closes / closes.shift(1)).dropna()
    rv = float(log_returns.std()) * np.sqrt(252)
    return {
        "symbol": symbol,
        "atm_iv_30d": atm_iv_30d,
        "rv_window": rv_window,
        "rv": rv,
        "vrp_vol_points": atm_iv_30d - rv,
        "vrp_variance": atm_iv_30d**2 - rv**2,
    }


if __name__ == "__main__":
    result = compute_vrp("SPY", atm_iv_30d=0.174, rv_window=20)
    print(f"ATM IV (30d): {result['atm_iv_30d']:.2%}")
    print(f"RV (20d):     {result['rv']:.2%}")
    print(f"VRP (vol):    {result['vrp_vol_points'] * 100:+.2f} vol pts")
    print(f"VRP (var):    {result['vrp_variance']:+.5f}")
```
Three implementation notes:
- `np.log(closes / closes.shift(1))` is the canonical log-return idiom.
- The `sqrt(252)` is the only unit conversion. Drop it and your VRP is meaningless.
- This uses close-to-close. Swapping in Yang-Zhang only changes the `rv = ...` line.
The one input you cannot get from yfinance is the ATM IV. Either compute it from a chain (interpolate two listed expirations bracketing 30 days, take the ATM strike at each, time-weight) or pull it from a vol-surface API.
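A sketch of that time-weighting, assuming you already have the ATM IV at each of the two bracketing expirations. Interpolating in total variance (IV² × time), the way the VIX methodology does, behaves better than interpolating the vols directly:

```python
def interp_30d_iv(iv_near: float, t_near: float,
                  iv_far: float, t_far: float,
                  target_days: float = 30.0) -> float:
    """Linearly interpolate total variance between two expirations, then
    convert back to an annualized vol at the target horizon.
    IVs are annualized decimals; times to expiry are in days."""
    var_near = iv_near**2 * t_near           # total variance to each expiry
    var_far = iv_far**2 * t_far
    w = (t_far - target_days) / (t_far - t_near)  # weight on the near expiry
    total_var = w * var_near + (1 - w) * var_far
    return (total_var / target_days) ** 0.5

# Hypothetical inputs: 24-day ATM IV of 17.0%, 38-day ATM IV of 17.8%
print(round(interp_30d_iv(0.170, 24, 0.178, 38), 4))  # 0.1744
```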
## The five pitfalls

### 1. Forgetting sqrt(252)
Daily SD is around 0.005 to 0.015 for an index ETF. ATM IV is 0.10 to 0.30. Subtract them anyway and you get a number that looks plausible enough to pass a glance test. It is not in the right range. It is in different units.
Fix: Multiply daily SD by `sqrt(252)`. First sanity check on any RV figure: is it the same order of magnitude as the IV?
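That check is cheap enough to automate. A minimal guard, assuming both figures are meant to be annualized decimals (the 5x threshold is my heuristic, not a standard):

```python
def check_units(iv: float, rv: float) -> None:
    """Raise if RV looks like a daily SD that was never annualized.
    A 10x-plus gap between IV and RV on an index product almost always
    means a missing sqrt(252)."""
    if iv / rv > 5:
        raise ValueError(
            f"IV {iv:.4f} vs RV {rv:.4f}: same order of magnitude expected. "
            "Did you forget to multiply the daily SD by sqrt(252)?"
        )

check_units(0.174, 0.1099)     # passes: matched units
# check_units(0.174, 0.00692)  # raises: a daily SD slipped through
```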
### 2. Mismatched windows
30-day IV with 5-day RV swings violently with recent price action. The IV forecasts a month. The RV summarizes a week. The spread is mostly window-mismatch noise.
Fix: Match windows. 30-day IV pairs with 20 to 30 day RV.
### 3. Trading the raw spread
A 3% VRP at VIX 12 is wide. The same 3% at VIX 25 is narrow. Acting on the raw number ignores the regime.
Fix: Use the rolling 252-day z-score, not the raw number.
### 4. Close-to-close on gap-heavy names
Single stocks with regular earnings, ADRs, post-event reactions: close-to-close systematically understates RV. Apparent VRP looks rich. Real edge is smaller.
Fix: Yang-Zhang for any name with non-trivial overnight moves.
### 5. Lookahead bias in backtests
The classic. If you compute percentile thresholds from the entire history and apply them to a "sell when VRP is rich" rule, you are using future data to set today's threshold. Looks brilliant in-sample, falls apart out-of-sample.
Fix: Compute z-scores and percentiles using only data available at the trade-decision time:
```python
df["vrp_z"] = (
    (df["vrp"] - df["vrp"].rolling(252).mean())
    / df["vrp"].rolling(252).std()
)
```
`rolling()` is point-in-time by construction. Do not use full-sample mean and SD anywhere in your backtest pipeline.
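You can verify the point-in-time property directly: an already-computed rolling value must not change when future rows are appended. A quick check on synthetic data, with the window shortened to 5 to keep it readable:

```python
import numpy as np
import pandas as pd

def rolling_z(s: pd.Series, window: int = 5) -> pd.Series:
    return (s - s.rolling(window).mean()) / s.rolling(window).std()

rng = np.random.default_rng(1)
vrp = pd.Series(rng.normal(0.04, 0.01, 300))  # synthetic VRP series

z_then = rolling_z(vrp.iloc[:200]).iloc[-1]  # z at day 199, no future data
z_full = rolling_z(vrp).iloc[199]            # same day, 100 more days loaded
assert np.isclose(z_then, z_full)            # identical: rolling() only looks back
```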
## From number to trade
Computing VRP is half the job. The sizing rule is the other half:
| VRP z-score | Action |
|---|---|
| z > 1.5 | Aggressive selling, full size |
| 0.5 to 1.5 | Standard sizing |
| −0.5 to 0.5 | Reduced size, defined risk only |
| z < −0.5 | Sit out or hedge |
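The table maps directly onto a small dispatch function; the tier labels here are illustrative, not part of any API:

```python
def sizing_tier(z: float) -> str:
    """Map a VRP z-score to the sizing tiers in the table above."""
    if z > 1.5:
        return "aggressive: full size"
    if z > 0.5:
        return "standard sizing"
    if z > -0.5:
        return "reduced size, defined risk only"
    return "sit out or hedge"

print(sizing_tier(1.8))   # aggressive: full size
print(sizing_tier(-0.9))  # sit out or hedge
```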
Three questions decide the structure once z says go:
- Is the edge directional? Put-side and call-side VRP are rarely equal.
- What is the dealer regime? 4% VRP in positive gamma is a different trade from the same 4% in negative gamma.
- Which structure fits? Short straddle, iron condor, put credit spread, jade lizard, calendar.
## When to stop rolling your own
Computing VRP for one symbol on one date: 30 lines of Python.
Computing it correctly across many symbols and dates, with the directional decomposition, GEX-conditioned regime, and leak-free historical percentiles a real backtest needs: an infrastructure project that adds zero alpha.
I built FlashAlpha for this. `GET https://lab.flashalpha.com/v1/vrp/{symbol}` returns ATM IV, four matched RV windows (5d, 10d, 20d, 30d) using Yang-Zhang, the VRP for each, rolling z-score and percentile, put-call directional decomposition, GEX regime, and strategy suitability scores. The historical endpoint at `historical.flashalpha.com/v1/vrp/{symbol}?at=<timestamp>` returns the same shape with point-in-time z-scores for lookahead-free backtesting.
Free tier, no card. Useful as a sanity check against your own implementation:
```python
import requests

resp = requests.get(
    "https://lab.flashalpha.com/v1/vrp/SPY",
    headers={"X-Api-Key": "YOUR_KEY"},
)
vrp = resp.json()["vrp"]
print(f"ATM IV:    {vrp['atm_iv']:.2f}%")
print(f"RV (20d):  {vrp['rv_20d']:.2f}%")
print(f"VRP (20d): {vrp['vrp_20d']:+.2f}%")
print(f"Z-score:   {vrp['z_score']:+.2f}")
```
Run your homemade `compute_vrp` against the 20-day RV and compare to `vrp_20d`. Within 0.1 vol point, your math and inputs are aligned. Off by more, work back through the five pitfalls.
## Wrap
The VRP formula is a one-liner. Almost every homemade implementation gets the formula right and fails on at least one of: annualization, window matching, RV estimator choice, or leak-free percentile computation.
Compute it by hand once. Code it in 30 lines. Then either keep building the infrastructure or stop and use a feed. Either is fine. Mixing the two (homemade VRP with leaky percentiles fed into a "production" backtest) is the worst of both.
If you want the full systematic workflow (signal to execution to structure choice), the practical guide to trading VRP covers it end to end.