The volatility risk premium (VRP) is the gap between what the options market prices in and what the underlying actually delivers. The formula fits in one line. The interesting part is in the inputs, the unit conversions, and a handful of mistakes that produce a plausible-looking but wrong number.
This post walks through the math, a worked SPY example, a self-contained Python implementation in 30 lines, and the five pitfalls I see in almost every homemade implementation.
## TL;DR
```
VRP     = sigma_IV - sigma_RV          # vol units
VRP_var = sigma_IV**2 - sigma_RV**2    # variance units
```
Where:
- `sigma_IV` is annualized implied vol (typically 30-day ATM IV)
- `sigma_RV` is annualized realized vol over a matched window
If 30-day ATM IV is 18% and 30-day realized vol is 12%, VRP is +6 vol points. Positive on ~70 to 75% of trading days across major US equity indices over the last 30 years. That spread is the structural edge premium sellers harvest.
The formula is the easy part. Keep reading.
## Vol vs variance: same idea, two units
| Term | Formula | Units |
|---|---|---|
| Volatility risk premium | σ_IV − σ_RV | Annualized vol points |
| Variance risk premium | σ_IV² − σ_RV² | Annualized variance |
| Variance swap rate | Model-free integral over the strip | Variance |
For most practitioner work, vol units are easier to reason about. For variance swaps and academic papers (Carr-Wu, Bollerslev), variance units are more rigorous. The rest of this post uses vol units. Square both inputs to translate.
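The translation is literally just squaring each leg. A two-line sketch (the function name is mine, not a standard library call):

```python
def vrp_variance_units(iv: float, rv: float) -> float:
    """Translate annualized vols (decimals) into a variance-unit VRP."""
    return iv**2 - rv**2

# The TL;DR example: 18% IV vs 12% RV is +6 vol points,
# or 0.0324 - 0.0144 = 0.0180 in variance units.
print(f"{vrp_variance_units(0.18, 0.12):.4f}")  # 0.0180
```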
## The four inputs that actually matter

### 1. Which IV
"Implied volatility" is not one number. Every strike at every expiration has its own IV. The standard choice for VRP work is 30-day ATM IV, interpolated from the listed expirations that bracket 30 days.
- 30 days matches the VIX horizon (comparability).
- ATM is the least skewed point on the surface (central expectation, not tail demand).
- Interpolation smooths weekend / earnings artifacts.
### 2. Which RV estimator
Three commonly used estimators, and they disagree by 1 to 3 vol points on the same day for the same window:
| Estimator | Inputs | Bias |
|---|---|---|
| Close-to-close | Daily closes | Underestimates (no overnight gaps, no range) |
| Parkinson | High, low | Captures range, misses gaps |
| Yang-Zhang | OHLC + prior close | Lowest bias, the right default |
This post uses close-to-close to keep the math readable. Production should use Yang-Zhang.
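For reference, a self-contained Yang-Zhang sketch. It follows the standard formulation (overnight variance, plus k-weighted open-to-close variance, plus a Rogers-Satchell term); treat it as a starting point and verify against a library implementation before trusting it in production:

```python
import numpy as np
import pandas as pd

def yang_zhang_vol(ohlc: pd.DataFrame, window: int = 20) -> float:
    """Yang-Zhang realized vol, annualized, as a decimal, over the trailing
    `window` trading days. Expects columns: Open, High, Low, Close."""
    df = ohlc.tail(window + 1)
    o, h, l, c = (np.log(df[col]) for col in ("Open", "High", "Low", "Close"))
    overnight = (o - c.shift(1)).dropna()            # close-to-open gap
    open_close = (c - o).iloc[1:]                    # open-to-close move
    rs = ((h - o) * (h - c) + (l - o) * (l - c)).iloc[1:]  # Rogers-Satchell
    n = len(open_close)
    k = 0.34 / (1.34 + (n + 1) / (n - 1))            # Yang-Zhang weighting
    var_daily = overnight.var() + k * open_close.var() + (1 - k) * rs.mean()
    return float(np.sqrt(var_daily * 252))
```

Swapping it into the pipeline also means downloading the full OHLC columns instead of closes only.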
### 3. Window length and annualization
- Match windows. 30-day IV pairs with 20 to 30 day RV.
- Annualize daily SD by multiplying by `sqrt(252)`. Forgetting this is the single most common error.
- Trading days, not calendar days. Annualizing with `sqrt(365)` instead of `sqrt(252)` inflates RV by ~20%.
### 4. Z-score lookback
A 4% VRP is wide in a low-vol year and narrow in a high-vol year. Normalize:
```
z = (vrp_today - vrp_252d_mean) / vrp_252d_std
```
z > 1 is the rich-premium tier. z < -0.5 is do-not-sell.
## Worked example: SPY, by hand
Lock the formula in on paper before touching code.
1. 30-day ATM IV interpolated from the surface: 0.174 (17.4%)
2. Last 21 SPY closes, log returns `r_t = ln(P_t / P_{t-1})`. Standard deviation of the 20 daily log returns: 0.00692
3. Annualize: `0.00692 * sqrt(252) = 0.00692 * 15.875 = 0.1099` = 10.99%
4. Subtract: `VRP = 17.4% - 10.99% = +6.41%`
Variance form: 0.174**2 - 0.1099**2 = 0.01820.
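The hand arithmetic can be double-checked in a few lines:

```python
import numpy as np

iv = 0.174                    # 30-day ATM IV from step 1
daily_sd = 0.00692            # SD of the 20 log returns from step 2
rv = daily_sd * np.sqrt(252)  # step 3: annualize, ~0.1099
print(f"VRP (vol): {iv - rv:+.4f}")       # step 4: ~+0.0641
print(f"VRP (var): {iv**2 - rv**2:.4f}")  # ~0.0182
```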
Four arithmetic steps. Now the code.
## Python in 30 lines
```python
import numpy as np
import pandas as pd
import yfinance as yf


def compute_vrp(symbol: str, atm_iv_30d: float, rv_window: int = 20) -> dict:
    """
    Compute volatility risk premium for `symbol`.

    Parameters
    ----------
    symbol : underlying ticker, e.g. "SPY"
    atm_iv_30d : 30-day ATM implied vol, annualized, decimal (0.18 = 18%)
    rv_window : trailing trading-day window for RV (default 20)
    """
    hist = yf.download(symbol, period="60d", progress=False, auto_adjust=False)
    # .squeeze() guards against the MultiIndex columns that newer yfinance
    # versions return even for a single ticker
    closes = hist["Close"].squeeze().dropna().tail(rv_window + 1)
    if len(closes) < rv_window + 1:
        raise ValueError(f"Not enough history for {symbol}")
    log_returns = np.log(closes / closes.shift(1)).dropna()
    rv = float(log_returns.std()) * np.sqrt(252)
    return {
        "symbol": symbol,
        "atm_iv_30d": atm_iv_30d,
        "rv_window": rv_window,
        "rv": rv,
        "vrp_vol_points": atm_iv_30d - rv,
        "vrp_variance": atm_iv_30d**2 - rv**2,
    }


if __name__ == "__main__":
    result = compute_vrp("SPY", atm_iv_30d=0.174, rv_window=20)
    print(f"ATM IV (30d): {result['atm_iv_30d']:.2%}")
    print(f"RV (20d):     {result['rv']:.2%}")
    print(f"VRP (vol):    {result['vrp_vol_points'] * 100:+.2f} vol pts")
    print(f"VRP (var):    {result['vrp_variance']:+.5f}")
```
Three implementation notes:
- `np.log(closes / closes.shift(1))` is the canonical log-return idiom.
- The `sqrt(252)` is the only unit conversion. Drop it and your VRP is meaningless.
- This uses close-to-close. Swapping in Yang-Zhang only changes the `rv = ...` line.
The one input you cannot get from yfinance is the ATM IV. Either compute it from a chain (interpolate two listed expirations bracketing 30 days, take the ATM strike at each, time-weight) or pull it from a vol-surface API.
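A sketch of that time-weighting, assuming you already have the ATM IV at each of the two bracketing expirations. Interpolating in total variance (IV² × time), the way the VIX methodology does, behaves better than interpolating the vols directly:

```python
def interp_30d_iv(iv_near: float, t_near: float,
                  iv_far: float, t_far: float,
                  target_days: float = 30.0) -> float:
    """Linearly interpolate total variance between two expirations, then
    convert back to an annualized vol at the target horizon.
    IVs are annualized decimals; times to expiry are in days."""
    var_near = iv_near**2 * t_near           # total variance to each expiry
    var_far = iv_far**2 * t_far
    w = (t_far - target_days) / (t_far - t_near)  # weight on the near expiry
    total_var = w * var_near + (1 - w) * var_far
    return (total_var / target_days) ** 0.5

# Hypothetical inputs: 24-day ATM IV of 17.0%, 38-day ATM IV of 17.8%
print(round(interp_30d_iv(0.170, 24, 0.178, 38), 4))  # 0.1744
```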
## The five pitfalls

### 1. Forgetting sqrt(252)
Daily SD is around 0.005 to 0.015 for an index ETF. ATM IV is 0.10 to 0.30. Subtract them anyway and you get a number that looks plausible enough to pass a glance test. It is not in the right range. It is in different units.
Fix: Multiply daily SD by `sqrt(252)`. First sanity check on any RV figure: is it the same order of magnitude as the IV?
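That check is cheap enough to automate. A minimal guard, assuming both figures are meant to be annualized decimals (the 5x threshold is my heuristic, not a standard):

```python
def check_units(iv: float, rv: float) -> None:
    """Raise if RV looks like a daily SD that was never annualized.
    A 10x-plus gap between IV and RV on an index product almost always
    means a missing sqrt(252)."""
    if iv / rv > 5:
        raise ValueError(
            f"IV {iv:.4f} vs RV {rv:.4f}: same order of magnitude expected. "
            "Did you forget to multiply the daily SD by sqrt(252)?"
        )

check_units(0.174, 0.1099)     # passes: matched units
# check_units(0.174, 0.00692)  # raises: a daily SD slipped through
```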
### 2. Mismatched windows
30-day IV with 5-day RV swings violently with recent price action. The IV forecasts a month. The RV summarizes a week. The spread is mostly window-mismatch noise.
Fix: Match windows. 30-day IV pairs with 20 to 30 day RV.
### 3. Trading the raw spread
A 3% VRP at VIX 12 is wide. The same 3% at VIX 25 is narrow. Acting on the raw number ignores the regime.
Fix: Use the rolling 252-day z-score, not the raw number.
### 4. Close-to-close on gap-heavy names
Single stocks with regular earnings, ADRs, post-event reactions: close-to-close systematically understates RV. Apparent VRP looks rich. Real edge is smaller.
Fix: Yang-Zhang for any name with non-trivial overnight moves.
### 5. Lookahead bias in backtests
The classic. If you compute percentile thresholds from the entire history and apply them to a "sell when VRP is rich" rule, you are using future data to set today's threshold. Looks brilliant in-sample, falls apart out-of-sample.
Fix: Compute z-scores and percentiles using only data available at the trade-decision time:
```python
df["vrp_z"] = (
    (df["vrp"] - df["vrp"].rolling(252).mean())
    / df["vrp"].rolling(252).std()
)
```
`rolling()` is point-in-time by construction. Do not use full-sample mean and SD anywhere in your backtest pipeline.
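You can verify the point-in-time property directly: an already-computed rolling value must not change when future rows are appended. A quick check on synthetic data, with the window shortened to 5 to keep it readable:

```python
import numpy as np
import pandas as pd

def rolling_z(s: pd.Series, window: int = 5) -> pd.Series:
    return (s - s.rolling(window).mean()) / s.rolling(window).std()

rng = np.random.default_rng(1)
vrp = pd.Series(rng.normal(0.04, 0.01, 300))  # synthetic VRP series

z_then = rolling_z(vrp.iloc[:200]).iloc[-1]  # z at day 199, no future data
z_full = rolling_z(vrp).iloc[199]            # same day, 100 more days loaded
assert np.isclose(z_then, z_full)            # identical: rolling() only looks back
```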
## From number to trade
Computing VRP is half the job. The sizing rule is the other half:
| VRP z-score | Action |
|---|---|
| z > 1.5 | Aggressive selling, full size |
| 0.5 to 1.5 | Standard sizing |
| −0.5 to 0.5 | Reduced size, defined risk only |
| z < −0.5 | Sit out or hedge |
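The table maps directly onto a small dispatch function; the tier labels here are illustrative, not part of any API:

```python
def sizing_tier(z: float) -> str:
    """Map a VRP z-score to the sizing tiers in the table above."""
    if z > 1.5:
        return "aggressive: full size"
    if z > 0.5:
        return "standard sizing"
    if z > -0.5:
        return "reduced size, defined risk only"
    return "sit out or hedge"

print(sizing_tier(1.8))   # aggressive: full size
print(sizing_tier(-0.9))  # sit out or hedge
```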
Three questions decide the structure once z says go:
- Is the edge directional? Put-side and call-side VRP are rarely equal.
- What is the dealer regime? 4% VRP in positive gamma is a different trade from the same 4% in negative gamma.
- Which structure fits? Short straddle, iron condor, put credit spread, jade lizard, calendar.
## When to stop rolling your own
Computing VRP for one symbol on one date: 30 lines of Python.
Computing it correctly across many symbols and dates, with the directional decomposition, GEX-conditioned regime, and leak-free historical percentiles a real backtest needs: an infrastructure project that adds zero alpha.
I built FlashAlpha for this. `GET https://lab.flashalpha.com/v1/vrp/{symbol}` returns ATM IV, four matched RV windows (5d, 10d, 20d, 30d) using Yang-Zhang, the VRP for each, rolling z-score and percentile, put-call directional decomposition, GEX regime, and strategy suitability scores. The historical endpoint at `historical.flashalpha.com/v1/vrp/{symbol}?at=<timestamp>` returns the same shape with point-in-time z-scores for lookahead-free backtesting.
Free tier, no card. Useful as a sanity check against your own implementation:
```python
import requests

resp = requests.get(
    "https://lab.flashalpha.com/v1/vrp/SPY",
    headers={"X-Api-Key": "YOUR_KEY"},
)
vrp = resp.json()["vrp"]
print(f"ATM IV:    {vrp['atm_iv']:.2f}%")
print(f"RV (20d):  {vrp['rv_20d']:.2f}%")
print(f"VRP (20d): {vrp['vrp_20d']:+.2f}%")
print(f"Z-score:   {vrp['z_score']:+.2f}")
```
Run your homemade `compute_vrp` against the 20-day RV and compare to `vrp_20d`. Within 0.1 vol point, your math and inputs are aligned. Off by more, work back through the five pitfalls.
## Wrap
The VRP formula is a one-liner. Almost every homemade implementation gets the formula right and fails on at least one of: annualization, window matching, RV estimator choice, or leak-free percentile computation.
Compute it by hand once. Code it in 30 lines. Then either keep building the infrastructure or stop and use a feed. Either is fine. Mixing the two (homemade VRP with leaky percentiles fed into a "production" backtest) is the worst of both.
If you want the full systematic workflow (signal to execution to structure choice), the practical guide to trading VRP covers it end to end.