Dmitry Romanoff

Posted on Jun 1

📊 Predicting Stock Price Movement Using Options Data in Python

#python #programming #howto #tutorial

Have you ever wanted to use real stock options data to estimate a stock's expected move and find the "max pain" point for option holders? In this post, we'll walk through a Python script that does just that, using basic options CSV data and a few tricks with pandas and numpy.

🧠 What's the Idea?

Options trading isn't just speculation. Option prices and positioning also carry information about market expectations. Using open interest and implied volatility, we can estimate:

The expected price range of a stock by expiration.
The Max Pain strike: the expiration price at which option holders as a group receive the smallest payout, and therefore lose the most.

🧾 What You'll Need

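Two CSV files: calls.csv and puts.csv, each delimited by tabs or another consistent separator that pandas can detect. Plain spaces would split multi-word headers such as Implied Volatility.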
Python 3 with pandas and numpy installed.

🧬 Sample Input Data

🟦 `calls.csv` (truncated)

Contract Name   Last Trade Date (EDT)   Strike  Last Price  Bid Ask Change  % Change    Volume  Open Interest   Implied Volatility
LLY250815C00350000  ... 350 410.00  388.40  395.85  ... 1   1   98.14%
LLY250815C00400000  ... 400 343.46  351.85  356.90  ... 112 58  114.91%

🟥 `puts.csv` (truncated)

Contract Name   Last Trade Date (EDT)   Strike  Last Price  Bid Ask Change  % Change    Volume  Open Interest   Implied Volatility
LLY250815P00350000  ... 350 1.51    0.03    1.80    ... 1   31  75.27%
LLY250815P00400000  ... 400 1.02    0.32    2.00    ... 1   68  65.16%
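
The contract names in these rows appear to follow the standard OCC option symbol layout (root, expiry as YYMMDD, C or P, then the strike times 1000 in eight digits), so the expiry date can be read from the data rather than hard-coded. A minimal sketch, assuming that layout; `parse_contract_name` and the as-of date are illustrative and not part of the original script:

```python
import re
from datetime import date, datetime

# Assumed OCC-style layout: ROOT + YYMMDD + C/P + 8-digit strike (strike * 1000)
OCC_PATTERN = re.compile(r"^([A-Z]+)(\d{6})([CP])(\d{8})$")

def parse_contract_name(name):
    """Split an OCC-style symbol into (root, expiry date, 'C' or 'P', strike)."""
    match = OCC_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"Not an OCC-style contract name: {name!r}")
    root, yymmdd, kind, strike_digits = match.groups()
    expiry = datetime.strptime(yymmdd, "%y%m%d").date()
    strike = int(strike_digits) / 1000  # the last three digits are decimals
    return root, expiry, kind, strike

root, expiry, kind, strike = parse_contract_name("LLY250815C00350000")
as_of = date(2025, 6, 1)  # assumed valuation date, for illustration only
days_to_expiry = (expiry - as_of).days
print(root, expiry, kind, strike, days_to_expiry)
```

With that assumed as-of date, the sample contracts come out at 75 days to expiry.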

🧑‍💻 Python Code Walkthrough

import pandas as pd
import numpy as np

def clean_options_data(filepath):
    # sep=None makes pandas sniff the delimiter, which requires the python engine
    df = pd.read_csv(filepath, sep=None, engine='python')
    df.columns = df.columns.str.strip()
    df.replace('-', np.nan, inplace=True)

    # Process Implied Volatility
    if 'Implied Volatility' not in df.columns:
        print("❌ 'Implied Volatility' column not found. Check headers.")
        return None

    # astype(str) keeps .str usable even if the column was parsed as numeric
    df['Implied Volatility'] = df['Implied Volatility'].astype(str).str.replace('%', '', regex=False)
    df['Implied Volatility'] = pd.to_numeric(df['Implied Volatility'], errors='coerce') / 100

    df['Strike'] = pd.to_numeric(df['Strike'], errors='coerce')
    df['Open Interest'] = pd.to_numeric(df['Open Interest'], errors='coerce')

    df.dropna(subset=['Strike', 'Open Interest', 'Implied Volatility'], inplace=True)
    return df

This function loads and sanitizes the options data, ensuring percentages and missing values are properly handled.
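
To see what the cleaning steps do to individual values, here is a small sketch on made-up rows held in memory (the values are illustrative, not real quotes). It applies the same replace-and-coerce pattern as `clean_options_data`:

```python
import io

import numpy as np
import pandas as pd

# Made-up, tab-delimited rows; '-' marks a missing open interest value.
raw = (
    "Strike\tOpen Interest\tImplied Volatility\n"
    "350\t31\t75.27%\n"
    "400\t-\t65.16%\n"
    "450\t12\t50.00%\n"
)

df = pd.read_csv(io.StringIO(raw), sep="\t")
df = df.replace("-", np.nan)

# Strip the percent sign, then convert percentages to fractions.
iv = df["Implied Volatility"].astype(str).str.replace("%", "", regex=False)
df["Implied Volatility"] = pd.to_numeric(iv, errors="coerce") / 100
df["Open Interest"] = pd.to_numeric(df["Open Interest"], errors="coerce")

# The 400 row has no open interest, so it is dropped here.
df = df.dropna(subset=["Strike", "Open Interest", "Implied Volatility"])
print(df)
```

Only the 350 and 450 rows survive, with implied volatility stored as 0.7527 and 0.50 rather than as percentage strings.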

📈 Estimate Expected Move

calls = clean_options_data("calls.csv")
puts = clean_options_data("puts.csv")

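# Exit on error
if calls is None or puts is None:
    print("Fix your CSV headers and retry.")
    raise SystemExit(1)

# Rough proxy for the ATM (At-the-Money) strike: the call strike with the highest open interest.
# This is not the true ATM strike; use the strike nearest the current share price if you know it.
atm_strike = calls.loc[calls['Open Interest'].idxmax(), 'Strike']
atm_iv = calls.loc[calls['Strike'] == atm_strike, 'Implied Volatility'].mean()
days_to_expiry = 75  # Calendar days until expiration; set this for your contracts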

expected_move = atm_strike * atm_iv * np.sqrt(days_to_expiry / 365)

print(f"\n🧠 Estimated Stock Price: ${atm_strike:.2f}")
print(f"📈 Expected ± Move in {days_to_expiry} days: ${expected_move:.2f}")
print(f"🔍 Price Range: ${atm_strike - expected_move:.2f} to ${atm_strike + expected_move:.2f}")
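
The estimate above follows a common rule of thumb: a one-standard-deviation move is roughly price × annualized IV × √(days / 365). The sketch below wraps that in a function and, as an alternative to the open-interest proxy, picks the strike nearest a known share price. `expected_move`, `nearest_strike`, and all the numbers are illustrative assumptions, not part of the original script:

```python
import math

import pandas as pd

def expected_move(price, annual_iv, days):
    """Approximate one-standard-deviation move over `days` calendar days."""
    return price * annual_iv * math.sqrt(days / 365)

def nearest_strike(strikes, spot):
    """Return the listed strike closest to the spot price."""
    strikes = pd.Series(strikes, dtype="float64")
    return float(strikes.loc[(strikes - spot).abs().idxmin()])

strikes = [90, 95, 100, 105, 110]   # made-up strike ladder
spot = 101.3                        # made-up share price
atm = nearest_strike(strikes, spot)
move = expected_move(100.0, 0.20, 365)  # 20% IV over a full year

print(f"ATM strike: {atm:.2f}")
print(f"Expected move: {move:.2f}")
```

Because the move scales with the square root of time, a quarter of a year at the same IV gives half the one-year move, not a quarter of it.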

🎯 Calculate Max Pain

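The Max Pain price is the expiration price at which the total payout to put and call holders is smallest, so holders as a group lose the most of the premium they paid. It is often described as the outcome that benefits option writers, including market makers, the most.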
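
Written out, one common formulation of the quantity being minimized is the total intrinsic value paid to holders at a candidate expiration price $S$. The symbols are notation chosen for this sketch: $OI_c(K)$ and $OI_p(K)$ are the call and put open interest at strike $K$.

```latex
\text{Pain}(S) = \sum_{K} OI_c(K)\,\max(S - K,\, 0) \;+\; \sum_{K} OI_p(K)\,\max(K - S,\, 0),
\qquad
S_{\text{max pain}} = \arg\min_{S} \text{Pain}(S)
```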

def max_pain(calls_df, puts_df):
    strikes = sorted(set(calls_df['Strike']).union(set(puts_df['Strike'])))
    total_pain = []

    for strike in strikes:
        # Treat each strike as a candidate expiration price.
        # A call pays max(price - strike, 0); a put pays max(strike - price, 0).
        call_pain = ((strike - calls_df['Strike']).clip(lower=0) * calls_df['Open Interest']).sum()
        put_pain = ((puts_df['Strike'] - strike).clip(lower=0) * puts_df['Open Interest']).sum()
        total_pain.append((strike, call_pain + put_pain))

    pain_df = pd.DataFrame(total_pain, columns=['Strike', 'Total Pain'])
    return pain_df.loc[pain_df['Total Pain'].idxmin(), 'Strike']

max_pain_strike = max_pain(calls, puts)
print(f"\n🎯 Max Pain Strike: ${max_pain_strike:.2f}")
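
A quick way to sanity-check a max pain routine is to run it on a toy chain small enough to verify by hand. The sketch below uses made-up open interest and the standard payoffs at expiry (a call pays max(S - K, 0), a put pays max(K - S, 0)); `holder_payout` is an illustrative helper, not part of the original script:

```python
import pandas as pd

# Made-up chain: calls are heaviest at high strikes, puts at low strikes.
calls_toy = pd.DataFrame({"Strike": [90, 100, 110], "Open Interest": [10, 20, 30]})
puts_toy = pd.DataFrame({"Strike": [90, 100, 110], "Open Interest": [30, 20, 10]})

def holder_payout(settle, calls_df, puts_df):
    """Total intrinsic value paid to option holders if the stock settles at `settle`."""
    call_value = ((settle - calls_df["Strike"]).clip(lower=0) * calls_df["Open Interest"]).sum()
    put_value = ((puts_df["Strike"] - settle).clip(lower=0) * puts_df["Open Interest"]).sum()
    return int(call_value + put_value)

payouts = {s: holder_payout(s, calls_toy, puts_toy) for s in [90, 100, 110]}
max_pain_toy = min(payouts, key=payouts.get)

print(payouts)       # {90: 400, 100: 200, 110: 400}
print(max_pain_toy)  # 100
```

At 100, only the 90 calls and the 110 puts finish in the money, so the payout is 10 × 10 + 10 × 10 = 200, the lowest of the three candidates. The per-contract multiplier (typically 100 shares) is omitted because it scales every candidate equally and does not change which strike wins.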

🧾 Output Example

✅ Parsed headers: ['Contract Name', ..., 'Implied Volatility']

🧠 Estimated Stock Price: $400.00
📈 Expected ± Move in 75 days: $49.73
🔍 Price Range: $350.27 to $449.73

🎯 Max Pain Strike: $390.00

🔍 Takeaways

You can get a rough read on market expectations using just open interest and implied volatility.
Max Pain is a heuristic rather than a forecast, though some traders treat it as a magnet for the price near expiry.
pandas makes it easy to clean and analyze tabular data, even when it is messy.

🚀 What’s Next?

You could extend this to:

Plot open interest by strike.
Animate price move cones.
Combine with real-time price feeds using yfinance or alphavantage.
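
For the first idea, the data only needs to be reshaped into one row per strike before plotting. A minimal sketch with made-up numbers; `open_interest_by_strike` is a hypothetical helper name, and the plotting call is left as a comment because it needs matplotlib installed:

```python
import pandas as pd

def open_interest_by_strike(calls_df, puts_df):
    """Return a table indexed by strike with call and put open interest side by side."""
    call_oi = calls_df.groupby("Strike")["Open Interest"].sum().rename("Calls")
    put_oi = puts_df.groupby("Strike")["Open Interest"].sum().rename("Puts")
    # Strikes missing on one side get 0 instead of NaN.
    return pd.concat([call_oi, put_oi], axis=1).fillna(0).sort_index()

# Made-up rows, for illustration only.
calls_demo = pd.DataFrame({"Strike": [350, 400], "Open Interest": [1, 58]})
puts_demo = pd.DataFrame({"Strike": [350, 400, 450], "Open Interest": [31, 68, 5]})

table = open_interest_by_strike(calls_demo, puts_demo)
print(table)
# table.plot.bar() would draw grouped bars per strike (requires matplotlib).
```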

📂 Repo Starter

Create a folder like:

/stocks_b
  ├─ calls.csv
  ├─ puts.csv
  └─ analyze_options.py

Then run:

python analyze_options.py

Full Code

import pandas as pd
import numpy as np

def clean_options_data(filepath):
    # sep=None makes pandas sniff the delimiter, which requires the python engine
    df = pd.read_csv(filepath, sep=None, engine='python')

    # Strip any extra whitespace in headers
    df.columns = df.columns.str.strip()

    # Optional: print to confirm correct headers
    print("✅ Parsed headers:", df.columns.tolist())

    # Replace '-' with NaN
    df.replace('-', np.nan, inplace=True)

    # Clean Implied Volatility
    if 'Implied Volatility' not in df.columns:
        print("❌ 'Implied Volatility' column not found. Check headers again.")
        return None

    # astype(str) keeps .str usable even if the column was parsed as numeric
    df['Implied Volatility'] = df['Implied Volatility'].astype(str).str.replace('%', '', regex=False)
    df['Implied Volatility'] = pd.to_numeric(df['Implied Volatility'], errors='coerce') / 100

    # Convert other numeric columns
    df['Strike'] = pd.to_numeric(df['Strike'], errors='coerce')
    df['Open Interest'] = pd.to_numeric(df['Open Interest'], errors='coerce')

    # Drop rows missing key values
    df.dropna(subset=['Strike', 'Open Interest', 'Implied Volatility'], inplace=True)

    return df

# Load data
calls = clean_options_data("calls.csv")
puts = clean_options_data("puts.csv")

# Stop if headers are invalid
if calls is None or puts is None:
    print("Fix your CSV headers and retry.")
    raise SystemExit(1)

# Rough ATM proxy: the call strike with the highest open interest, and its IV
atm_strike = calls.loc[calls['Open Interest'].idxmax(), 'Strike']
atm_iv = calls.loc[calls['Strike'] == atm_strike, 'Implied Volatility'].mean()

# Estimate move
days_to_expiry = 75
expected_move = atm_strike * atm_iv * np.sqrt(days_to_expiry / 365)

print(f"\n🧠 Estimated Stock Price: ${atm_strike:.2f}")
print(f"📈 Expected ± Move in {days_to_expiry} days: ${expected_move:.2f}")
print(f"🔍 Price Range: ${atm_strike - expected_move:.2f} to ${atm_strike + expected_move:.2f}")

# Max Pain
def max_pain(calls_df, puts_df):
    strikes = sorted(set(calls_df['Strike']).union(set(puts_df['Strike'])))
    total_pain = []

    for strike in strikes:
        # Treat each strike as a candidate expiration price.
        # A call pays max(price - strike, 0); a put pays max(strike - price, 0).
        call_pain = ((strike - calls_df['Strike']).clip(lower=0) * calls_df['Open Interest']).sum()
        put_pain = ((puts_df['Strike'] - strike).clip(lower=0) * puts_df['Open Interest']).sum()
        total_pain.append((strike, call_pain + put_pain))

    pain_df = pd.DataFrame(total_pain, columns=['Strike', 'Total Pain'])
    return pain_df.loc[pain_df['Total Pain'].idxmin(), 'Strike']

max_pain_strike = max_pain(calls, puts)
print(f"\n🎯 Max Pain Strike: ${max_pain_strike:.2f}")

DEV Community
