DEV Community

Cover image for πŸ“Š Predicting Stock Price Movement Using Options Data in Python
Dmitry Romanoff
Dmitry Romanoff

Posted on

πŸ“Š Predicting Stock Price Movement Using Options Data in Python

Have you ever wanted to use real stock options data to estimate a stock's expected move and find the "max pain" point for option sellers? In this post, we'll walk through a Python script that does just that β€” using basic options CSV data and some neat tricks with pandas and numpy.


🧠 What's the Idea?

Options trading isn't just speculation β€” it also contains valuable information about market expectations. Using open interest and implied volatility, we can estimate:

  • The expected price range of a stock by expiration.
  • The Max Pain strike β€” the price where most option holders (especially retail) lose money.

🧾 What You'll Need

  • Two CSV files: calls.csv and puts.csv, each space-delimited.
  • Python 3 with pandas and numpy installed.

🧬 Sample Input Data

🟦 calls.csv (truncated)

Contract Name   Last Trade Date (EDT)   Strike  Last Price  Bid Ask Change  % Change    Volume  Open Interest   Implied Volatility
LLY250815C00350000  ... 350 410.00  388.40  395.85  ... 1   1   98.14%
LLY250815C00400000  ... 400 343.46  351.85  356.90  ... 112 58  114.91%
Enter fullscreen mode Exit fullscreen mode

πŸŸ₯ puts.csv (truncated)

Contract Name   Last Trade Date (EDT)   Strike  Last Price  Bid Ask Change  % Change    Volume  Open Interest   Implied Volatility
LLY250815P00350000  ... 350 1.51    0.03    1.80    ... 1   31  75.27%
LLY250815P00400000  ... 400 1.02    0.32    2.00    ... 1   68  65.16%
Enter fullscreen mode Exit fullscreen mode

πŸ§‘β€πŸ’» Python Code Walkthrough

import pandas as pd
import numpy as np

def clean_options_data(filepath):
    # Automatically infer whitespace columns and clean headers
    df = pd.read_csv(filepath, sep=None, engine='python')
    df.columns = df.columns.str.strip()
    df.replace('-', np.nan, inplace=True)

    # Process Implied Volatility
    if 'Implied Volatility' not in df.columns:
        print("❌ 'Implied Volatility' column not found. Check headers.")
        return None

    df['Implied Volatility'] = df['Implied Volatility'].str.replace('%', '', regex=False)
    df['Implied Volatility'] = pd.to_numeric(df['Implied Volatility'], errors='coerce') / 100

    df['Strike'] = pd.to_numeric(df['Strike'], errors='coerce')
    df['Open Interest'] = pd.to_numeric(df['Open Interest'], errors='coerce')

    df.dropna(subset=['Strike', 'Open Interest', 'Implied Volatility'], inplace=True)
    return df
Enter fullscreen mode Exit fullscreen mode

This function loads and sanitizes the options data, ensuring percentages and missing values are properly handled.


πŸ“ˆ Estimate Expected Move

calls = clean_options_data("calls.csv")
puts = clean_options_data("puts.csv")

# Exit on error
if calls is None or puts is None:
    print("Fix your CSV headers and retry.")
    exit()

# ATM (At-the-Money) strike based on highest call open interest
atm_strike = calls.loc[calls['Open Interest'].idxmax(), 'Strike']
atm_iv = calls.loc[calls['Strike'] == atm_strike, 'Implied Volatility'].mean()
days_to_expiry = 75  # Use actual days to expiry

expected_move = atm_strike * atm_iv * np.sqrt(days_to_expiry / 365)

print(f"\n🧠 Estimated Stock Price: ${atm_strike:.2f}")
print(f"πŸ“ˆ Expected Β± Move in {days_to_expiry} days: ${expected_move:.2f}")
print(f"πŸ” Price Range: ${atm_strike - expected_move:.2f} to ${atm_strike + expected_move:.2f}")
Enter fullscreen mode Exit fullscreen mode

🎯 Calculate Max Pain

The Max Pain price is where the total losses (for put and call holders) are minimized β€” this is often where market makers benefit most.

def max_pain(calls_df, puts_df):
    strikes = sorted(set(calls_df['Strike']).union(set(puts_df['Strike'])))
    total_pain = []

    for strike in strikes:
        call_pain = ((calls_df['Strike'] - strike).clip(lower=0) * calls_df['Open Interest']).sum()
        put_pain = ((strike - puts_df['Strike']).clip(lower=0) * puts_df['Open Interest']).sum()
        total_pain.append((strike, call_pain + put_pain))

    pain_df = pd.DataFrame(total_pain, columns=['Strike', 'Total Pain'])
    return pain_df.loc[pain_df['Total Pain'].idxmin(), 'Strike']

max_pain_strike = max_pain(calls, puts)
print(f"\n🎯 Max Pain Strike: ${max_pain_strike:.2f}")
Enter fullscreen mode Exit fullscreen mode

🧾 Output Example

βœ… Parsed headers: ['Contract Name', ..., 'Implied Volatility']

🧠 Estimated Stock Price: $400.00
πŸ“ˆ Expected Β± Move in 75 days: $49.73
πŸ” Price Range: $350.27 to $449.73

🎯 Max Pain Strike: $390.00
Enter fullscreen mode Exit fullscreen mode

πŸ” Takeaways

  • You can estimate market expectations using just open interest and implied volatility.
  • Max Pain is a powerful idea and sometimes a magnet for stock price behavior near expiry.
  • pandas makes it incredibly easy to clean and analyze tabular data β€” even when messy.

πŸš€ What’s Next?

You could extend this to:

  • Plot open interest by strike.
  • Animate price move cones.
  • Combine with real-time price feeds using yfinance or alphavantage.

πŸ“‚ Repo Starter

Create a folder like:

/stocks_b
  β”œβ”€ calls.csv
  β”œβ”€ puts.csv
  └─ analyze_options.py
Enter fullscreen mode Exit fullscreen mode

Then run:

python analyze_options.py
Enter fullscreen mode Exit fullscreen mode

Predicting Stock Price Movement Using Options Data in Python

Full Code

import pandas as pd
import numpy as np

def clean_options_data(filepath):
    # Use Python engine to preserve multi-word headers like "Implied Volatility"
    df = pd.read_csv(filepath, sep=None, engine='python')

    # Strip any extra whitespace in headers
    df.columns = df.columns.str.strip()

    # Optional: print to confirm correct headers
    print("βœ… Parsed headers:", df.columns.tolist())

    # Replace '-' with NaN
    df.replace('-', np.nan, inplace=True)

    # Clean Implied Volatility
    if 'Implied Volatility' not in df.columns:
        print("❌ 'Implied Volatility' column not found. Check headers again.")
        return None

    df['Implied Volatility'] = df['Implied Volatility'].str.replace('%', '', regex=False)
    df['Implied Volatility'] = pd.to_numeric(df['Implied Volatility'], errors='coerce') / 100

    # Convert other numeric columns
    df['Strike'] = pd.to_numeric(df['Strike'], errors='coerce')
    df['Open Interest'] = pd.to_numeric(df['Open Interest'], errors='coerce')

    # Drop rows missing key values
    df.dropna(subset=['Strike', 'Open Interest', 'Implied Volatility'], inplace=True)

    return df

# Load data
calls = clean_options_data("calls.csv")
puts = clean_options_data("puts.csv")

# Stop if headers are invalid
if calls is None or puts is None:
    print("Fix your CSV headers and retry.")
    exit()

# Estimate ATM strike and IV
atm_strike = calls.loc[calls['Open Interest'].idxmax(), 'Strike']
atm_iv = calls.loc[calls['Strike'] == atm_strike, 'Implied Volatility'].mean()

# Estimate move
days_to_expiry = 75
expected_move = atm_strike * atm_iv * np.sqrt(days_to_expiry / 365)

print(f"\n🧠 Estimated Stock Price: ${atm_strike:.2f}")
print(f"πŸ“ˆ Expected Β± Move in {days_to_expiry} days: ${expected_move:.2f}")
print(f"πŸ” Price Range: ${atm_strike - expected_move:.2f} to ${atm_strike + expected_move:.2f}")

# Max Pain
def max_pain(calls_df, puts_df):
    strikes = sorted(set(calls_df['Strike']).union(set(puts_df['Strike'])))
    total_pain = []

    for strike in strikes:
        call_pain = ((calls_df['Strike'] - strike).clip(lower=0) * calls_df['Open Interest']).sum()
        put_pain = ((strike - puts_df['Strike']).clip(lower=0) * puts_df['Open Interest']).sum()
        total_pain.append((strike, call_pain + put_pain))

    pain_df = pd.DataFrame(total_pain, columns=['Strike', 'Total Pain'])
    return pain_df.loc[pain_df['Total Pain'].idxmin(), 'Strike']

max_pain_strike = max_pain(calls, puts)
print(f"\n🎯 Max Pain Strike: ${max_pain_strike:.2f}")

Enter fullscreen mode Exit fullscreen mode

Top comments (0)