Supreet Sethi

Posted on Nov 27

Forget Wall Street: Here’s How to Create the Best Stock Screener Yourself

#stocks #cryptocurrency #python #pandas

Stock Screeners for Math People: Build Your Own

A stock screener is just a function. You don't need Wall Street, a fancy terminal, or YouTube. If you know high school math and Python, you can build your own screener that is transparent, tuned to your portfolio, and changeable as your ideas evolve.

1. What Is a Stock Screener, Really?

At its core, a stock screener is a mathematical filter function:

\text{Screener}(\text{stock}) = \begin{cases} \text{TRUE} & \text{if stock passes your rules} \ \text{FALSE} & \text{otherwise} \end{cases}

You have a universe of N stocks, each with data:

\text{Input} = {S_1, S_2, \ldots, S_N}, \quad S_i = {P_i(t), V_i(t), F_i}

Where:

$P_i(t)$ = price of stock i at time t
$V_i(t)$ = trading volume
$F_i$ = fundamental data (EPS, revenue, etc.)

Your screener applies a function:

f(P_i(t), V_i(t), F_i) \rightarrow {\text{TRUE}, \text{FALSE}}

Apply this to 5,000 stocks, keep those returning TRUE, then rank them. That's the entire idea.

2. Why the "Best Screener" Depends on What You Already Own

The biggest mistake beginners make: treating the "best stock screener" as universal.

In reality, it depends on your portfolio vector:

\vec{P} = [p_1, p_2, \ldots, p_n]

Each $p_i$ is a stock with its own return series. When you add a new stock $X$ , what matters is correlation with your existing portfolio.

Correlation Between Return Series

\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}

The interpretation:

$\text{Corr} \approx 1$ → they move together (bad for diversification)
$\text{Corr} \approx 0$ → independent movements (good)
$\text{Corr} \approx -1$ → opposite moves (excellent hedge)

The Real Problem

If you already own 10 tech stocks with mutual correlation ≈ 0.8, adding an 11th tech stock just concentrates risk. It's not diversification—it's doubling down.

Your screener must know what you own and prefer low-correlation stocks.

import numpy as np
import pandas as pd

def portfolio_returns(returns_df, weights):
    """Compute weighted portfolio return series"""
    w = np.array(weights).reshape(-1, 1)
    return (returns_df.values @ w).flatten()

def correlation_with_portfolio(candidate_returns, portfolio_ret_series):
    """Correlation between a candidate and your portfolio"""
    corr_matrix = np.corrcoef(candidate_returns, portfolio_ret_series)
    return corr_matrix[0, 1]

# Decision rule: only buy if correlation < 0.5

3. Core Math Building Blocks

3.1 Returns: Rate of Change

Return over n days is the percentage change:

R_n(t) = \frac{P(t) - P(t-n)}{P(t-n)}

Example:

Price 20 days ago: $100
Price today: $115
20-day return: $\frac{115 - 100}{100} = 0.15 = 15\%$

3.2 Moving Averages: Smoothing Noise

Simple Moving Average over n days:

\text{SMA}n(t) = \frac{1}{n} \sum{i=0}^{n-1} P(t-i)

If price is above its recent SMA, it often signals an uptrend.

3.3 Volume Ratio: Conviction Metric

Volume tells you "how many people agree with the move":

\text{Volume Ratio} = \frac{\text{Recent Average Volume}}{\text{Historical Average Volume}}

If this is $ParseError: KaTeX parse error: Expected 'EOF', got '}' at position 1: }̲ > 1.5$ , volume is picking up—a strong signal.

4. Your Data: CSV + SQLite

Use two simple data sources you control locally:

Daily Prices in CSV

One file per ticker: AAPL.csv

Date,Open,High,Low,Close,Volume
2024-01-15,150.00,152.50,149.80,151.20,2500000
2024-01-16,151.30,153.00,151.00,152.80,2300000
2024-01-17,152.00,153.80,151.50,152.95,2400000

Quarterly Fundamentals in SQLite

import sqlite3

conn = sqlite3.connect("stocks.db")
cursor = conn.cursor()

cursor.execute("""
CREATE TABLE IF NOT EXISTS fundamentals (
    ticker TEXT,
    quarter_date TEXT,
    eps REAL,
    estimated_eps REAL,
    revenue REAL,
    net_income REAL,
    market_cap REAL
)
""")

cursor.execute("""
CREATE TABLE IF NOT EXISTS companies (
    ticker TEXT PRIMARY KEY,
    name TEXT,
    sector TEXT
)
""")

conn.commit()

You now have a local data warehouse you control—no API limits, fully reproducible.

5. Screener #1: Momentum (First Derivative of Price)

Momentum is just rate of change:

\text{Momentum}_n(t) = \frac{P(t) - P(t-n)}{P(t-n)} \cdot 100\%

A simple rule: select stocks with 20-day momentum $\gt 15\%$ .

import pandas as pd
import numpy as np
import os

def load_price_data(csv_dir, ticker):
    """Load CSV into DataFrame"""
    df = pd.read_csv(f"{csv_dir}/{ticker}.csv")
    df["Date"] = pd.to_datetime(df["Date"])
    return df.set_index("Date")

def momentum_20d(close_series, period=20):
    """Calculate n-day momentum percentage"""
    if len(close_series) < period + 1:
        return np.nan
    p_today = close_series.iloc[-1]
    p_past = close_series.iloc[-period - 1]
    return (p_today - p_past) / p_past * 100.0

def momentum_screen(csv_dir, min_mom_pct=15):
    """Screen: momentum > 15%"""
    tickers = [f[:-4].upper() for f in os.listdir(csv_dir) 
               if f.endswith(".csv")]
    results = []

    for t in tickers:
        try:
            df = load_price_data(csv_dir, t)
            mom = momentum_20d(df["Close"])
            if mom >= min_mom_pct:
                results.append({
                    "ticker": t,
                    "momentum_20d_pct": round(mom, 2)
                })
        except Exception as e:
            print(f"Error on {t}: {e}")

    return sorted(results, key=lambda x: x["momentum_20d_pct"], reverse=True)

# Run it
momentum_stocks = momentum_screen("./stock_data")
print(f"Found {len(momentum_stocks)} momentum candidates")

When to use: When you see a trend forming. Momentum works because markets overshoot—once they start moving, they keep moving.

6. Screener #2: Volume Accumulation (Smart Money Signal)

We want to know if price moves are backed by serious trading volume.

The money flow line combines two facts:

Where did price close in today's range?
How much volume was there?

\text{MoneyFlow}(t) = \frac{\text{Close}(t) - \text{Low}(t)}{\text{High}(t) - \text{Low}(t)} \cdot \text{Volume}(t)

If this is consistently positive AND volume is rising, you likely have accumulation.

def volume_ratio(df, recent_days=5, hist_days=20):
    """Volume momentum: recent average / historical average"""
    recent = df["Volume"].tail(recent_days).mean()
    hist = df["Volume"].tail(hist_days).mean()
    return recent / hist if hist > 0 else 0

def money_flow_sum(df, period=14):
    """Sum of money flow over period"""
    high = df["High"]
    low = df["Low"]
    close = df["Close"]
    vol = df["Volume"]

    hl_range = high - low
    position = np.where(
        hl_range != 0,
        (close - low) / hl_range,
        0.5  # Avoid division by zero
    )
    flow = position * vol
    return flow.tail(period).sum()

def accumulation_screen(csv_dir, min_vol_ratio=1.5, min_price=5.0):
    """Screen: rising volume + positive money flow"""
    tickers = [f[:-4].upper() for f in os.listdir(csv_dir) 
               if f.endswith(".csv")]
    results = []

    for t in tickers:
        try:
            df = load_price_data(csv_dir, t)
            if df["Close"].iloc[-1] < min_price:
                continue

            vr = volume_ratio(df)
            mf = money_flow_sum(df)

            if vr >= min_vol_ratio and mf > 0:
                results.append({
                    "ticker": t,
                    "volume_ratio": round(vr, 2),
                    "money_flow": int(mf)
                })
        except Exception as e:
            print(f"Error on {t}: {e}")

    return sorted(results, key=lambda x: x["volume_ratio"], reverse=True)

# Run it
accum_stocks = accumulation_screen("./stock_data")
print(f"Found {len(accum_stocks)} accumulation candidates")

When to use: When you want to catch early moves before the broader market notices. Volume leads price.

7. Screener #3: Earnings Surprise (When Reality Beats Expectations)

Markets react strongly when earnings differ from estimates.

\text{Surprise\%} = \frac{\text{EPS}{\text{actual}} - \text{EPS}{\text{estimate}}}{\text{EPS}_{\text{estimate}}} \cdot 100

Positive surprises often lead to repricing.

import sqlite3
from datetime import datetime

def get_latest_earnings(db, ticker):
    """Fetch most recent earnings from database"""
    cur = db.cursor()
    cur.execute("""
        SELECT quarter_date, eps, estimated_eps
        FROM fundamentals
        WHERE ticker = ?
        ORDER BY quarter_date DESC
        LIMIT 1
    """, (ticker,))

    row = cur.fetchone()
    if not row:
        return None

    qdate, eps, est = row
    surprise_pct = (eps - est) / est * 100 if est else 0
    return {
        "date": qdate,
        "eps": eps,
        "est": est,
        "surprise_pct": surprise_pct
    }

def earnings_surprise_screen(csv_dir, db_path, 
                            min_surprise=10, max_days_old=45):
    """Screen: positive recent earnings surprise"""
    db = sqlite3.connect(db_path)
    tickers = [f[:-4].upper() for f in os.listdir(csv_dir) 
               if f.endswith(".csv")]
    results = []

    for t in tickers:
        try:
            e = get_latest_earnings(db, t)
            if not e:
                continue

            days_old = (datetime.now() - 
                       datetime.fromisoformat(e["date"])).days

            if e["surprise_pct"] >= min_surprise and days_old <= max_days_old:
                results.append({
                    "ticker": t,
                    "surprise_pct": round(e["surprise_pct"], 2),
                    "days_since_earnings": days_old
                })
        except Exception as ex:
            print(f"Error on {t}: {ex}")

    return sorted(results, key=lambda x: x["surprise_pct"], reverse=True)

# Run it
earnings_stocks = earnings_surprise_screen("./stock_data", "stocks.db")
print(f"Found {len(earnings_stocks)} earnings surprise candidates")

When to use: Right after earnings season. Markets often misprice surprises—you're looking for the market catching up.

8. Screener #4: Valuation (Price Per Unit of Business)

Simple valuation metrics avoid overpaying.

Price-to-Sales Ratio:

\text{P/S} = \frac{\text{Market Cap}}{\text{Annual Revenue}}

Price-to-Earnings Ratio:

\text{P/E} = \frac{\text{Market Cap}}{\text{Net Income}}

Lower ratios are cheaper. But cheap can mean "dying business", so combine with growth/momentum.

def get_latest_fundamentals(db, ticker):
    """Fetch latest quarter fundamentals"""
    cur = db.cursor()
    cur.execute("""
        SELECT market_cap, revenue, net_income
        FROM fundamentals
        WHERE ticker = ?
        ORDER BY quarter_date DESC
        LIMIT 1
    """, (ticker,))

    row = cur.fetchone()
    if not row:
        return None
    return {
        "market_cap": row[0],
        "revenue": row[1],
        "net_income": row[2]
    }

def valuation_screen(csv_dir, db_path, max_ps=2.0, 
                     max_pe=15.0, min_price=5.0):
    """Screen: low P/S and P/E ratios"""
    db = sqlite3.connect(db_path)
    tickers = [f[:-4].upper() for f in os.listdir(csv_dir) 
               if f.endswith(".csv")]
    results = []

    for t in tickers:
        try:
            f = get_latest_fundamentals(db, t)
            if not f:
                continue

            mc = f["market_cap"]
            rev = f["revenue"]
            ni = f["net_income"]

            ps = mc / rev if rev > 0 else float("inf")
            pe = mc / ni if ni > 0 else float("inf")

            if ps > max_ps or pe > max_pe:
                continue

            df = load_price_data(csv_dir, t)
            price = df["Close"].iloc[-1]
            if price < min_price:
                continue

            results.append({
                "ticker": t,
                "ps": round(ps, 2),
                "pe": round(pe, 2),
                "price": round(price, 2)
            })
        except Exception as ex:
            print(f"Error on {t}: {ex}")

    return sorted(results, key=lambda x: x["ps"])

# Run it
value_stocks = valuation_screen("./stock_data", "stocks.db")
print(f"Found {len(value_stocks)} value candidates")

When to use: When you want a "margin of safety"—cheap stocks provide cushion if things go wrong.

9. Screener #5: Combined Signal (Weighted Score)

Now you have four independent signals. Combine them with a weighted linear combination:

\text{Score} = w_1 s_1 + w_2 s_2 + w_3 s_3 + w_4 s_4

Where:

$s_i$ = normalized score from signal i (scaled to 0–1 range)
$w_i$ = weight for signal i (must sum to 1)

Example: equal weights $w_i = 0.25$ for each signal.

def normalize(value, min_val, max_val, invert=False):
    """Normalize value to [0, 1] range"""
    if max_val == min_val:
        return 0.5
    x = (value - min_val) / (max_val - min_val)
    x = max(0, min(1, x))  # Clamp to [0, 1]
    return 1 - x if invert else x

def combine_signals(momentum_dict, volume_dict, 
                   earnings_dict, valuation_dict):
    """Combine four signals into one composite score"""
    all_tickers = (set(momentum_dict) | set(volume_dict) | 
                   set(earnings_dict) | set(valuation_dict))
    scores = {}

    for t in all_tickers:
        score = 0

        # Momentum: 0 to 50 percent (25% weight)
        if t in momentum_dict:
            m = momentum_dict[t]
            score += normalize(m, 0, 50) * 25

        # Volume ratio: 1 to 3 (25% weight)
        if t in volume_dict:
            v = volume_dict[t]
            score += normalize(v, 1, 3) * 25

        # Earnings surprise: 0 to 50 percent (25% weight)
        if t in earnings_dict:
            e = earnings_dict[t]
            score += normalize(e, 0, 50) * 25

        # Valuation: lower P/S better, inverted (25% weight)
        if t in valuation_dict:
            ps = valuation_dict[t]
            score += normalize(ps, 0, 5, invert=True) * 25

        scores[t] = score

    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [{"ticker": t, "score": round(s, 1)} for t, s in ranked]

# Run all screeners and combine
momentum_results = momentum_screen("./stock_data")
volume_results = accumulation_screen("./stock_data")
earnings_results = earnings_surprise_screen("./stock_data", "stocks.db")
valuation_results = valuation_screen("./stock_data", "stocks.db")

# Convert to dictionaries keyed by ticker
momentum_dict = {r["ticker"]: r["momentum_20d_pct"] for r in momentum_results}
volume_dict = {r["ticker"]: r["volume_ratio"] for r in volume_results}
earnings_dict = {r["ticker"]: r["surprise_pct"] for r in earnings_results}
valuation_dict = {r["ticker"]: r["ps"] for r in valuation_results}

# Get composite scores
combined = combine_signals(momentum_dict, volume_dict, 
                          earnings_dict, valuation_dict)

print("\n🏆 TOP 10 STOCKS (COMPOSITE SCORE):")
for i, result in enumerate(combined[:10], 1):
    print(f"{i:2}. {result['ticker']:6} — Score: {result['score']:>6.1f}/100")

You now have your own best stock screener that:

Uses clear math
Uses data you control
Can be tuned in a few lines of code

10. Portfolio Context: Which Screener Should You Emphasize?

You don't have to use all signals equally. Your weights depend on what you own.

If you own bonds + gold (defensive)

Focus on:

Momentum (high weight)
Volume accumulation (high weight)

Logic: You want high-growth, high-beta assets to offset your defensive base.

If you already own a lot of tech

Focus on:

Valuation (high weight)
Correlation filter (not a weight, but mandatory)

Logic: Reduce concentration risk rather than stacking more tech. Screen for uncorrelated sectors.

If you're just starting

Use equal weights (25% each).

Then gradually shift weights as you learn what matches your trading style.

11. What Your Broker Won't Tell You

Most broker "stock screener" tools:

❌ Use fixed rules not tuned to your portfolio
❌ Don't show you the actual math
❌ Encourage overtrading with flashy filters
❌ Change rules without telling you

You can do better:

✅ Write the math yourself
✅ Implement it in Python
✅ Run it weekly on your own data
✅ Check correlation before buying
✅ Keep a log of rule changes and results
✅ Know exactly what you own

12. Quick Checklist: Your Own Stock Screener

[ ] You understand inputs: price, volume, fundamentals
[ ] You compute returns with $R_n(t) = \frac{P(t) - P(t-n)}{P(t-n)}$
[ ] You use moving averages to smooth noise
[ ] You confirm moves with volume (not price alone)
[ ] You check earnings surprise (not just headlines)
[ ] You use valuation ratios to avoid overpaying
[ ] You combine signals into a weighted score
[ ] You check correlation with your existing portfolio
[ ] You only change rules slowly and record impact
[ ] You keep your data local (CSV + SQLite)

Next Steps

Set up your data pipeline: Download historical CSV files for 50–100 stocks
Populate SQLite: Add quarterly fundamentals from public sources
Run each screener independently: Build intuition for what each detects
Combine with your weights: Start with equal weights, then tune
Backtest: Log your picks and track win rates
Iterate: Adjust weights monthly based on what works

Good luck.

DEV Community