tomasz dobrowolski

Posted on May 23 • Originally published at flashalpha.com

Inside an Unusual Options Activity Score: Six Components, One Audit Trail

#programming #api #finance #data

The unusual-options-flow market is full of black boxes. Vendors publish "scored" feeds without specifying what the score means: which trades qualify, how the score is computed, what threshold separates a meaningful signal from background noise.

For a developer building a screener, this is unworkable. You cannot calibrate against a number whose derivation is hidden, and you cannot filter out false positives whose drivers you cannot see.

This article is the methodology paper for the FlashAlpha Flow Signals API. The score is a deterministic weighted average of six normalised components. Every component's contribution is returned in the response under score_breakdown. The classification of every trade as a sweep or block, as opening or closing flow, and as a bullish, bearish, or neutral bet follows rules documented in code and reproduced here.

What the feed actually is

GET /v1/flow/signals/{symbol} takes the raw block-sized prints from the OPRA tape, groups same-side prints that arrived within 500ms of each other on the same contract into single execution intents, and produces one scored, classified FlowSignal per group.

The output is the input layer for any system that needs to react to unusual activity: a trading dashboard, an alert ruleset, a backtest universe selector, or an LLM that needs structured options-flow context.

The pipeline is split across pure-static components so every step is independently testable on synthetic tapes. No clock, no HTTP, no global state outside the inputs.

1. FlowDataClient.NotableTradesAsync  - windowed pull from notable-trade ring
2. FlowAnalyticsService.LoadAsync     - greek snapshots + OI simulator state
3. UnusualFlowScorer.Score            - pure classification and scoring
4. UnusualFlowEnricher.Enrich         - chain overlay (greeks, IV vs ATM, etc)
5. Controller                         - tier gate, filters, JSON shaping

Sweep coalescing: 500ms, same side, same contract

The single most important pre-processing step is sweep coalescing. A real institutional execution often crosses multiple venues in rapid succession: one offer lifted on PHLX, another at the same instant on CBOE, a third on AMEX, because that is how a sweep is routed across venues. On the tape, that single intent looks like three prints.

Without coalescing, the score-it-once-per-print approach would either double-count the intent or under-rate it.

group(t_i, t_{i+1}) iff
    contract(t_i) == contract(t_{i+1})
    AND side(t_i) == side(t_{i+1})
    AND (t_{i+1} - t_i) <= 500ms

Any group with two or more prints is classified as a sweep; a singleton block-sized print is a block. The size used for scoring is the sum across the group; the price is the size-weighted average; the timestamp is the last print in the group.

The 500ms window is deliberately conservative. Genuine cross-venue sweeps usually finish in under 100ms; the extra headroom catches venues that route via slower paths without false-coalescing prints that are merely temporally adjacent.
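The grouping rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production code: the Print fields, the coalesce and summarise names, and the choice to track one open group per (contract, side) pair are all hypothetical, and every input print is assumed to be block-sized already.

```python
from dataclasses import dataclass

@dataclass
class Print:
    ts_ms: int      # print timestamp in milliseconds (hypothetical field)
    contract: str   # contract identifier (hypothetical field)
    side: str       # "buy", "sell" or "mid"
    price: float
    size: int

def coalesce(prints, window_ms=500):
    """Chain same-contract, same-side prints whose gap to the previous
    print in the chain is at most window_ms."""
    groups = []
    open_groups = {}  # (contract, side) -> group currently being extended
    for p in sorted(prints, key=lambda x: x.ts_ms):
        key = (p.contract, p.side)
        g = open_groups.get(key)
        if g is not None and p.ts_ms - g[-1].ts_ms <= window_ms:
            g.append(p)
        else:
            g = [p]
            open_groups[key] = g
            groups.append(g)
    return groups

def summarise(group):
    """Collapse a group into the values used for scoring."""
    size = sum(p.size for p in group)
    return {
        "structure": "sweep" if len(group) >= 2 else "block",
        "size": size,                                         # summed size
        "price": sum(p.price * p.size for p in group) / size, # size-weighted
        "ts_ms": group[-1].ts_ms,                             # last print
    }
```

Because the gap is measured between consecutive prints, a long chain can span more than 500ms end to end; whether the real implementation bounds the total span is not stated here.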

The six scoring components

Every coalesced group is scored along six axes. Each component is normalised to [0,1] before weighting, the components are combined as a weighted average, and the result is scaled to a 0-100 integer.

1. Premium (weight 1.0)

The notional dollar value of the trade (price × size × 100), log-normalised against a $10M cap.

n_premium = clamp( log10(1 + P) / log10(1 + P_cap), 0, 1 )

A $1M trade scores about 0.86, a $100K trade about 0.71, and a $10M trade saturates at 1.00. The log normalisation is deliberate: linear scaling would push every $100K-$1M print into the noise and highlight only the very largest prints.
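A minimal sketch of the premium component, useful for checking the figures above. The function name and parameter names are illustrative; the formula, the $10M cap and the 100 multiplier are the ones given in this section.

```python
import math

def n_premium(price, size, cap=10_000_000, multiplier=100):
    """Log-normalised premium, clamped to [0, 1]."""
    premium = price * size * multiplier
    raw = math.log10(1 + premium) / math.log10(1 + cap)
    return max(0.0, min(1.0, raw))

one_million = n_premium(10.0, 1_000)     # $1M of premium
hundred_k = n_premium(1.0, 1_000)        # $100K of premium
ten_million = n_premium(100.0, 1_000)    # $10M of premium, at the cap
```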

2. Size vs Open Interest (weight 1.0)

The ratio of trade size to the contract's settled OI, capped at 1.0.

n_size_vs_oi = clamp( size / max(1, settled_oi), 0, 1 )

This is the unusual-activity classic. A 50,000-lot SPY 0DTE print on a strike with 800,000 OI scores 0.0625; the same 50,000-lot on a single-name back-month strike with 30,000 OI saturates at 1.0. That is intended behaviour: relative significance, not absolute size. For absolute-size ranking, sort by premium instead.
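The same two cases in code, as a quick sketch (the function name is illustrative). The max(1, settled_oi) guard means a contract with zero settled OI saturates rather than dividing by zero.

```python
def n_size_vs_oi(size, settled_oi):
    """Trade size relative to settled open interest, clamped to [0, 1]."""
    return max(0.0, min(1.0, size / max(1, settled_oi)))

spy_0dte = n_size_vs_oi(50_000, 800_000)    # large lot, very deep OI
single_name = n_size_vs_oi(50_000, 30_000)  # same lot, thin OI
new_listing = n_size_vs_oi(10, 0)           # no settled OI yet
```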

3. Aggressor Strength (weight 0.8)

Where the print landed inside the NBBO, side-aware.

n_aggressor =
    0.4                        if side == mid
    0.5                        if spread <= 0 (crossed/locked)
    (price - bid) / spread     if side == buy
    (ask - price) / spread     if side == sell

A buyer who lifts the offer scores near 1.0; a buy print that lands at the midpoint works out to 0.5 under the formula. Prints whose side is classified as mid get a flat 0.4 instead, because they cannot be classified as aggressive in either direction with confidence.
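A sketch of the aggressor rule as a function. The name and the string values for side are hypothetical, and the final clamp to [0, 1] is an assumption added here to handle prints outside the NBBO, which the formula above does not address.

```python
def n_aggressor(side, price, bid, ask):
    """Side-aware position of the print inside the NBBO."""
    if side == "mid":
        return 0.4                      # direction cannot be inferred
    spread = ask - bid
    if spread <= 0:
        return 0.5                      # crossed or locked market
    if side == "buy":
        raw = (price - bid) / spread    # 1.0 when the offer is lifted
    else:
        raw = (ask - price) / spread    # 1.0 when the bid is hit
    return max(0.0, min(1.0, raw))      # assumption: clamp out-of-NBBO prints
```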

4. Sweep Structure (weight 1.0)

n_sweep =
    1.00   coalesced group, >= 2 prints (sweep)
    0.55   singleton block-sized print (block)
    0.20   single print below block size (rare)

Sweeps signal urgency. A standalone block is meaningful but compatible with patient, single-venue execution.

5. Opening Bias (weight 1.2, but capped at 0.43)

Whether the trade opens new positions or closes existing ones, from the OI simulator's signed intraday delta.

n_opening = bias_score * oi_confidence

bias_score =
    1.0   net OI delta > 0  (OpeningBias)
    0.3   net OI delta < 0  (ClosingBias)
    0.5   net OI delta == 0 (Unknown)

oi_confidence = 0.43   # calibrated against next-morning settled OI

This is the most important design choice in the entire scorer.

Even a "perfect" opening signal contributes at most 0.43 to the normalised component, not 1.0. With the default weight of 1.2 out of a total of 5.6, the opening-bias bucket can contribute at most 100 * 1.2 * 0.43 / 5.6 ≈ 9.2 points to the composite score.

That ceiling is intentional: the simulator's calibrated confidence is 0.43, and a scoring model that gave full credit to opening signals would be overstating what the underlying inference can actually prove. Honest in, honest out.
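The ceiling is easy to verify. The sketch below uses illustrative names and takes the signed intraday OI delta as a plain number; only the constants come from this section.

```python
OI_CONFIDENCE = 0.43  # calibrated confidence quoted above

def n_opening(net_oi_delta, oi_confidence=OI_CONFIDENCE):
    """Opening-bias component: bias score scaled by simulator confidence."""
    if net_oi_delta > 0:
        bias_score = 1.0   # OpeningBias
    elif net_oi_delta < 0:
        bias_score = 0.3   # ClosingBias
    else:
        bias_score = 0.5   # Unknown
    return bias_score * oi_confidence

def max_opening_points(weight=1.2, total_weight=5.6):
    """Largest contribution the opening-bias bucket can make to the score."""
    return 100 * weight * n_opening(1) / total_weight
```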

6. Tenor (weight 0.6)

Days to expiration, normalised so short-dated trades score higher.

n_tenor = clamp( 1 - dte/T_cap, 0, 1 )   # T_cap = 45 default

A 0DTE print scores 1.0, a 22-DTE print scores 0.51, and any print at 45 DTE or beyond scores 0, so LEAPS get zero credit for tenor. The premise is that shorter-dated flow carries more time-sensitive information.
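The tenor figures can be reproduced with a one-line function (the name is illustrative; T_cap defaults to 45 as above).

```python
def n_tenor(dte, t_cap=45):
    """Linear decay from 1.0 at 0 DTE to 0.0 at t_cap days and beyond."""
    return max(0.0, min(1.0, 1 - dte / t_cap))

zero_dte = n_tenor(0)
three_weeks = n_tenor(22)
leaps = n_tenor(400)
```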

The composite

contribution_i = round( 100 * w_i * n_i / sum(w_j) )
score = clamp( sum(contributions), 0, 100 )

Default weights (sum 5.6): premium=1.0, size_vs_oi=1.0, aggressor=0.8, sweep=1.0, opening_bias=1.2, tenor=0.6.

The breakdown lives in the response:

{
  "score": 77,
  "score_breakdown": {
    "premium":      18,
    "size_vs_oi":   10,
    "aggressor":    13,
    "sweep":        18,
    "opening_bias":  9,
    "tenor":         9
  }
}

Add the buckets and you get the score, within rounding. Any score in the feed can be reconstructed from the trade + OI context using the formulas above. There are no hidden adjustments.
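The composite formula can be sketched as follows. The function name and dictionary keys are illustrative, and Python's round() uses round-half-to-even, which may differ from the production rounding mode on exact .5 ties; treat that as an assumption.

```python
DEFAULT_WEIGHTS = {
    "premium": 1.0, "size_vs_oi": 1.0, "aggressor": 0.8,
    "sweep": 1.0, "opening_bias": 1.2, "tenor": 0.6,
}

def composite(normalised, weights=DEFAULT_WEIGHTS):
    """Return (score, breakdown) from normalised components in [0, 1]."""
    total = sum(weights.values())
    breakdown = {
        k: round(100 * weights[k] * normalised[k] / total) for k in weights
    }
    score = max(0, min(100, sum(breakdown.values())))
    return score, breakdown

# Hypothetical trade: $1M premium, small vs OI, offer lifted, sweep,
# opening bias, 0DTE.
score, breakdown = composite({
    "premium": 0.86, "size_vs_oi": 0.0625, "aggressor": 1.0,
    "sweep": 1.0, "opening_bias": 0.43, "tenor": 1.0,
})
```

Under this sketch, a trade that maxes every component still tops out at 88 with the default weights, because the opening-bias component cannot exceed 0.43.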

Intent classification

Once scored, the scorer assigns a directional intent.

intent =
    Neutral   if bias == ClosingBias        # unwind, direction unknown
    Neutral   if side == Mid
    Bullish   if BuyCall  or SellPut
    Bearish   if SellCall or BuyPut

The closing-bias-collapses-to-Neutral rule is load-bearing. A closing trade tells you what is being unwound, but the direction the unwinder originally bet is generally unknown to the tape. Calling a closing trade "Bullish" or "Bearish" would over-claim.

The consumer can still read open_close_bias directly if they want to count closing flow separately.
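The intent rules translate directly into a small function. The function name and the string encodings for side ("buy", "sell", "mid") and right ("C", "P") are assumptions made for this sketch.

```python
def classify_intent(side, right, open_close_bias):
    """Directional intent for a scored group, following the rules above."""
    if open_close_bias == "ClosingBias":
        return "Neutral"   # unwind: original direction unknown
    if side == "mid":
        return "Neutral"   # no aggressor, no direction
    bullish = (side == "buy" and right == "C") or (side == "sell" and right == "P")
    return "Bullish" if bullish else "Bearish"
```

Note the ordering: the closing-bias check comes first, so a bought call that closes a position is Neutral, not Bullish.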

Conviction labels and the "golden" tag

Score	Conviction
80-100	high
60-79	medium
40-59	low
0-39	minimal

A signal is tagged golden if its score is in the top decile of the current result set and meets the absolute floor of 70. The dual gate stops a quiet session from promoting a mediocre signal.
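A sketch of both labels. The conviction bands are taken from the table; how the top decile is computed is not specified here, so the version below (the best len // 10 scores, with a minimum of one, ties included) is an assumption, as are the function names.

```python
def conviction(score):
    """Map a 0-100 score to its conviction label."""
    if score >= 80:
        return "high"
    if score >= 60:
        return "medium"
    if score >= 40:
        return "low"
    return "minimal"

def golden_flags(scores, floor=70):
    """Flag scores that are in the top decile of the set AND >= floor."""
    if not scores:
        return []
    k = max(1, len(scores) // 10)              # assumed decile definition
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff and s >= floor for s in scores]

busy = golden_flags([85, 72, 71, 65, 60, 55, 50, 45, 40, 30])
quiet = golden_flags([55, 50, 45])   # top of a quiet session, below the floor
```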

Greeks enrichment overlay

After scoring, the enricher joins the signal with the settled greek snapshot (δ is delta, Γ is gamma, S is the underlying spot price):

iv, delta, gamma from the snapshot (null if just-listed)
iv_vs_atm: the contract's IV minus the ATM IV on its expiry
moneyness: ITM if |δ| >= 0.65, OTM if |δ| <= 0.35, else ATM
estimated_delta_notional = size * 100 * δ * S
hypothetical_gex_impact_if_opening = size * 100 * Γ * S² * 0.01

The hypothetical GEX field is named for a reason. The live chain on the flow surface already folds intraday OI deltas in, so adding the per-signal impact on top would double-count. Use it to size individual trades, not to recompute chain GEX.
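The derived enrichment fields are simple enough to sketch directly from the formulas above. Function and parameter names are illustrative, and the sketch assumes the greeks are present; a just-listed contract with null greeks would need a guard.

```python
def moneyness(delta):
    """Bucket a contract by absolute delta."""
    a = abs(delta)
    if a >= 0.65:
        return "ITM"
    if a <= 0.35:
        return "OTM"
    return "ATM"

def estimated_delta_notional(size, delta, spot):
    """Dollar delta of the trade: size * 100 * delta * S."""
    return size * 100 * delta * spot

def hypothetical_gex_impact_if_opening(size, gamma, spot):
    """Gamma exposure per 1% move if the trade is opening."""
    return size * 100 * gamma * spot ** 2 * 0.01

# Hypothetical 1,000-lot on a $500 underlying.
dn = estimated_delta_notional(1_000, 0.5, 500.0)
gex = hypothetical_gex_impact_if_opening(1_000, 0.02, 500.0)
```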

Quick example: top bullish sweeps on SPY

import requests

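# Pull high-scoring bullish sweeps on SPY over a 240-minute window.
resp = requests.get(
    "https://lab.flashalpha.com/v1/flow/signals/SPY",
    headers={"X-Api-Key": "YOUR_KEY"},
    params={
        "windowMinutes": 240,
        "intent": "bullish",
        "structure": "sweep",
        "minScore": 70,
        "limit": 10,
    },
    timeout=10,  # avoid hanging indefinitely on a stalled connection
)
resp.raise_for_status()  # surface HTTP errors before parsing the body
data = resp.json()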

for s in data["signals"]:
    print(
        f"{s['ts']}  {s['expiry']} {s['strike']}{s['right']}"
        f"  size={s['size']:>5}  ${s['premium']:>10,.0f}"
        f"  score={s['score']}  ({s['conviction']})"
        f"  {','.join(s['tags'])}"
    )

What it isn't

Three honest scope limits:

Not a full tape replay. The notable ring is capped at 512 prints; this is a windowed scan, not exhaustive history.

Not multi-leg-aware intent. Multi-leg structures (verticals, butterflies, calendars) aren't detected at this layer. The long leg of a bear call spread, which buys the higher-strike call, still reads as a bullish single leg here.

Not a replacement for per-strike GEX. The chain-level live GEX number lives at /v1/flow/gex, computed against effective OI. Use both: signals for intent and significance, the GEX endpoint for chain-level dealer exposure.

Why the score breakdown matters

Every unusual-activity vendor publishes a score. Almost none of them tell you what is inside it. The buyer is forced to trust the number, the implicit weighting, and the (undisclosed) calibration choices the vendor made years ago and may or may not have revisited.

The default weights (1.0 / 1.0 / 0.8 / 1.0 / 1.2 / 0.6) were set by eye after looking at a representative range of SPY, NVDA, and TSLA sessions. There is no claim that these weights are statistically optimal. They are documented defaults you can override per-request. The audit trail in score_breakdown exists partly so customers can build their own re-weighting on top of the normalised components.
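One way to re-weight client-side, sketched under assumptions: since score_breakdown holds rounded point contributions, the normalised components can be approximately recovered by inverting the composite formula with the default weights, then recombined under new weights. The function names are hypothetical, and the recovery is only as precise as the integer rounding allows.

```python
DEFAULT_WEIGHTS = {
    "premium": 1.0, "size_vs_oi": 1.0, "aggressor": 0.8,
    "sweep": 1.0, "opening_bias": 1.2, "tenor": 0.6,
}

def recover_normalised(breakdown, weights=DEFAULT_WEIGHTS):
    """Invert contribution_i = 100 * w_i * n_i / sum(w), clamped to [0, 1]."""
    total = sum(weights.values())
    return {
        k: max(0.0, min(1.0, breakdown[k] * total / (100 * weights[k])))
        for k in weights
    }

def rescore(breakdown, new_weights):
    """Approximate score under new_weights from a default-weight breakdown."""
    n = recover_normalised(breakdown)
    total = sum(new_weights.values())
    return round(sum(100 * new_weights[k] * n[k] / total for k in new_weights))

# Hypothetical breakdown produced with the default weights.
breakdown = {"premium": 15, "size_vs_oi": 1, "aggressor": 14,
             "sweep": 18, "opening_bias": 9, "tenor": 11}
no_tenor = {k: w for k, w in DEFAULT_WEIGHTS.items() if k != "tenor"}
same = rescore(breakdown, DEFAULT_WEIGHTS)
without_tenor = rescore(breakdown, no_tenor)
```

Dropping a component redistributes its weight across the rest, so a re-weighted score is comparable only to other scores computed with the same weights.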

The score is the product. The methodology is the warranty.

Full methodology paper with all formulas typeset, FAQ, and worked examples: Flow Signals API on FlashAlpha. Free API key, no card required, on the pricing page.

DEV Community