DEV Community: TickDistill

Sigma-Normalization: Why Order-Flow Signals Should Be Measured in Standard Deviations, Not Raw Numbers

TickDistill — Thu, 25 Jun 2026 15:01:13 +0000

By TickDistill — order-flow microstructure signals. Educational content, not financial advice.

The short answer

A raw number like "a 200-contract order" or "$3M of buying" says little on its own. The useful question is "how unusual is this, here, right now?" Sigma-normalization answers exactly that: it re-expresses a measurement as the number of standard deviations it sits from the mean of its own recent distribution. The output is a z-score, a measure of rarity rather than magnitude.

Why raw thresholds break

Fixed, absolute thresholds fail in two directions at once:

Across instruments. 200 contracts may be enormous on one market and trivial on another. A dollar figure that is "big" for one asset is noise for a more liquid one. A single hard-coded number cannot be right everywhere.
Across regimes. The same market is calm in one week and violent the next. A print that is extreme in a quiet regime is ordinary during high volatility. A threshold fixed last month is wrong this month.

Sigma-normalization solves both: by centering on the local mean and dividing by the local standard deviation, "big" always means "big relative to what this market has been doing lately." The signal becomes comparable across assets and adaptive across regimes, with no manual re-tuning.
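As a rough illustration, the sketch below scores the same 200-contract print against two made-up volume histories. The function name `causal_zscore`, the sample numbers, and the use of a plain sample standard deviation are assumptions for the example, not TickDistill's calibration.

```python
from statistics import mean, stdev

def causal_zscore(history, x):
    """Z-score of x against past observations only (x itself is not in history)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation of the past window
    return (x - mu) / sigma

# Hypothetical per-bar volumes: a thin market and a more liquid one.
thin_market = [10, 12, 9, 11, 10, 13, 8, 11]
liquid_market = [180, 220, 190, 210, 200, 230, 170, 210]

# The same raw print of 200 contracts is scored against each market's own past.
z_thin = causal_zscore(thin_market, 200)
z_liquid = causal_zscore(liquid_market, 200)
print(f"thin: {z_thin:.1f} sigma, liquid: {z_liquid:.1f} sigma")
```

The raw number is identical in both calls; only the yardstick differs, which is the point of expressing thresholds in sigma units.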

What "normalized" actually buys you

Portability. The same signal logic runs on BTC, ETH, SOL — and later on regulated futures like the E-mini S&P 500 — by swapping a per-market profile, not by rewriting the signal. One definition, many markets.
Rarity as the unit. When a signal says "2.5σ," it is telling you this event is genuinely uncommon for this market. An 80/20 directional split is rare (high sigma); a 60/40 split is normal (low sigma) and carries little information — even though both look "directional" on the surface.
Stable interpretation. A "2σ" event means the same kind of thing whether the market is calm or wild, because the yardstick moves with the market.

The non-negotiable part: point-in-time correctness

A normalization is only honest if its yardstick uses only the past. If the standard deviation is computed using data from after the signal moment — even by accident — the backtest looks brilliant and the live signal disappoints. This is look-ahead bias, and it is the most common way order-flow research fools itself.

TickDistill computes every baseline causally (rolling, past-only) and excludes recurring mechanical windows that would otherwise distort the statistics — for example an index's market-on-close auction, or a perpetual's funding settlements, where huge volume appears for structural reasons rather than informational ones. We treat point-in-time correctness as a hard rule, because a backtest you cannot trust is worse than no backtest at all.

How this shows up in the product

Every TickDistill signal expresses its thresholds in sigma units, not raw numbers. That is why the same package behaves sensibly across our supported assets, why "rare" means rare, and why our backtests are built to be reproducible point-in-time. When you later tune a signal's sensitivity (see signal knobs), you are moving a threshold in sigma — a number that keeps its meaning as markets change.

TickDistill sells clean, computed order-flow inputs — not trading advice or guaranteed alpha. Backtests are illustrative and not a promise of future results.

What Is Point-in-Time Correctness? Why No-Look-Ahead Makes or Breaks a Backtest

TickDistill — Thu, 25 Jun 2026 14:55:28 +0000

By TickDistill — order-flow microstructure signals. Educational content, not financial advice.

The short answer

Point-in-time correctness is the guarantee that every computation at time t uses only data from strictly before t. Violating this constraint — accidentally or structurally — is called look-ahead bias, and it is the most common way order-flow research produces backtest results that cannot be reproduced in live trading. TickDistill treats point-in-time correctness as a hard engineering invariant: every baseline, every normalization, every mask is causal by construction, and every backtest result can be independently reproduced from the same inputs.

What does "point-in-time correct" mean, exactly?

Point-in-time correctness means that the value of any signal emitted at timestamp t is a deterministic function of data with timestamps t' < t only. No observation from t' ≥ t enters the computation — not the current bucket, not a future bucket, not in the normalization denominator, not in the exclusion mask calibration.

The strict inequality matters. Including the current observation (t' ≤ t instead of t' < t) is still a form of contamination: the baseline that normalizes a measurement must not include that same measurement, or the z-score becomes self-referential.

Why look-ahead bias is so easy to introduce by accident

Look-ahead bias does not require deliberate cheating. It emerges from common implementation shortcuts:

Source	How it happens
In-sample normalization	Computing the mean/std over the entire history, then using it to normalize each historical point
Rolling window off-by-one	A pandas `Series.rolling(N).mean()` call, whose default window ends at (and includes) the current row
Global volatility estimate	Using the full-period σ as the denominator for a z-score computed at each past point
Classifier training	Training a trade-side classifier on the same period you backtest the signal
Mask calibration	Identifying "noisy" windows after the fact and masking them retroactively

Each of these makes a past computation depend on future information. The backtest looks cleaner than it is; live performance does not benefit from knowledge of the future.
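The rolling-window off-by-one is easy to reproduce. The sketch below, which assumes pandas is installed and uses a toy five-point series, compares the default window with one shifted by a row so that it covers only the rows strictly before the current one.

```python
import pandas as pd

x = pd.Series([1.0, 1.0, 1.0, 1.0, 10.0])  # the last value is the spike under test

# Default rolling window ends at the current row, so the spike leaks
# into the baseline that is supposed to judge it.
leaky_mean = x.rolling(4).mean()

# Shifting by one row first makes the window cover the 4 rows strictly before t.
causal_mean = x.shift(1).rolling(4).mean()

print(leaky_mean.iloc[-1])   # baseline that includes the spike
print(causal_mean.iloc[-1])  # baseline built from past rows only
```

With the leaky baseline the spike pulls its own mean upward, so its z-score is understated in exactly the moments that matter.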

The causal baseline: `t' < t` strictly

A causal baseline is a rolling statistic — mean, standard deviation, or exponentially weighted equivalent — computed at each point using only the observations that were available at that point in history.

The public z-score formula is:

z_t = ( x_t − μ_t ) / σ_t

where μ_t and σ_t are estimated from { x_{t'} : t' < t } exclusively. This is standard practice for normalizing order-flow quantities against a causal baseline (Easley, López de Prado, O'Hara 2012).

Two choices of baseline are common in practice:

Baseline type	Formula	Property
Rolling window of N observations	`μ_t = (1/N) Σ_{i=t-N}^{t-1} x_i`	Equal weight, sharp cutoff
Exponentially weighted (EWM)	`μ_t = (1−λ) Σ_{k=0}^{∞} λ^k x_{t-1-k}`	Smooth decay, infinite memory

The decay parameter λ corresponds to a half-life h via λ = exp(−ln2/h). A longer half-life makes the baseline more stable across regime changes; a shorter half-life makes it more adaptive. The calibration of this parameter is proprietary — what matters for correctness is that whichever estimator is used, it uses t' < t only. TickDistill uses a causal EWM baseline, and the current observation never enters the estimate that normalizes it.
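A minimal sketch of a read-then-update EWM baseline follows. It is not TickDistill's estimator: the class name `CausalEWM`, the half-life of 10, the initialization from the first observation, and the incremental variance recursion are all assumptions chosen for illustration. The only property it is meant to show is that the z-score is computed from the state before the current observation is folded in.

```python
import math

class CausalEWM:
    """Exponentially weighted mean/variance that is read before it is updated."""

    def __init__(self, half_life):
        self.lam = math.exp(-math.log(2) / half_life)  # lambda = exp(-ln2 / h)
        self.mean = None
        self.var = 0.0

    def score(self, x):
        """Return z_t from the state built on t' < t, then fold x_t into the state."""
        z = None
        if self.mean is not None and self.var > 0:
            z = (x - self.mean) / math.sqrt(self.var)
        self._update(x)
        return z

    def _update(self, x):
        if self.mean is None:  # assumption: seed the mean with the first value
            self.mean = x
            return
        delta = x - self.mean
        self.mean += (1 - self.lam) * delta
        self.var = self.lam * (self.var + (1 - self.lam) * delta * delta)

baseline = CausalEWM(half_life=10)
zs = [baseline.score(v) for v in [1, 2, 1, 2, 1, 2, 10]]
print(zs[-1])  # z-score of the final spike against the earlier values only
```

Because `score` reads the state before calling `_update`, the spike at the end is judged against a mean and variance that have never seen it.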

Mechanical windows: why some events must be excluded from the baseline

Even a perfectly causal baseline can be distorted by recurring mechanical events — moments when volume or imbalance is large for structural reasons rather than informational ones.

A clear example is the perpetual futures funding settlement at 00:00, 08:00, and 16:00 UTC (public exchange schedule, Binance and most major venues). At these moments, a funding payment causes predictable positioning activity that is unrelated to informed order flow. Including funding spikes in the baseline causes the baseline σ to inflate, which then suppresses the z-score of genuine order-flow events in surrounding windows.

The solution is an exclusion mask: data within a mechanical window is excluded from updating the baseline. The mask is applied causally — it defines which observations are allowed to enter the rolling statistic. Observations inside the mask are not deleted; the signal may still be computed over them, but the baseline parameters are not updated from them.

μ_t = EWM over { x_{t'} : t' < t  AND  t' ∉ mask }
σ_t = EWM-std over the same filtered set

Which windows to mask, and at what granularity, is a calibration decision that depends on the instrument, the signal, and the empirical effect of the mechanical event on the signal's distribution. The general principle — exclude mechanical events from the normalization baseline — is textbook practice; the specific calendar is proprietary.
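The masking rule can be sketched as follows. The funding hours come from the schedule cited above; everything else is an assumption made for the example: the five-minute half-width, the function names, and the use of a simple expanding list of past values in place of an EWM.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev

FUNDING_HOURS = (0, 8, 16)          # public schedule cited in the text
HALF_WIDTH = timedelta(minutes=5)   # hypothetical mask width

def in_mask(ts):
    """True if ts falls within HALF_WIDTH of a funding settlement."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # 24 covers timestamps just before the next day's 00:00 settlement.
    anchors = [day + timedelta(hours=h) for h in (*FUNDING_HOURS, 24)]
    return any(abs(ts - a) <= HALF_WIDTH for a in anchors)

def masked_zscores(stream, min_obs=3):
    """Score every point, but let only unmasked points update the baseline."""
    baseline, out = [], []
    for ts, x in stream:
        z = None
        if len(baseline) >= min_obs and stdev(baseline) > 0:
            z = (x - mean(baseline)) / stdev(baseline)
        out.append((ts, z))
        if not in_mask(ts):  # masked prints are scored, not learned from
            baseline.append(x)
    return out

start = datetime(2026, 6, 25, 7, 0, tzinfo=timezone.utc)
stream = [(start + timedelta(minutes=10 * i), v)
          for i, v in enumerate([10.0, 11.0, 9.0, 10.0])]
stream.append((start.replace(hour=8), 500.0))  # funding-time spike
stream.append((start.replace(hour=9), 10.0))   # ordinary print afterwards
scores = masked_zscores(stream)
```

The 08:00 spike still receives a z-score, but it never enters `baseline`, so the ordinary print at 09:00 is judged against the pre-funding statistics.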

Warm-up periods: when a causal baseline is not yet reliable

A rolling or exponentially weighted estimator requires a minimum number of observations before its estimates are stable. Emitting z-scores before the warm-up completes produces values with high estimation error, which corrupt any downstream comparison.

TickDistill enforces two distinct warm-up criteria before emitting any signal value:

Signal window warm-up. A signal that is itself a rolling statistic (e.g., VPIN, a moving imbalance) requires its own window to be filled before it produces a meaningful value.
Baseline warm-up. The causal baseline (μ_t, σ_t) requires a sufficient number of non-masked observations before its estimates stabilize. For an EWM baseline with half-life h, stability is reached after approximately 5h observations, the point at which the weight of the initialization has decayed to roughly 3% (2^−5 ≈ 3.1%).

No signal point is emitted until both criteria are satisfied. A missing warm-up is equivalent to a form of look-ahead: the estimator behaves as if it has more historical information than it does.
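The 5h rule of thumb follows directly from the half-life definition: after n observations the initialization carries weight λ^n = 2^(−n/h). A small sketch, with hypothetical function names and the multiple of 5 taken from the text:

```python
import math

def residual_init_weight(n_obs, half_life):
    """Weight still carried by the initial state after n_obs updates."""
    lam = math.exp(-math.log(2) / half_life)
    return lam ** n_obs

def baseline_ready(n_unmasked, half_life, multiple=5):
    """Gate emission until enough non-masked observations have arrived."""
    return n_unmasked >= multiple * half_life

half_life = 20  # illustrative value, not a calibrated parameter
print(residual_init_weight(5 * half_life, half_life))  # close to 1/32
print(baseline_ready(99, half_life), baseline_ready(100, half_life))
```

In a full pipeline this gate would be combined with the signal's own window check, so that a value is emitted only when both conditions hold.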

Anti-look-ahead: the test that verifies the guarantee (Test 5)

The claim of point-in-time correctness is verifiable. The test is direct: compute signal values over a stream, then modify trades at timestamps t' > t, and confirm that the signal value at t is identical.

Formally, for any t and any perturbation of { x_{t'} : t' > t }:

signal(t | history up to t)  =  signal(t | history up to t, perturbed future)

If this equality fails, the computation has a look-ahead dependency. This test is mandatory in TickDistill's test suite and covers every path: the signal window, the baseline estimator, the mask exclusion, and the BVC price-change estimator σ_dP (which uses its own causal window over past price differences between sub-bars, never the full sample).
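A perturbation test of this kind can be sketched in a few lines. The two signal functions below are toy stand-ins (a rolling z-score and a deliberately leaky full-sample z-score), and the helper name `is_future_invariant` is hypothetical; the structure of the check is what matters.

```python
import random
from statistics import mean, stdev

def causal_signal(xs, t, window=5):
    """z-score at index t from the window strictly before t."""
    past = xs[t - window:t]
    return (xs[t] - mean(past)) / stdev(past)

def leaky_signal(xs, t):
    """Deliberately wrong: normalizes with full-sample statistics."""
    return (xs[t] - mean(xs)) / stdev(xs)

def is_future_invariant(signal, xs, t, trials=20, seed=0):
    """Perturb every observation after t and check that signal(t) is unchanged."""
    rng = random.Random(seed)
    reference = signal(xs, t)
    for _ in range(trials):
        perturbed = xs[:t + 1] + [rng.gauss(0, 10) for _ in xs[t + 1:]]
        if signal(perturbed, t) != reference:
            return False
    return True

data = [float(v) for v in (3, 5, 4, 6, 5, 7, 6, 8, 7, 9, 8, 10)]
print(is_future_invariant(causal_signal, data, 6))  # causal path should pass
print(is_future_invariant(leaky_signal, data, 6))   # leaky path should fail
```

Exact equality is used on purpose: a causal computation receives bit-identical inputs for time t, so any difference at all indicates a dependency on the future.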

Reproducibility: why point-in-time correctness enables version-pinned backtests

Point-in-time correctness is a prerequisite for reproducibility. A backtest result from a point-in-time-correct pipeline is a deterministic function of four inputs: (signal, params, range, version) — because each signal is itself a pure parametric function f(primitive, params). Given the same four inputs, the same output must emerge, regardless of when the query runs.

This enables two capabilities:

Permalink/content-hash. Every backtest result can be identified by a hash of its inputs. The result is shareable and reproducible indefinitely.
Version pinning. When a signal formula is updated (v1 → v2), backtest queries pinned to v1 continue to reproduce the v1 result exactly. Code and data definitions are frozen together.

Neither capability is possible if the computation is contaminated by look-ahead, because future data would make the output depend on when the query runs, not only on the declared inputs. See also What makes a backtest reproducible? Permalinks and version pinning.
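One way to picture the content hash is a digest over a canonical serialization of the four inputs. The sketch below is an assumption about how such a key could be built (SHA-256 over sorted-key JSON, with hypothetical field names and example values), not a description of TickDistill's permalink format.

```python
import hashlib
import json

def backtest_key(signal, params, date_range, version):
    """Deterministic identifier for a (signal, params, range, version) query."""
    payload = {
        "signal": signal,
        "params": params,
        "range": list(date_range),
        "version": version,
    }
    # Sorted keys and fixed separators make the serialization canonical.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

key_a = backtest_key("cvd_z", {"threshold": 2.5, "window": "1h"},
                     ("2026-01-01", "2026-03-31"), "v1")
key_b = backtest_key("cvd_z", {"window": "1h", "threshold": 2.5},
                     ("2026-01-01", "2026-03-31"), "v1")
key_v2 = backtest_key("cvd_z", {"threshold": 2.5, "window": "1h"},
                      ("2026-01-01", "2026-03-31"), "v2")
```

The key is only meaningful if the computation behind it is a pure function of those inputs, which is exactly what point-in-time correctness guarantees.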

How this connects to sigma-normalization and signal quality

Sigma-normalization — expressing a signal in units of standard deviations from its own rolling baseline — is only honest if the baseline is causal. An in-sample standard deviation is not a yardstick; it is a measurement taken with a ruler that was calibrated using the answer.

The practical consequence is that live signals and historically backfilled signals use the same code path: the causal baseline estimator, the same mask, the same warm-up logic. There is no separate "backtest mode" that uses full-sample statistics. The backtest is the same computation run over historical data. See Why Order-Flow Signals Should Be Measured in Standard Deviations.

How the pipeline enforces these guarantees

Three architectural properties enforce point-in-time correctness end-to-end:

Single-pass streaming. Each day of data is processed in order, one observation at a time. The state at time t is built from the stream up to t; no random access to future records is possible. See Single-pass streaming ETL and discard.
Immutable daily partitions. Processed outputs are stored as immutable Parquet partitions. Reprocessing a day overwrites its slice cleanly and produces the identical result (idempotence). This is verified by the QA gate. See How We Validate Market Data Before It Becomes a Signal.
Causal baseline module. The baseline estimator is a shared module used by every signal processor. It carries its own state forward per stream and enforces the t' < t constraint at the interface level, so individual signal implementations cannot accidentally access current or future baseline values.

FAQ

What is look-ahead bias, in one sentence?
Look-ahead bias is the use of information from time t' ≥ t when computing a signal value for time t, causing backtest results to reflect knowledge the strategy could not have had.

Why does an in-sample standard deviation cause look-ahead bias?
An in-sample standard deviation is computed over the entire historical period. Using it to normalize a point in the middle of that period means the denominator includes observations that occurred after that point — information the model would not have had in real time.

What is an exclusion mask and why does it not create look-ahead bias?
An exclusion mask is a set of timestamp intervals whose observations are not allowed to update the rolling baseline. The mask must itself be defined causally — based on a public, fixed event schedule (like exchange funding times), not identified from the data after the fact. A mask derived by examining data is a form of look-ahead; a mask derived from a published schedule is not.

Does warm-up affect live trading or only backtests?
Both. In a live deployment, a signal cannot emit values until its baseline has accumulated the required number of non-masked observations. In a historical backtest, the same warm-up logic applies: signal points are absent from the first segment of history until both the signal window and the baseline window are filled.

How can I verify that a signal is point-in-time correct?
The direct test: compute signal values on a stream, modify observations after a target time t, and confirm the signal value at t is unchanged. Any dependency on future data will cause the value to change. This test must cover the signal formula, the baseline estimator, the mask, and any classifier sub-component that has its own rolling estimate.

What Is Cumulative Volume Delta (CVD) and How Do You Read It?

TickDistill — Thu, 25 Jun 2026 14:55:23 +0000

By TickDistill — order-flow microstructure signals. Educational content, not financial advice.

The short answer

Cumulative Volume Delta (CVD) is the running sum of signed taker volume — buy-aggressor volume minus sell-aggressor volume — accumulated over time. CVD tells you whether buyers or sellers have been the aggressive side of the tape across a session, and by how much. The key diagnostic use is comparing CVD's direction with price's direction: when they diverge, the market is revealing a mismatch between aggression and outcome that experienced traders call an absorption tell.

What is signed taker volume and why does it matter?

Signed taker volume is the volume of each trade tagged with the direction of the aggressor — the participant who crossed the spread and consumed resting liquidity. A buy-aggressor trade receives a positive sign; a sell-aggressor trade receives a negative sign.

The sign comes from the exchange's own tape. On Binance, for example, the aggTrades feed records the isBuyerMaker flag per trade: if isBuyerMaker is false, the buyer was the taker (aggressive side) and the trade signs positive; if isBuyerMaker is true, the seller was the taker and the trade signs negative.

Signed volume is the input that separates order-flow analysis from pure price analysis. Price tells you where the market ended up. Signed volume tells you who drove it there — and at what cost in aggression.

What is the CVD formula?

CVD is the running cumulative sum of signed taker volume. For a sequence of trades i = 1, 2, …, t, each with size v_i and aggressor sign s_i ∈ {+1, −1}:

CVD(t) = Σ_{i=1}^{t}  s_i · v_i

where s_i = +1 for buy-aggressor trades and s_i = −1 for sell-aggressor trades.

This is public, textbook math. The formula is standard across microstructure practitioners and appears in order-flow literature under various names — "delta," "cumulative delta," or "signed flow" — all referring to the same accumulation of directional taker pressure.

Over a fixed session window [0, T], CVD equals the total buy-aggressor volume minus the total sell-aggressor volume:

CVD(T) = V_buy_aggressor − V_sell_aggressor

A positive CVD means buyers have been more aggressive than sellers over the window. A negative CVD means sellers have been more aggressive. CVD of zero means aggression has been balanced, regardless of where price moved.
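The accumulation can be sketched directly from the flag described above. The dictionary layout and the sample trades are made up for illustration; only the `isBuyerMaker` sign convention comes from the text.

```python
from itertools import accumulate

# Hypothetical trade records; only qty and the aggressor flag are needed.
trades = [
    {"qty": 2.0, "isBuyerMaker": False},  # buyer was taker  -> +2.0
    {"qty": 1.5, "isBuyerMaker": True},   # seller was taker -> -1.5
    {"qty": 3.0, "isBuyerMaker": False},  # buyer was taker  -> +3.0
    {"qty": 0.5, "isBuyerMaker": True},   # seller was taker -> -0.5
]

def signed_qty(trade):
    """s_i * v_i: positive for buy-aggressor trades, negative otherwise."""
    return -trade["qty"] if trade["isBuyerMaker"] else trade["qty"]

cvd = list(accumulate(signed_qty(t) for t in trades))

buy_volume = sum(t["qty"] for t in trades if not t["isBuyerMaker"])
sell_volume = sum(t["qty"] for t in trades if t["isBuyerMaker"])
print(cvd, buy_volume - sell_volume)
```

The last element of the running sum matches buy-aggressor volume minus sell-aggressor volume over the same trades, as the windowed formula states.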

How do you reset or window CVD?

CVD is defined relative to a starting point. Practitioners reset it at:

Session open — accumulates intraday aggression from the first print.
Fixed time windows — e.g., rolling 1-hour or 4-hour CVD to capture regime changes within a session.
Candle boundaries — per-bar delta (the signed volume of a single candle) is the incremental building block; summing across candles reconstructs session CVD.

The choice of reset window is a calibration decision. A short window is more reactive but noisier; a long window integrates more signal but can obscure intraday shifts. TickDistill's computation applies a causal baseline — using only past data — so that the z-score of CVD at any point reflects its rarity relative to its own recent history, not a hard-coded threshold. See Sigma-Normalization: Why Order-Flow Signals Should Be Measured in Standard Deviations, Not Raw Numbers for the method.

What does CVD-vs-price divergence mean?

CVD-vs-price divergence is the condition where price and CVD move in opposite directions over the same window. It is the primary diagnostic use of CVD in order-flow analysis.

Bearish divergence — price makes a higher high but CVD makes a lower high (or fails to confirm): buy-side aggression is weakening even as price moves up. Someone is selling passively into the aggressive buying.

Bullish divergence — price makes a lower low but CVD makes a higher low (or flattens): sell-side aggression is weakening even as price moves down. Someone is buying passively into the aggressive selling.

Divergence is a necessary observation, not a trading signal by itself. The textbook interpretation is that a large passive order — a "wall" of resting liquidity — is absorbing the aggressive flow without yielding price movement proportional to the pressure applied.

CVD direction	Price direction	Reading
Rising	Rising	Confirmed upside aggression
Falling	Falling	Confirmed downside aggression
Flat or falling	Rising	Passive selling absorbing buys (bearish divergence)
Flat or rising	Falling	Passive buying absorbing sells (bullish divergence)
Rising sharply	Price contained	Potential absorption: aggression without price progress
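The table can be read as a small lookup on the signs of the two changes. The sketch below is one possible encoding: the tolerance parameters that define "flat" are hypothetical, and the last row is simplified to any non-flat CVD move against a contained price.

```python
def sign(value, tol):
    """Return +1, -1, or 0 (flat) for a change measured against a tolerance."""
    if value > tol:
        return 1
    if value < -tol:
        return -1
    return 0

def read_divergence(cvd_change, price_change, cvd_tol=0.0, price_tol=0.0):
    """Map CVD and price changes over one window to the readings in the table."""
    c = sign(cvd_change, cvd_tol)
    p = sign(price_change, price_tol)
    if p > 0:
        return "confirmed upside aggression" if c > 0 else "bearish divergence"
    if p < 0:
        return "confirmed downside aggression" if c < 0 else "bullish divergence"
    return "aggression without price progress" if c != 0 else "no reading"

print(read_divergence(cvd_change=-120.0, price_change=15.0))
```

In practice the tolerances would be expressed in sigma units rather than raw values, for the reasons given in the sigma-normalization article.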

What is the aggressor side and why is it the right input?

The aggressor side is the party who initiated the trade by submitting a market order or marketable limit order that crossed the spread. The aggressor consumes resting liquidity; the passive side provides it.

Aggressor identification is the correct input for CVD for two reasons. First, the aggressor's decision is unconditional — they chose to trade at the prevailing price, revealing genuine directional intent. Second, the passive side's resting order has no direction information encoded in the act of being filled; a limit order sitting at the bid reveals nothing about the limit-order poster's view of the next move.

Without aggressor-side tagging, signed volume is an approximation. Methods like the Lee-Ready rule (Lee and Ready, 1991, Journal of Finance) classify trades by comparing the transaction price to the prevailing bid-ask midpoint (buys above the midpoint, sells below, with a tick test for trades at the midpoint) and are used when the exchange tape does not include the flag. Centralized exchanges publishing the raw aggTrades feed (as Binance does) provide the clean input directly, making tick-classification algorithms unnecessary for those venues.
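For venues without the flag, the quote rule plus tick test can be sketched as below. This is a simplified illustration: the function name and arguments are hypothetical, the caller is assumed to supply the last non-zero price change, and the timing adjustment between quotes and trades used in the original paper is omitted.

```python
def lee_ready_sign(price, bid, ask, last_nonzero_change):
    """Classify a trade as +1 (buy), -1 (sell), or 0 (undetermined)."""
    mid = (bid + ask) / 2
    if price > mid:   # quote rule: above the midpoint -> buy
        return 1
    if price < mid:   # quote rule: below the midpoint -> sell
        return -1
    # Trade at the midpoint: fall back to the tick test.
    if last_nonzero_change > 0:
        return 1
    if last_nonzero_change < 0:
        return -1
    return 0

print(lee_ready_sign(price=101.0, bid=100.0, ask=101.0, last_nonzero_change=0.0))
print(lee_ready_sign(price=100.5, bid=100.0, ask=101.0, last_nonzero_change=-0.5))
```

Every classification made this way is an inference from prices, which is why an exchange-provided aggressor flag is preferred where it exists.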

For decentralized markets and OTC venues where no central tape exists, CVD cannot be computed in its clean form. CVD is a centralized-exchange signal.

How does TickDistill compute CVD?

TickDistill sources aggressor-tagged trades from the Binance Vision aggTrades historical dumps (the isBuyerMaker field per trade), which record the aggressor side directly from the exchange matching engine. This is the L1 data layer — the same layer that powers trade imbalance and big-order detection (see What Is Trade Imbalance in Order Flow?).

The raw CVD accumulation is public math. TickDistill's value-add is in expressing CVD in causal sigma units — recomputing a rolling z-score of the running delta so that "high CVD" means high relative to what this market has been doing lately, not high in absolute terms. A $100M positive CVD on BTC is very different information in a low-volume regime than in a high-volume regime. The z-score makes both comparable and the rarity reading stable across market conditions.

The calibration of the baseline window is proprietary. What matters conceptually is that every baseline uses only past data (no look-ahead) and excludes mechanical windows — such as perpetual funding settlements at 00:00, 08:00, and 16:00 UTC — where volume spikes for structural reasons rather than informational ones.

How is CVD related to trade imbalance?

Trade imbalance is the per-window ratio of net signed volume (buy-aggressor minus sell-aggressor) to total volume, expressed as a signed fraction or percentage. CVD is the running sum of the signed delta. They measure the same underlying phenomenon, directional taker pressure, at different time horizons and in different units.

Aspect	CVD	Trade imbalance
Unit	Signed volume (contracts / USD notional)	Fraction ∈ [−1, +1] or percentage
Time scope	Accumulates across a session or window	Per fixed window (e.g., 30 s, 5 min)
Primary use	Trend of aggressor pressure over session	Snapshot of directional pressure in a window
Divergence read	vs price over the same session	vs recent baseline

See What Is Trade Imbalance in Order Flow? for trade imbalance in detail.
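The relationship between the two measures is easiest to see side by side. The sketch below assumes the signed form of trade imbalance shown in the table, (buy − sell) / total, and uses made-up signed volumes grouped into three windows.

```python
def window_stats(signed_volumes):
    """Return (delta, imbalance) for one window of signed trade sizes."""
    buy = sum(v for v in signed_volumes if v > 0)
    sell = -sum(v for v in signed_volumes if v < 0)
    total = buy + sell
    delta = buy - sell
    imbalance = delta / total if total else 0.0  # fraction in [-1, +1]
    return delta, imbalance

# Hypothetical signed volumes, one list per window.
windows = [[5.0, -1.0, 2.0], [-4.0, 1.0], [3.0, -3.0]]

cvd, rows = 0.0, []
for w in windows:
    delta, imbalance = window_stats(w)
    cvd += delta                 # CVD carries state across windows
    rows.append((delta, imbalance, cvd))

for row in rows:
    print(row)
```

Imbalance resets with every window while CVD keeps accumulating, so a balanced final window leaves CVD unchanged rather than returning it to zero.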

What are the limits of CVD?

CVD cannot identify individual actors. The Binance aggTrades feed aggregates trades that execute within a short time window (100ms) at the same price and on the same aggressor side into a single record, which is not necessarily a single order. The feed therefore cannot show that multiple aggressive prints on the same side came from one participant. High positive CVD means many buy-aggressor fills occurred; it does not prove a single large buyer.

CVD does not capture passive pressure. A market absorbed by large resting orders shows up as price containment in the face of CVD pressure — that is the divergence read — but CVD itself only accumulates the aggressive side. The passive actor's size is inferred, not measured, without a full order book (L2 data).

CVD is directional, not predictive. CVD measures the current state of signed aggressor accumulation. It does not predict future price direction. A strongly positive CVD means buyers have been aggressive; it says nothing about whether that aggression will continue or whether it has already been fully absorbed. See What Is Order Flow Absorption? for how absorption detection builds on this observation.

CVD resets matter. Two traders viewing "CVD" may disagree because they use different reset windows. Always specify the window when communicating CVD readings.

FAQ

Q: Is CVD the same as OBV (On-Balance Volume)?
No. OBV (Granville, 1963) adds the full candle volume if price closes up and subtracts it if price closes down — it uses price direction to sign the volume. CVD uses the exchange tape's aggressor flag to sign each individual trade. CVD is more granular and more accurate about who initiated each trade; OBV is an approximation available wherever OHLCV data exists.

Q: Can CVD be computed for spot forex or equities?
CVD in its precise form requires a centralized tape that records the aggressor side. Spot forex has no single central tape, so clean CVD is not available there. Equity markets publish consolidated trade and quote data (e.g., NYSE TAQ), but that tape carries no aggressor flag, so Lee-Ready classification is widely applied to sign the trades. TickDistill currently covers centralized crypto venues where the aggressor flag is available natively.

Q: What does it mean when CVD is flat but price moves sharply?
Flat CVD with sharp price movement means aggressor pressure was balanced, but price moved anyway. This often reflects a large passive order being pulled (removed from the book) rather than absorbed — the market found no resting liquidity at a level and price gapped through it. It is a different condition from absorption and can signal fragility rather than a strong passive actor.

Q: Does a large positive CVD mean the market will go higher?
No. CVD is a measure of past aggressor accumulation, not a forecast. Large positive CVD means buyers have been aggressive over the measurement window. Whether that aggression has already been absorbed, whether it is exhausted, or whether it will continue is a separate question requiring additional context — including price behavior, absorption signals, and regime state.

Q: Where does CVD fit in a broader order-flow stack?
CVD is a foundational L1 metric — it requires only the trade tape with aggressor tagging, no order book. It sits above pure OHLCV (which has no aggressor information) and below L2 signals that use full book depth. It feeds divergence reads, absorption detection, and trade imbalance analysis. See What Is Order Flow Microstructure? for the full hierarchy.

What Is Order-Flow Microstructure? A Plain-English Guide to Reading the Tape

TickDistill — Thu, 25 Jun 2026 14:18:37 +0000

By TickDistill — order-flow microstructure signals. Educational content, not financial advice.

The short answer

Order-flow microstructure is the study of who initiates each trade — the buyer or the seller — and what the accumulated pattern of those initiations reveals about supply, demand, and near-term price pressure. Every trade has an aggressor: the side that crossed the spread and lifted or hit the resting quote. Microstructure measures that aggressor-side imbalance, its size, its clustering, and its context, rather than price or volume alone.

What is the tape, and why does it matter?

The tape is the real-time stream of every executed trade: timestamp, price, size, and — critically — which side was the aggressor. On centralized venues such as CME Globex or Binance, each matched trade carries a flag identifying whether the buyer or the seller was the market-order initiator. That flag is the foundation of all order-flow analysis.

Without the aggressor field, a 1,000-contract trade is ambiguous: it could be a buyer lifting offers (demand), a seller hitting bids (supply), or a mix. With the aggressor field, the tape becomes directional: you can separate buying pressure from selling pressure in every window you choose to measure.

What is the aggressor side, and how is it identified?

The aggressor side is the party that submitted a market order (or a marketable limit order) and consumed resting liquidity. The passive side is the limit order resting in the book. Every trade has exactly one aggressor and one passive participant.

On Binance aggTrades, the aggressor field is the isBuyerMaker boolean: when isBuyerMaker = false, the buyer is the aggressor (lifted offers); when isBuyerMaker = true, the seller is the aggressor (hit bids). This is the L1 layer: trade data with aggressor identification, available free from Binance Vision historical dumps.

Note: aggTrades on a single venue cannot prove a single actor placed a given volume. A large buy-side print may be one institutional order or many retail orders arriving simultaneously. L1 measures the directional pressure, not the identity of the participant.

What is signed order flow, and what does it measure?

Signed order flow (also called trade imbalance) is the net difference between buy-aggressed and sell-aggressed volume (or trade count) over a fixed time window. It is positive when buyers are more aggressive than sellers, negative when the reverse holds.

signed_flow_t = sum(buy_volume_i) - sum(sell_volume_i)   for trades i in [t-w, t]

where w is the aggregation window. This is a trade-level quantity in the Kyle (1985) tradition — it is computed entirely from the executed tape (the L1 layer), and it is what most practitioners loosely call "order-flow imbalance." It is a real, useful signal, but it is not the same object as the order-book OFI of Cont, Kukanov, and Stoikov.
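The windowed formula above translates almost line for line into code. The tuple layout `(timestamp, side, volume)` and the sample trades are assumptions for the example.

```python
def signed_flow(trades, t, w):
    """Net buy-minus-sell volume for trades with timestamps in [t - w, t]."""
    return sum(
        volume if side == "buy" else -volume
        for ts, side, volume in trades
        if t - w <= ts <= t
    )

# Hypothetical tape: (timestamp in seconds, aggressor side, volume).
tape = [
    (1.0, "buy", 4.0),
    (12.0, "sell", 1.0),
    (25.0, "buy", 2.0),
    (31.0, "sell", 5.0),
]

print(signed_flow(tape, t=30.0, w=20.0))  # covers the trades at 12 s and 25 s
```

Only executed trades enter the sum, which is why this quantity needs nothing beyond the L1 tape.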

A distinction that matters. True order-flow imbalance (OFI) in the sense of Cont, Kukanov, and Stoikov (2014) — extended to generalized and cross-asset OFI by Cont, Cucuringu, and Zhang (2023) — is defined on order-book events at the best bid and ask: queue additions, size changes, and cancellations that move the top of book, not just the executions that cross the spread. CKS report a high contemporaneous R² for that book-event OFI regressed on mid-price changes (their cross-sectional average across 50 NYSE stocks is around 65%), and find the intraday price impact approximately linear in OFI. That headline result belongs to the book-event definition, not to the trade-signed-volume formula above. Because it reads the resting book, true OFI requires Level-2 (L2) order-book data, which is why TickDistill places it at the L4 book layer (paid), not the free trade tier. For the precise three-case construction and the depth-normalization, see What Is Order-Flow Imbalance (OFI), and Why Does It Need the Order Book?.

TickDistill computes a causal z-score of signed order flow — expressed in standard deviations from its own rolling distribution — and emits a rarity reading. The exact window and normalization parameters are calibrated per market and are proprietary; why that calibration matters is explained in Sigma-Normalization.

What is Cumulative Volume Delta (CVD), and how does it relate to OFI?

Cumulative Volume Delta (CVD) is the running sum of signed volume (buy-aggressed volume minus sell-aggressed volume), accumulated without resetting over a session or period. Where windowed signed order flow (what practitioners loosely call order-flow imbalance) is a snapshot, CVD is a cumulative ledger.

CVD_t = CVD_{t-1} + (buy_volume_t - sell_volume_t)

CVD is widely cited in technical and market-microstructure communities as a divergence indicator: when price makes a new high but CVD does not, the buying pressure underpinning that rally is weakening relative to price. This divergence is a standard observation in practitioner microstructure literature.

TickDistill provides CVD as part of its free signal tier. For a detailed treatment, see What Is Cumulative Volume Delta (CVD)?.

What is VPIN, and what does it measure?

VPIN (Volume-synchronized Probability of Informed Trading) is a measure of order-flow toxicity, introduced by Easley, López de Prado, and O'Hara (2012). It estimates the probability that the counterparty to a trade is an informed trader, using volume time rather than clock time.

The public formula slices traded volume into equal-size buckets, computes the absolute imbalance between buy-initiated and sell-initiated volume in each bucket as a fraction of the bucket size, and averages that fraction over a rolling support window of N buckets:

VPIN = (1/N) * sum_{i=1..N} ( |buy_volume_i - sell_volume_i| / V )

where each bucket holds a fixed volume V and N is the number of buckets in the support window. In their original construction, Easley, López de Prado, and O'Hara (2012) set the bucket size to one-fiftieth of average daily volume — V = (daily volume) / 50 — and compute VPIN over a rolling support window of N = 50 buckets. They document that VPIN was elevated and rising in the hours leading up to the 2010 Flash Crash.
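The bucketing step is the part that is easiest to get wrong, so here is a minimal sketch of it. It makes one simplifying assumption that differs from the original paper: volume is classified using an aggressor flag from the tape, whereas Easley, López de Prado, and O'Hara classify volume in bulk from price changes. Function names and the sample trades are hypothetical.

```python
from typing import List, Tuple


def volume_buckets(
    trades: List[Tuple[float, bool]], bucket_size: float
) -> List[Tuple[float, float]]:
    """Split (volume, is_buy) trades into full buckets of equal volume.

    Returns one (buy_volume, sell_volume) pair per completed bucket. A trade
    that straddles a boundary is split; a trailing partial bucket is dropped.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    buckets: List[Tuple[float, float]] = []
    buy = sell = 0.0
    for volume, is_buy in trades:
        remaining = volume
        while remaining > 0:
            room = bucket_size - (buy + sell)
            fill = min(room, remaining)
            if is_buy:
                buy += fill
            else:
                sell += fill
            remaining -= fill
            if buy + sell >= bucket_size:  # bucket complete: emit and reset
                buckets.append((buy, sell))
                buy = sell = 0.0
    return buckets


def vpin(buckets: List[Tuple[float, float]], bucket_size: float, n: int) -> float:
    """Mean of |buy - sell| / V over the most recent n buckets."""
    if len(buckets) < n:
        raise ValueError("not enough completed buckets for the support window")
    window = buckets[-n:]
    return sum(abs(b - s) for b, s in window) / (n * bucket_size)


trades = [(30.0, True), (30.0, False), (60.0, True),
          (20.0, False), (40.0, True), (20.0, False)]
buckets = volume_buckets(trades, bucket_size=50.0)
toxicity = vpin(buckets, bucket_size=50.0, n=4)
```

The 200 units of volume fill four buckets with imbalances of 10, 30, 10, and 10, giving a VPIN of 60 / 200 = 0.3. Because buckets close on volume rather than on the clock, busy periods produce buckets faster than quiet ones.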

VPIN is not directional: a high VPIN reading signals that informed order flow is elevated — that the market is experiencing toxicity — not which direction price will move. It is a jump-risk input, not a price forecast.

TickDistill computes a causal z-score of VPIN rather than applying the folklore 0.7 cutoff from practitioner literature. The threshold that turns a VPIN level into a regime state is proprietary; the calibration exists because different markets and regimes have different baseline toxicity distributions.

What is price impact, and what formulas describe it?

Price impact is the change in mid-price caused by a trade of a given size. Kyle (1985) introduced the linear price impact model:

delta_p = lambda * (buy_volume - sell_volume)

where lambda (Kyle's lambda) is the price impact coefficient, estimated from the regression of mid-price changes on signed order flow. A higher lambda means the market is less liquid: each unit of aggressive order flow moves the price more.
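As an illustration of that regression, the sketch below estimates lambda as the ordinary least-squares slope of mid-price changes on signed order flow, with an intercept. The function name and the sample data are assumptions chosen so the arithmetic is easy to check; real estimates depend heavily on the sampling interval.

```python
from typing import Sequence


def kyle_lambda(signed_flow: Sequence[float], price_changes: Sequence[float]) -> float:
    """OLS slope of price changes on signed flow (regression with intercept)."""
    if len(signed_flow) != len(price_changes) or len(signed_flow) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(signed_flow)
    mean_x = sum(signed_flow) / n
    mean_y = sum(price_changes) / n
    sxx = sum((x - mean_x) ** 2 for x in signed_flow)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(signed_flow, price_changes))
    if sxx == 0:
        raise ValueError("signed flow has no variance")
    return sxy / sxx


# Hypothetical intervals: net signed volume and the matching mid-price change.
flow = [100.0, -50.0, 200.0, -150.0]
dp = [0.2, -0.1, 0.4, -0.3]
lam = kyle_lambda(flow, dp)
```

In this toy data every 100 units of net buying moves the mid-price by 0.2, so the slope is 0.002 price units per unit of signed volume.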

The square-root market impact law (Bouchaud, Gefen, Potters, and Wyart 2004; and the empirical surveys in Bouchaud et al. 2018) describes the average impact of an order as growing sub-linearly with its size:

impact ~ sigma * sqrt(Q / V_daily)

where Q is order size, V_daily is average daily volume, and sigma is volatility. The prefactor is instrument-dependent and empirically estimated — TickDistill does not publish the prefactor for any specific market. The key insight is that impact is concave in size, which is why large orders are typically split into child orders.
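A short numeric sketch shows what concavity means in practice. The prefactor is set to 1.0 purely as a placeholder, since the real value is instrument-dependent and not published; the volumes and volatility are made-up inputs.

```python
import math


def sqrt_impact(q: float, v_daily: float, sigma: float, prefactor: float = 1.0) -> float:
    """Square-root law: prefactor * sigma * sqrt(Q / V_daily).

    `prefactor` is a placeholder; it must be estimated per instrument.
    """
    return prefactor * sigma * math.sqrt(q / v_daily)


V_DAILY = 1_000_000.0  # hypothetical average daily volume
SIGMA = 0.02           # hypothetical daily volatility (2%)

small = sqrt_impact(1_000.0, V_DAILY, SIGMA)
large = sqrt_impact(4_000.0, V_DAILY, SIGMA)

ratio = large / small                          # 4x the size, about 2x the impact
per_unit_small = small / 1_000.0
per_unit_large = large / 4_000.0               # impact per unit falls with size
```

Quadrupling the order size only doubles the estimated impact, so the impact per unit traded is lower for the larger order. Under a linear model the ratio would have been 4.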

What are the data layers, and what can each layer measure?

Layer	Event type	Data source	What it enables
L0	OHLCV candles (`CandleEvent`)	Exchange klines (free, universal)	Volatility, momentum, VWAP, range breakouts — the universal baseline
L1	Trades with aggressor side (`TradeEvent`)	Binance aggTrades, CME tick (free/paid)	CVD, signed order flow, VPIN, big-order detection, trade imbalance, cross-venue flow
L2 derivatives	Funding, open interest, liquidations (`MetricEvent`)	Coinalyze, exchange REST (free for crypto)	Funding regime, OI shift/divergence, liquidation pressure, squeeze composites
L4 order book	Limit-order book snapshots and updates (`BookEvent`)	Tardis.dev incremental L2 (paid)	True OFI at the queue level, absorption, liquidity maps, iceberg detection

L0 signals are universal across all markets. L1 signals require a centralized tape with aggressor identification, which is available on crypto centralized exchanges and CME futures, but not on fragmented forex spot markets where there is no single tape. L4 signals require paid historical book data or self-captured streaming book data.

TickDistill's architecture converts every raw exchange format into a normalized internal event schema (TradeEvent, BookEvent, etc.) so that signal processors run the same logic across BTC, ETH, SOL, and later ES/NQ futures — only the per-market calibration profile changes.
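To illustrate the normalization idea, here is a minimal sketch that maps one Binance aggTrades record into a venue-neutral trade event. The `TradeEvent` fields shown are hypothetical (the real schema is not published in this article); the raw keys `T`, `p`, `q`, and `m` are the timestamp, price, quantity, and buyer-is-maker fields of Binance's aggregate trade payload.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class TradeEvent:
    """Hypothetical normalized trade record; field names are illustrative."""
    symbol: str
    ts_ms: int
    price: float
    size: float
    aggressor: str  # "buy" or "sell"


def from_binance_agg_trade(symbol: str, raw: Dict[str, Any]) -> TradeEvent:
    """Convert one Binance aggTrades record into a TradeEvent."""
    return TradeEvent(
        symbol=symbol,
        ts_ms=int(raw["T"]),
        price=float(raw["p"]),   # Binance sends price and quantity as strings
        size=float(raw["q"]),
        # m == True means the buyer was the resting maker,
        # so the seller crossed the spread.
        aggressor="sell" if raw["m"] else "buy",
    )


raw = {"a": 26129, "p": "64250.10", "q": "0.250", "f": 100, "l": 105,
       "T": 1719327673000, "m": True}
event = from_binance_agg_trade("BTCUSDT", raw)
```

A second adapter for another venue would emit the same `TradeEvent` shape, which is what lets downstream signal code stay unchanged across markets.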

What is big-order detection, and why does it matter?

Big-order detection identifies individual trade prints or clusters of prints that exceed a statistically rare size threshold for their market and regime. A single large aggressive print may indicate institutional urgency; a cluster of large prints on one side without interleaving from the opposite side suggests a sweep — a coordinated clearing of resting liquidity.

The size threshold for "big" is meaningless as an absolute dollar figure. A $10M notional print is routine for BTC and enormous for a thinly traded futures contract. The threshold must be expressed in sigma units relative to the local distribution, which is exactly the sigma-normalization principle described earlier.

TickDistill generates big-order primitive records as a shared foundation that downstream paid signals (density, sweep geometry, conviction zones) consume. The primitive itself is free; the derived geometry is paid.

What is trade imbalance, and how does it differ from CVD?

Trade imbalance is the ratio or difference between buy-aggressed and sell-aggressed volume within a fixed clock-time window, reported as a snapshot rather than accumulated. Where CVD grows indefinitely and requires a reset decision, trade imbalance is self-contained within its window.

imbalance_t = (buy_volume - sell_volume) / (buy_volume + sell_volume)   in [-1, +1]

An imbalance near +1 means nearly all aggressive volume in the window was buy-initiated; near -1, sell-initiated; near 0, balanced. For a full treatment, see What Is Trade Imbalance in Order Flow?.
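The formula translates directly into code. A minimal sketch, with a hypothetical function name and an explicit choice to return None for an empty window, where the ratio is undefined:

```python
from typing import Optional


def trade_imbalance(buy_volume: float, sell_volume: float) -> Optional[float]:
    """(buy - sell) / (buy + sell), in [-1, +1]; None if the window is empty."""
    total = buy_volume + sell_volume
    if total == 0:
        return None
    return (buy_volume - sell_volume) / total


strong = trade_imbalance(80.0, 20.0)   # an 80/20 split
mild = trade_imbalance(60.0, 40.0)     # a 60/40 split
empty = trade_imbalance(0.0, 0.0)      # no aggressive volume in the window
```

An 80/20 split gives 0.6 and a 60/40 split gives 0.2. Whether either value is rare for a given market is a separate question that the raw ratio does not answer.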

What is the derivatives layer (L2), and what does it add?

The L2 derivatives layer covers open interest (OI), funding rates, and liquidations — data that reflects the positioning and financing state of futures and perpetual markets rather than the immediate tape flow.

Signal	What it measures	Public formula / reference
`funding_regime`	State of the perpetual funding rate (positive/negative/extreme)	Perp mechanics: funding = premium index, settled on Binance's public 8-hour schedule (00:00 / 08:00 / 16:00 UTC) — published exchange mechanics, unrelated to any TickDistill normalization or exclusion window
`oi_shift`	OI × price direction → 4-regime positioning classifier	Hong and Yogo (2012) document OI as a positioning signal; 4-quadrant framework is standard practitioner microstructure
`liq_pressure`	Liquidation volume and cascade risk	Liquidations are public (exchange API); cascade risk from Osler (2005) stop-order clustering logic
`squeeze`	Composite of OI, funding, long/short ratio, liquidations	Composite; calibration proprietary
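The four-quadrant reading behind an OI-times-price classifier can be sketched as below. The regime labels follow the conventional practitioner interpretation and are assumptions for illustration; they are not TickDistill's output schema, and a production classifier would also need thresholds for what counts as a meaningful change.

```python
def oi_regime(price_change: float, oi_change: float) -> str:
    """Classify positioning from the signs of price and open-interest changes."""
    if price_change == 0 or oi_change == 0:
        return "neutral"
    if price_change > 0:
        # Rising price: new longs if OI grows, shorts closing if OI shrinks.
        return "long_buildup" if oi_change > 0 else "short_covering"
    # Falling price: new shorts if OI grows, longs closing if OI shrinks.
    return "short_buildup" if oi_change > 0 else "long_liquidation"


regimes = [
    oi_regime(+1.5, +200.0),
    oi_regime(+1.5, -200.0),
    oi_regime(-1.5, +200.0),
    oi_regime(-1.5, -200.0),
]
```

The point of the classifier is that the same price move carries different information depending on whether positions are being opened or closed behind it.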

Derivatives data is free to access for crypto markets (via Coinalyze or Binance REST), making L2 derivative signals available at no data cost. The computation and normalization are what TickDistill adds.

What is cross-venue flow, and what does it measure?

Cross-venue flow divergence measures whether aggressive order flow on one exchange is running ahead of flow on another. If buy-side pressure is building faster on Exchange A than Exchange B for the same underlying asset, one of two things is happening: either arbitrage bots have not yet equalized the imbalance, or the flow is genuinely venue-specific.

Lead-lag analysis at the tick level uses tools from the Hawkes process literature (Bacry, Mastromatteo, and Muzy 2015 for Hawkes-based flow clustering) and the Hayashi-Yoshida estimator (Hayashi and Yoshida 2005) for covariation of non-synchronous tick series. At the tick level, lead-lag relationships are noisy and venue-dependent; TickDistill treats the lead_s and log-likelihood ratio statistics as diagnostics, not definitive directional claims.
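For readers unfamiliar with the Hayashi-Yoshida estimator, the sketch below shows its core idea: sum the products of increments from the two series over every pair of observation intervals that overlap, with no resampling onto a common clock. This is a plain quadratic-time illustration on made-up ticks, not TickDistill's implementation. In practice it is usually applied to log-prices, and a lead-lag scan would repeat the calculation with one series shifted in time.

```python
from typing import List, Tuple

Tick = Tuple[float, float]  # (timestamp, price)


def increments(ticks: List[Tick]) -> List[Tuple[float, float, float]]:
    """Turn ticks into (start_time, end_time, price_change) intervals."""
    return [(t0, t1, p1 - p0) for (t0, p0), (t1, p1) in zip(ticks, ticks[1:])]


def hayashi_yoshida(x: List[Tick], y: List[Tick]) -> float:
    """Covariation estimate for two non-synchronous tick series."""
    y_incs = increments(y)
    total = 0.0
    for a0, a1, dx in increments(x):
        for b0, b1, dy in y_incs:
            # Intervals (a0, a1] and (b0, b1] overlap iff this holds.
            if max(a0, b0) < min(a1, b1):
                total += dx * dy
    return total


# Venue A ticks at t = 0, 2, 4; venue B ticks at t = 1, 3.
venue_a = [(0.0, 100.0), (2.0, 101.0), (4.0, 103.0)]
venue_b = [(1.0, 50.0), (3.0, 51.0)]
cov = hayashi_yoshida(venue_a, venue_b)
```

Venue B's single interval (1, 3] overlaps both of venue A's intervals, so the estimate is 1×1 + 2×1 = 3.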

FAQ

Q: Does order-flow microstructure work on forex spot markets?
No, not in the same way. Forex spot has no single centralized tape — trades are bilateral and OTC, so there is no universal aggressor field. Trade-aggressor signals (CVD, VPIN, signed order flow, trade imbalance) are not applicable to forex spot without a primary dealer tape, and true order-book OFI is unavailable without a unified L2 book. They apply to centralized crypto exchanges and CME futures, which have unified matching engines.

Q: Is order-flow microstructure the same as technical analysis?
No. Technical analysis derives signals from OHLCV price and volume aggregates (RSI, MACD, Bollinger bands). Order-flow microstructure operates on the underlying trade-by-trade record: the signed, directional raw material that OHLCV data discards. A 1-minute candle collapses thousands of individual trades into five numbers (open, high, low, close, volume); microstructure reads each trade before that collapse.

Q: What does "aggressor side" mean on an exchange that aggregates trades?
On Binance, the aggTrades feed combines consecutive trades at the same price and direction into a single record, tagged with isBuyerMaker. When isBuyerMaker is true, the buyer was the resting maker, so the seller was the aggressor; when it is false, the buyer was the aggressor. The aggregation is mechanical and does not imply a single actor placed all that volume. The aggressor field tells you which side was market-order-initiated; it does not identify the participant or prove institutional intent.

Q: Why does the normalization window matter more than the raw signal?
Because a raw imbalance of 60% buy is meaningless without context. In a calm BTC session 60% may be a 2σ rarity; during a volatile session the same 60% might be ordinary noise. Sigma-normalization makes the rarity reading consistent across regimes and assets. The choice of normalization window determines how "recent" the baseline is — too short and it is noisy; too long and it is stale. Calibration is proprietary for each market.

Q: Can I combine microstructure signals to build a directional strategy?
TickDistill sells individual, independently computed signal states — not a combined directional system. Combining signals, deciding weights, and managing the resulting strategy is the client's domain. We sell the measurement, not the alpha — see Why Sell the Measurement, Not the Alpha?.

TickDistill sells clean, computed order-flow inputs — not trading advice or guaranteed alpha. Backtests are illustrative and not a promise of future results.
