<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: grahammccain</title>
    <description>The latest articles on DEV Community by grahammccain (@grahammccain).</description>
    <link>https://dev.to/grahammccain</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3864299%2Faf068cac-f0f7-4b6d-8f25-ea88b6f8746d.png</url>
      <title>DEV Community: grahammccain</title>
      <link>https://dev.to/grahammccain</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/grahammccain"/>
    <language>en</language>
    <item>
      <title>We Added 5 Regime Filters. They Don't Do Much. Here's Why That's Interesting.</title>
      <dc:creator>grahammccain</dc:creator>
      <pubDate>Tue, 14 Apr 2026 17:30:43 +0000</pubDate>
      <link>https://dev.to/grahammccain/we-added-5-regime-filters-they-dont-do-much-heres-why-thats-interesting-23lf</link>
      <guid>https://dev.to/grahammccain/we-added-5-regime-filters-they-dont-do-much-heres-why-thats-interesting-23lf</guid>
      <description>&lt;h2&gt;
  
  
  What we tested
&lt;/h2&gt;

&lt;p&gt;This week we added 5 regime filters to the cohort API: same_vrp_bucket (variance risk premium), same_term_bucket (VIX term structure), same_credit_bucket (HYG/LQD credit spread proxy), same_curve_bucket (yield curve slope), and same_breadth_bucket (market breadth). The academic literature on return predictability says these should materially condition forward return distributions, with VRP specifically called out as the best single-factor regime predictor.&lt;/p&gt;

&lt;p&gt;We ran the test honestly: 200 anchors with known 5d and 10d forward returns, six cohort modes per anchor (baseline plus each regime filter applied alone) at two horizons, for 2,400 total cohort runs. For each, we measured interquartile range width, [p10, p90] band width, and held-out coverage of actuals.&lt;/p&gt;
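&lt;p&gt;A minimal sketch of that audit harness, assuming a get_cohort client function and anchor rows carrying known actual returns (both are placeholders, not the real API client):&lt;/p&gt;

```python
from statistics import median

# Filter names from the post; None = unfiltered baseline.
MODES = [None, "same_vrp_bucket", "same_term_bucket", "same_credit_bucket",
         "same_curve_bucket", "same_breadth_bucket"]

def audit(anchors, get_cohort, horizon=5):
    """For each mode, record median IQR width, median [p10, p90] band
    width, and empirical coverage of the anchors' actual returns."""
    stats = {}
    for mode in MODES:
        iqr, band, hits = [], [], []
        for anchor in anchors:
            dist = get_cohort(anchor, regime_filter=mode, horizon=horizon)
            iqr.append(dist["p75"] - dist["p25"])
            band.append(dist["p90"] - dist["p10"])
            hits.append(dist["p10"] <= anchor["actual_return"] <= dist["p90"])
        stats[mode or "baseline"] = {"iqr": median(iqr),
                                     "band80": median(band),
                                     "coverage80": sum(hits) / len(hits)}
    return stats
```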

&lt;h2&gt;
  
  
  The result
&lt;/h2&gt;

&lt;p&gt;Across 5- and 10-day horizons, the 5 regime filters produced distribution widths within 0.4 percentage points of baseline. Empirical coverage shifted by 1-2 percentage points. The n (cohort size) barely changed — baseline drew 198 neighbors; every filtered version drew 199-200.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5d baseline IQR: 4.17%. same_vrp: 4.21%. same_curve: 4.07%. Max shift: 0.14pp.&lt;/li&gt;
&lt;li&gt;5d baseline 80-band width: 8.78%. Max shift across filters: 0.21pp.&lt;/li&gt;
&lt;li&gt;10d baseline IQR: 5.88%. same_credit: 6.25%. Max shift: 0.37pp.&lt;/li&gt;
&lt;li&gt;Empirical [p10,p90] coverage on held-out actuals: baseline 73.5% (5d) / 71.0% (10d), regime-filtered all within ±2pp.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why the filters don't bite
&lt;/h2&gt;

&lt;p&gt;The filters are real and the columns they reference are populated across 24M+ embeddings. But at the ±0.15 percentile bucketing we chose, the filter keeps roughly 70% of the base pool. When you already have 200 near-neighbors from a 24M-row kNN, dropping 30% of candidates barely changes which 200 bubble to the top.&lt;/p&gt;
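&lt;p&gt;The loose bucketing behaves like a simple percentile-window predicate over the candidate pool; a sketch (the vrp_pctile field name is illustrative):&lt;/p&gt;

```python
def regime_filter(anchor, candidates, field="vrp_pctile", half_width=0.15):
    """Keep candidates whose regime percentile lies within +/- half_width
    of the anchor's. How much of the pool survives depends on where the
    anchor sits and how the pool is distributed along this dimension."""
    lo, hi = anchor[field] - half_width, anchor[field] + half_width
    return [c for c in candidates if lo <= c[field] <= hi]
```

Tightening half_width is the ±0.05 variant discussed below: fewer survivors, sharper conditioning, noisier quantiles.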

&lt;p&gt;There's a second, subtler reason: the kNN search is over shape embeddings that were computed from price + volume + volatility signals. Patterns that are shape-similar tend to already be drawn from similar regimes — you don't get a roaring-bull-market pattern and a 2008-crash pattern as nearest neighbors. The regime filter is redundant with information the embedding already captured.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this tells an agent builder
&lt;/h2&gt;

&lt;p&gt;The lesson isn't that regime doesn't matter — it's that regime matters implicitly once you retrieve by shape. If you're already using shape-based kNN, layering a loose regime filter on top buys you very little. The cases where regime filtering WILL bite are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tight bucketing (±0.05 percentile) instead of loose (±0.15). This drops cohort size materially and should move distributions — at the cost of higher variance on the remaining estimate.&lt;/li&gt;
&lt;li&gt;Interaction filters (same_vrp AND same_term AND same_credit) that restrict to a specific regime combination — probably the correct default when an agent is reasoning about a specific macro setup.&lt;/li&gt;
&lt;li&gt;Regime-stratified calibration: fit separate conformal offsets per regime bucket so the bands reflect 'what happens in high-VIX high-VRP environments specifically.' This is probably where the real win lives.&lt;/li&gt;
&lt;/ul&gt;
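&lt;p&gt;The regime-stratified idea from the last bullet, sketched as one split-conformal offset per regime bucket (the calibration-row layout is illustrative):&lt;/p&gt;

```python
from math import ceil

def fit_bucket_offsets(calib, alpha=0.2):
    """One split-conformal offset per regime bucket.
    calib rows: {"bucket", "p10", "p90", "actual"}."""
    by_bucket = {}
    for row in calib:
        by_bucket.setdefault(row["bucket"], []).append(row)
    offsets = {}
    for bucket, rows in by_bucket.items():
        # Nonconformity: how far outside the raw band the outcome landed
        # (negative when it landed inside, so a band can also shrink).
        scores = sorted(max(r["p10"] - r["actual"], r["actual"] - r["p90"])
                        for r in rows)
        n = len(scores)
        k = min(n - 1, ceil((1 - alpha) * (n + 1)) - 1)
        offsets[bucket] = scores[k]
    return offsets
```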

&lt;h2&gt;
  
  
  What we're doing about it
&lt;/h2&gt;

&lt;p&gt;The filters ship as-is because they do still constrain the cohort (just mildly), they're cheap to apply, and they give agents a clean way to say 'only match within similar macro conditions.' Users who want stronger effects can stack them — our MCP tool documentation now reflects that.&lt;/p&gt;

&lt;p&gt;The next experiment is interaction filters: same_vrp AND same_term AND same_credit simultaneously, at a ±0.10 bucket. That should materially change cohort composition. If it does, we'll publish the delta; if it doesn't, we'll publish that too.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;CTA&lt;/strong&gt; — This is the kind of audit agent builders should demand from any historical-pattern API. If a provider claims their filters condition distributions, ask for the IQR shift. If they can't produce it, the filters are decoration. Ours are documented at chartlibrary.io/calibration.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://chartlibrary.io/blog/regime-conditioning-empirical-study" rel="noopener noreferrer"&gt;chartlibrary.io&lt;/a&gt;. Chart Library is the stock-market memory for AI agents — free Sandbox tier at &lt;a href="https://chartlibrary.io/developers" rel="noopener noreferrer"&gt;chartlibrary.io/developers&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>From Retrieval to Calibrated Retrieval: Conformal Prediction on Agent Base Rates</title>
      <dc:creator>grahammccain</dc:creator>
      <pubDate>Tue, 14 Apr 2026 13:16:41 +0000</pubDate>
      <link>https://dev.to/grahammccain/from-retrieval-to-calibrated-retrieval-conformal-prediction-on-agent-base-rates-kh7</link>
      <guid>https://dev.to/grahammccain/from-retrieval-to-calibrated-retrieval-conformal-prediction-on-agent-base-rates-kh7</guid>
      <description>&lt;h2&gt;
  
  
  The problem we caught in our own product
&lt;/h2&gt;

&lt;p&gt;Our cohort API returns a distribution of forward returns for a chart pattern — p10, p25, median, p75, p90. An agent calling it should be able to trust those numbers for sizing. If the agent reads '[p10, p90] is [-3%, +5%]' it should act on roughly an 80% confidence that the outcome lands in that range.&lt;/p&gt;

&lt;p&gt;It didn't. We audited our own endpoint against 400 held-out anchors with known forward returns and measured how often the actual return fell inside the band we published.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nominal [p10, p90] coverage: 80%. Empirical: 68.2% (5d), 64.2% (10d).&lt;/li&gt;
&lt;li&gt;Nominal [p25, p75] coverage: 50%. Empirical: 40.0% (5d), 43.2% (10d).&lt;/li&gt;
&lt;li&gt;The medians were fine — 0.19% actual vs 0.15% predicted at 5d. The failure was entirely in the band widths.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Put bluntly: if an agent used our raw bands to size a position, outcomes fell outside its assumed 80% band roughly 1.6-1.8× as often as it expected. That's the kind of silent failure mode that makes people distrust AI-assisted trading tools, and it was hiding in our own product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why retrieved quantiles miscalibrate
&lt;/h2&gt;

&lt;p&gt;The raw cohort quantiles come from nearest-neighbor retrieval in embedding space — we pull 200 historical patterns similar to the anchor and read off their return percentiles. The math treats those 200 matches as if they were an iid sample from the same distribution the anchor came from. They aren't.&lt;/p&gt;

&lt;p&gt;Near-neighbor matches are systematically closer to each other than to the anchor — they're selected for shape similarity, not randomness. That shrinks the variance of the empirical quantiles. p10 reads as -3% when the real tail is closer to -5%. The mechanism is structural, not a bug, and it shows up in any system that publishes quantiles derived from retrieval without a calibration step.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix: split conformal prediction
&lt;/h2&gt;

&lt;p&gt;Conformal prediction is the standard statistical tool for this. The version we use is split conformal for quantile regression (CQR-style):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hold out a calibration set of anchors with known forward returns.&lt;/li&gt;
&lt;li&gt;For each calibration sample, compute a nonconformity score: max(p_lo - y, y - p_hi) — how far outside the raw band the actual outcome was.&lt;/li&gt;
&lt;li&gt;Take the ⌈(1 - α)(n + 1)⌉/n empirical quantile of those scores. That's your additive offset q.&lt;/li&gt;
&lt;li&gt;Calibrated band = [p_lo - q, p_hi + q]. By construction this hits ~(1 - α) coverage on exchangeable data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is a band correction, not a median shift — the p50 is already unbiased. The offset is fit per-horizon, checked into the repo as services/conformal_offsets.json, and applied on every /cohort response.&lt;/p&gt;
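&lt;p&gt;The recipe above in runnable form (pure Python; the calibration-row layout is illustrative):&lt;/p&gt;

```python
from math import ceil

def fit_offset(calib, alpha=0.2):
    """Steps 1-3: nonconformity scores max(p_lo - y, y - p_hi), then the
    ceil((1 - alpha)(n + 1))-th smallest score is the additive offset q."""
    scores = sorted(max(r["p_lo"] - r["y"], r["y"] - r["p_hi"]) for r in calib)
    n = len(scores)
    k = min(n - 1, ceil((1 - alpha) * (n + 1)) - 1)
    return scores[k]

def calibrate_band(p_lo, p_hi, q):
    """Step 4: symmetric band correction; the median is left untouched."""
    return p_lo - q, p_hi + q
```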

&lt;h2&gt;
  
  
  Validation on the held-out half
&lt;/h2&gt;

&lt;p&gt;Split the 800-row calibration sample 50/50. Fit the conformal offsets on one half, measure empirical coverage on the other:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5d [p10, p90] — raw 68.0%, calibrated 82.5% (target 80%)&lt;/li&gt;
&lt;li&gt;5d [p25, p75] — raw 40.0%, calibrated 48.5% (target 50%)&lt;/li&gt;
&lt;li&gt;10d [p10, p90] — raw 64.5%, calibrated 80.5% (target 80%)&lt;/li&gt;
&lt;li&gt;10d [p25, p75] — raw 42.5%, calibrated 53.5% (target 50%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Numbers are now inside the tolerance a reasonable agent would expect. The offsets themselves are small in absolute terms — ±1.4pp for 5d, ±2.6pp for 10d — but the coverage gap they close is large because it compounds at the tails.&lt;/p&gt;

&lt;h2&gt;
  
  
  What agents should ask of any retrieval API
&lt;/h2&gt;

&lt;p&gt;If you're building on any historical-pattern API, demand three things before you size a trade off a retrieved quantile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An empirical coverage number validated on held-out anchors, not just 'p10' as a label.&lt;/li&gt;
&lt;li&gt;A calibrated band you can use for sizing, separate from the raw band you use for ranking.&lt;/li&gt;
&lt;li&gt;The calibration set size and method, so you can judge whether their 80% means the same thing as your 80%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We now return calibrated_return_pct alongside return_pct on every cohort response, plus a calibration meta block with coverage_80_validated, coverage_50_validated, and n_validation. That's the evidence, not the claim. The MCP tool description tells agents which band to use for which job. None of this was here a week ago.&lt;/p&gt;
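&lt;p&gt;Agent-side, reading that response defensively might look like the sketch below; the field layout is inferred from the description above, so treat the names as illustrative:&lt;/p&gt;

```python
def band_for_sizing(cohort_response, horizon="5"):
    """Prefer the calibrated band for sizing; fall back to the raw band
    (flagged) when the calibration meta block is missing."""
    h = cohort_response["horizons"][horizon]
    cal = cohort_response.get("calibration")
    if cal and cal.get("n_validation", 0) > 0:
        band = h["calibrated_return_pct"]
        return band["p10"], band["p90"], True
    band = h["return_pct"]
    return band["p10"], band["p90"], False
```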

&lt;h2&gt;
  
  
  What we still owe you
&lt;/h2&gt;

&lt;p&gt;Split conformal is the minimum viable calibration — it gives one offset per horizon across all cohort configurations. Small cohorts and regime-extreme buckets are almost certainly miscalibrated in their own ways, and a uniform offset under-corrects for them. The next version is bucket-aware: separate offsets by cohort size and by regime bin. That work is queued.&lt;/p&gt;

&lt;p&gt;Longer term, the calibration model should consume cohort features (size, filter-stack, distance distribution) and output band widths directly. But the honest version of the story is that 800 anchors isn't enough to fit that yet, so we shipped the simpler correction that already closes most of the gap and we'll keep widening the calibration set.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;CTA&lt;/strong&gt; — Ready to build on calibrated retrieval? Grab an API key at chartlibrary.io/developers and the MCP server on PyPI (chartlibrary-mcp v1.4.0). Every cohort response now includes calibrated_return_pct and a validated coverage number.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://chartlibrary.io/blog/calibrated-retrieval-conformal-prediction-on-base-rates" rel="noopener noreferrer"&gt;chartlibrary.io&lt;/a&gt;. Chart Library is the stock-market memory for AI agents — free Sandbox tier at &lt;a href="https://chartlibrary.io/developers" rel="noopener noreferrer"&gt;chartlibrary.io/developers&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Build a Stock-Research Agent That Doesn't Hallucinate</title>
      <dc:creator>grahammccain</dc:creator>
      <pubDate>Tue, 14 Apr 2026 12:29:25 +0000</pubDate>
      <link>https://dev.to/grahammccain/how-to-build-a-stock-research-agent-that-doesnt-hallucinate-1o03</link>
      <guid>https://dev.to/grahammccain/how-to-build-a-stock-research-agent-that-doesnt-hallucinate-1o03</guid>
      <description>&lt;h2&gt;
  
  
  The problem every stock-research agent has
&lt;/h2&gt;

&lt;p&gt;If you've built an AI agent that answers questions like 'what usually happens after a breakout like this in NVDA,' you've hit the same wall everyone does: the model confidently narrates a number that has no historical backing. The base rate is either invented or pulled from the model's training cut-off, not from real data conditioned on the actual setup.&lt;/p&gt;

&lt;p&gt;The fix is structural, not prompt-engineered. You need a tool the agent calls that returns real conditional base rates — not 'on average, NVDA goes up X%' but 'given this chart shape, filtered by current regime and sector, in a corpus of historical analogs that includes delisted names, here's the distribution of forward returns.' One call, one number the agent can reason about, one sample size so it knows when to hedge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The primitive: POST /api/v1/cohort
&lt;/h2&gt;

&lt;p&gt;Chart Library's Conditional Distribution endpoint is the smallest composable unit for this pattern. You send an anchor (symbol + date) and optional filters, you get back a cohort of historical matches plus the distribution of outcomes at 1/5/10 day horizons:&lt;/p&gt;

&lt;p&gt;The request carries the anchor plus optional knobs; the response carries the cohort distribution per horizon. Abridged shape (elided values shown as ...):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;POST /api/v1/cohort
{
  "anchor": { "symbol": "NVDA", "date": "2024-06-18" },
  "horizons": [...],
  "top_k": ...
}

{
  "cohort_id": "coh_...",
  "horizons": {
    "5": {
      "n": ...,
      "return_pct": { "p10": ..., "p50": ..., "p90": ... },
      "hit_rate": { "above_entry": ... }
    }
  },
  "included_delisted": ...,
  "total_matches": ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Every response includes a 15-minute cohort_id you can refine progressively, and a survivorship flag so the agent knows whether delisted names are part of the base rate.&lt;/p&gt;
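&lt;p&gt;Calling the endpoint from Python could look like the sketch below; the API host and auth header are assumptions, not documented values:&lt;/p&gt;

```python
import json
from urllib import request

API_BASE = "https://api.chartlibrary.io"  # assumed host, not documented

def build_payload(symbol, date, horizons=(1, 5, 10), top_k=200):
    """Request body for POST /api/v1/cohort."""
    return {"anchor": {"symbol": symbol, "date": date},
            "horizons": list(horizons), "top_k": top_k}

def get_cohort(payload, api_key):
    """POST the payload; the Bearer auth scheme here is an assumption."""
    req = request.Request(
        API_BASE + "/api/v1/cohort",
        data=json.dumps(payload).encode(),
        headers={"Authorization": "Bearer " + api_key,
                 "Content-Type": "application/json"},
        method="POST")
    with request.urlopen(req) as resp:
        return json.load(resp)
```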

&lt;h2&gt;
  
  
  Three filter dimensions that matter
&lt;/h2&gt;

&lt;p&gt;The reason shape-only matching doesn't produce alpha on its own is that outcomes are conditional on context. The cohort API takes three filter dimensions that meaningfully shift the distribution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;filters.sector.same_as_anchor = true keeps only matches in the anchor's sector&lt;/li&gt;
&lt;li&gt;filters.regime.same_vix_bucket = true keeps only matches whose VIX regime is within ±15 percentile of today's&lt;/li&gt;
&lt;li&gt;filters.regime.same_trend = true matches the sign of the SPY 20d trend at the match date&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real example: NVDA 2024-06-18 unfiltered shows 54% up at 5 days across 492 analogs. Apply same_sector + same_vix_bucket and 5d drops to 48.6% up while 10d rises to 55.2% — a meaningful conditional pattern (short-term mean reversion, medium-term continuation) that's invisible in the unconditional stats.&lt;/p&gt;

&lt;h2&gt;
  
  
  The edge-mining loop (where it gets powerful)
&lt;/h2&gt;

&lt;p&gt;Single calls are fine. The real leverage is the loop: start broad, ask which filter matters, narrow, repeat. Three tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;POST /api/v1/cohort — the initial cohort. Returns cohort_id.&lt;/li&gt;
&lt;li&gt;GET /api/v1/cohort/{id}/explain — ranks candidate filters (VIX regime, trend, recent-5-years) by how much each one shifts the above-entry hit rate. Tells the agent which dimension is actually moving the distribution for this specific setup.&lt;/li&gt;
&lt;li&gt;POST /api/v1/cohort/{id}/filter — narrows the stored cohort with whichever filter was most informative. No kNN re-run (sub-second) and returns a new cohort_id so agents can branch.&lt;/li&gt;
&lt;/ul&gt;
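&lt;p&gt;The loop itself is a few lines once the three endpoints exist as client functions (placeholders here):&lt;/p&gt;

```python
def edge_mine(anchor, cohort, explain, refine, max_depth=3, min_shift=2.0):
    """cohort -> explain -> refine, stopping when no candidate filter
    shifts the above-entry hit rate by at least min_shift points."""
    cid = cohort(anchor)
    trail = []
    for _ in range(max_depth):
        ranked = explain(cid)  # [(filter_name, abs_shift_pp), ...], best first
        if not ranked or ranked[0][1] < min_shift:
            break
        best, shift = ranked[0]
        cid = refine(cid, best)  # stored cohort: no kNN re-run
        trail.append((best, shift))
    return cid, trail
```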

&lt;p&gt;This is how agents (and humans) discover conditional structure rather than pattern-match to a canned base rate. The cohort_id keeps the expensive embedding search cached, so refinement is free. Fork, compare, keep the branch with the highest-confidence distribution.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP: one tool call in any agent framework
&lt;/h2&gt;

&lt;p&gt;The same three calls are exposed as MCP tools: the agent asks for a cohort on NVDA at 2024-06-18, gets back a cohort_id like coh_..., then hands that coh_... to the explain and filter tools to iterate.&lt;/p&gt;

&lt;p&gt;Drop the MCP server into your CrewAI, LangGraph, AutoGen, or Claude function-calling setup. The agent discovers the tool, calls it, and returns a number grounded in real historical base rates instead of a number it made up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;The next wave of AI agents in finance will be judged on whether their answers are wrong in ways users can't detect. A hallucinated base rate is indistinguishable from a real one at the language-output level. The only structural defense is to ground every claim in a retrieval call backed by real data — conditional, explicit, sample-sized, and survivorship-aware.&lt;/p&gt;

&lt;p&gt;Chart Library's cohort primitive is built for exactly that pattern. Free sandbox tier, $29 Builder, $299 Agent (with burst + session handles + 1K req/min), and the MCP server is one pip install away.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;CTA&lt;/strong&gt; — Ready to build? Grab an API key at chartlibrary.io/developers and the MCP server on PyPI (chartlibrary-mcp). The conditional distribution primitive is live on the Free tier.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://chartlibrary.io/blog/how-to-build-a-stock-agent-that-doesnt-hallucinate" rel="noopener noreferrer"&gt;chartlibrary.io&lt;/a&gt;. Chart Library is the stock-market memory for AI agents — free Sandbox tier at &lt;a href="https://chartlibrary.io/developers" rel="noopener noreferrer"&gt;chartlibrary.io/developers&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>3 Patterns for AI Agents That Analyze Stock Charts</title>
      <dc:creator>grahammccain</dc:creator>
      <pubDate>Tue, 14 Apr 2026 12:26:32 +0000</pubDate>
      <link>https://dev.to/grahammccain/3-patterns-for-ai-agents-that-analyze-stock-charts-3p2</link>
      <guid>https://dev.to/grahammccain/3-patterns-for-ai-agents-that-analyze-stock-charts-3p2</guid>
      <description>&lt;h2&gt;
  
  
  Why these three
&lt;/h2&gt;

&lt;p&gt;If you've shipped an AI agent that answers stock questions, you've hit a predictable set of failure modes: it invents base rates, stops at the first retrieval instead of probing conditional structure, and strips the narrative hooks users actually remember. This post names the three patterns we ship in our own API so you can apply them regardless of which chart-data provider you end up using.&lt;/p&gt;

&lt;p&gt;All three map to the same underlying idea: a stock-research agent should expose retrieval-first composable primitives to the LLM, and force synthesis into the final turn. Everything here is domain-independent — the techniques port to any agent answering 'what usually happens after X' questions in finance, sports, operations, or science.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 1: Grounded base rates (no hallucinated statistics)
&lt;/h2&gt;

&lt;p&gt;The failure: Claude/GPT is happy to answer 'what usually happens after a NVDA-style breakout' with invented percentiles and sample sizes. The numbers sound real because they're formatted like real numbers.&lt;/p&gt;

&lt;p&gt;The fix: a single tool that returns real conditional distributions with sample size and survivorship flag, plus a system prompt that forbids inventing forward-return statistics. The agent MUST call the tool before making any claim about 'typically' or 'usually.'&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool returns: percentile distribution of forward returns (p10/p25/p50/p75/p90)&lt;/li&gt;
&lt;li&gt;Per-horizon MAE (max adverse excursion) and MFE (max favorable excursion)&lt;/li&gt;
&lt;li&gt;Realized vol distribution (for options or vol-scaling)&lt;/li&gt;
&lt;li&gt;Hit rates: above-entry, MFE&amp;gt;1%, MAE&amp;lt;-1%&lt;/li&gt;
&lt;li&gt;Sample size n AND survivorship flag (how many delisted names in cohort)&lt;/li&gt;
&lt;li&gt;Every response gets a cohort_id for downstream tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;System prompt template: 'You are a stock-research assistant. If the user asks about forward returns, hit rates, drawdowns, or pattern outcomes, you MUST call get_cohort_distribution first. Quote the sample size in your answer. Disclose the survivorship flag. Never quote a percentile you did not see in tool output.'&lt;/p&gt;

&lt;p&gt;Seems obvious, but agents written without this constraint invariably produce authoritative-sounding sentences with zero grounding. Add the tool + the constraint together; neither works alone.&lt;/p&gt;
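&lt;p&gt;Wired into a function-calling setup, the tool-plus-constraint pair might be declared like this sketch (the schema shape is illustrative):&lt;/p&gt;

```python
# Tool declaration an LLM can discover and call; schema fields illustrative.
GET_COHORT_TOOL = {
    "name": "get_cohort_distribution",
    "description": ("Return the conditional distribution of forward returns "
                    "for a chart pattern: p10/p25/p50/p75/p90, hit rates, "
                    "sample size n, survivorship flag, and a cohort_id."),
    "input_schema": {
        "type": "object",
        "properties": {
            "symbol": {"type": "string"},
            "date": {"type": "string", "description": "YYYY-MM-DD anchor date"},
            "horizon_days": {"type": "integer", "enum": [1, 5, 10]},
        },
        "required": ["symbol", "date"],
    },
}

# The constraint half: the prompt template from the post, verbatim.
SYSTEM_PROMPT = (
    "You are a stock-research assistant. If the user asks about forward "
    "returns, hit rates, drawdowns, or pattern outcomes, you MUST call "
    "get_cohort_distribution first. Quote the sample size in your answer. "
    "Disclose the survivorship flag. Never quote a percentile you did not "
    "see in tool output."
)
```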

&lt;h2&gt;
  
  
  Pattern 2: The edge-mining loop
&lt;/h2&gt;

&lt;p&gt;The failure: agents stop at the first retrieval. They call the base-rate tool once, get an answer, and write it up. Whatever conditional structure lives INSIDE the cohort — the part that actually matters for trading — never surfaces.&lt;/p&gt;

&lt;p&gt;The fix: expose two more tools the agent can chain after the initial cohort. One ranks which additional filter would move the distribution most (explain). The other applies a filter to fork a narrower cohort (refine). Agents iterate until they've identified the dimension that actually matters.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cohort(anchor, filters) returns cohort_id&lt;/li&gt;
&lt;li&gt;explain(cohort_id, horizon) ranks candidate filters by |shift on above-entry rate|&lt;/li&gt;
&lt;li&gt;refine(cohort_id, filter) applies that filter, returns new cohort_id&lt;/li&gt;
&lt;li&gt;Agent loops: cohort → explain → refine → maybe explain again → synthesize&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why this works: sub-second refinement on a stored cohort (no repeat retrieval) means the agent can fork 5 branches and compare. Agents trained on tool use will do this naturally once they have the primitives. Agents routed through a LangGraph StateGraph execute the loop deterministically and use the model only for the final synthesis step.&lt;/p&gt;

&lt;p&gt;Outputs agents write when given these tools: 'Baseline: 54% above-entry across 491 analogs. Narrowing by same_vix_bucket drops to 48% at 5d but climbs to 55% at 10d — short-term mean reversion, medium-term continuation.' That's a real trading insight, and it only emerges from the loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 3: Named analogs for narrative
&lt;/h2&gt;

&lt;p&gt;The failure: your API returns 10 similar historical patterns, each a (symbol, date) tuple with a number. The agent dutifully lists them. The user's eyes glaze over. The most valuable piece — 'one of these analogs was Silicon Valley Bank the week it collapsed' — is invisible.&lt;/p&gt;

&lt;p&gt;The fix: attach a &lt;code&gt;named_event&lt;/code&gt; field to any match that falls inside a curated window of a famous market moment. It's a small curation job (30-50 events cover most of what retail traders and content agents care about) but it turns every match row into a potential headline.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Curate notable (symbol, date-range, label, description) tuples&lt;/li&gt;
&lt;li&gt;At match-return time, check each match against the catalog&lt;/li&gt;
&lt;li&gt;Attach {slug, label, description} to matches inside a window&lt;/li&gt;
&lt;li&gt;UI renders a small colored pill; content agents pull label into a headline&lt;/li&gt;
&lt;/ul&gt;
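&lt;p&gt;A minimal version of that catalog and lookup (the entries shown are illustrative):&lt;/p&gt;

```python
from datetime import date

# Illustrative entries; a real catalog holds 30-50 curated events.
NAMED_EVENTS = [
    ("SIVB", date(2023, 3, 6), date(2023, 3, 10),
     "svb-collapse", "Silicon Valley Bank collapse"),
]

def annotate(match):
    """Attach {slug, label} when a match falls inside a curated window."""
    for sym, start, end, slug, label in NAMED_EVENTS:
        if match["symbol"] == sym and start <= match["date"] <= end:
            match["named_event"] = {"slug": slug, "label": label}
            break
    return match
```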

&lt;p&gt;Events to seed: bank collapses (SIVB, FRC, SBNY, CS, PACW), M&amp;amp;A exits (TWTR, ATVI), macro inflection points (COVID crash + bottom, 2022 bear low, Russia invasion, peak CPI), narrative blowoffs (GME squeeze, 2021 growth peak), AI-era milestones (NVDA 2023-05 breakout, DeepSeek shock). 30 events get you 80% coverage.&lt;/p&gt;

&lt;p&gt;Outputs: 'NVDA today maps onto SIVB 2023-03-08 (bank collapse, -71% in 10 days) as the 4th-closest analog' — that's the sentence a market writer needs, and it comes for free from a tiny curated catalog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it together
&lt;/h2&gt;

&lt;p&gt;All three patterns are live in Chart Library's API and MCP server. The system prompt in our &lt;a href="https://dev.to/blog/how-to-build-a-stock-agent-that-doesnt-hallucinate"&gt;Claude Agent example&lt;/a&gt; enforces pattern 1. The LangGraph example at github.com/grahammccain/chart-library/tree/main/integrations/langgraph shows pattern 2 as a StateGraph. Pattern 3 is a single &lt;code&gt;named_event&lt;/code&gt; field on every top_match in cohort responses.&lt;/p&gt;

&lt;p&gt;But the patterns are bigger than us. If you're building a stock-research agent on a different stack (Bloomberg API, Polygon direct, your own embeddings), implement these three primitives the same way and you'll have a system that grounds claims, discovers conditional structure, and surfaces narrative hooks automatically.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;CTA&lt;/strong&gt; — All three patterns work on the free Sandbox tier (200 calls/day). Grab a key at chartlibrary.io/developers and try the full loop in under 20 lines of Python.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://chartlibrary.io/blog/3-patterns-for-agents-that-analyze-stock-charts" rel="noopener noreferrer"&gt;chartlibrary.io&lt;/a&gt;. Chart Library is the stock-market memory for AI agents — free Sandbox tier at &lt;a href="https://chartlibrary.io/developers" rel="noopener noreferrer"&gt;chartlibrary.io/developers&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Eval Integrity: How We Found the Leakage and Why Our Baseline Lied</title>
      <dc:creator>grahammccain</dc:creator>
      <pubDate>Tue, 14 Apr 2026 12:26:31 +0000</pubDate>
      <link>https://dev.to/grahammccain/eval-integrity-how-we-found-the-leakage-and-why-our-baseline-lied-cf6</link>
      <guid>https://dev.to/grahammccain/eval-integrity-how-we-found-the-leakage-and-why-our-baseline-lied-cf6</guid>
      <description>&lt;h2&gt;
  
  
  Why this matters for agent developers
&lt;/h2&gt;

&lt;p&gt;If you're building an AI agent that calls Chart Library, you're trusting our historical base rates. If those base rates are inflated by leakage, your agent's sizing, stop placement, and confidence calibration are all downstream of a lie. That's not acceptable for anyone we want as a customer, so we audited ourselves and published what we found.&lt;/p&gt;

&lt;p&gt;Short version: our internal baseline for shape-embedding direction accuracy was 51.6% — barely above the 51.2% coin-flip floor. That 0.4-percentage-point edge was measured against a split that let the model find near-duplicates of every query in the training set. Once we fixed it, the number went back to ~51.2%. We never had signal where we thought we did. Now we know.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug #1 — Same-symbol cross-split correlation
&lt;/h2&gt;

&lt;p&gt;We split training from validation by date (train &amp;lt; 2025, val = 2025). The problem: AAPL has a chart-pattern embedding for every trading day. Its embeddings on 2024-12-30 and 2025-01-02 are nearly identical vectors (same symbol, mostly the same preceding bars).&lt;/p&gt;

&lt;p&gt;When we ran k-nearest-neighbor evaluation on val samples, the nearest training neighbor for AAPL 2025-01-02 was AAPL 2024-12-30 — which IS in the training set. The model wasn't finding 'similar historical patterns.' It was finding itself from a few days earlier.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;53.6% of validation samples had a same-symbol training neighbor within 20 trading days&lt;/li&gt;
&lt;li&gt;Direction-accuracy lift was driven almost entirely by this correlation&lt;/li&gt;
&lt;li&gt;Symbol-disjoint splits (hold out tickers entirely, not dates) give honest numbers&lt;/li&gt;
&lt;/ul&gt;
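&lt;p&gt;Symbol-disjoint assignment can be done with a deterministic hash, so a ticker can never straddle splits; a sketch:&lt;/p&gt;

```python
import hashlib

def split_of(symbol, train=0.70, val=0.15):
    """Hash the ticker to a deterministic split so no symbol can appear
    in more than one of train/val/test."""
    u = int(hashlib.sha256(symbol.encode()).hexdigest(), 16) % 10_000 / 10_000
    if u < train:
        return "train"
    if u < train + val:
        return "val"
    return "test"
```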

&lt;h2&gt;
  
  
  Bug #2 — Forward-return window leakage
&lt;/h2&gt;

&lt;p&gt;The 5-day forward return for a training sample dated 2024-12-30 uses closing prices through 2025-01-06. That's inside the validation window. So the training label itself depends on bars the model supposedly hasn't seen.&lt;/p&gt;

&lt;p&gt;This is smaller than bug #1 (24,448 train rows affected, ~3K in our 1M random sample), but it's additive. The correct fix is a purge-and-embargo window at every split boundary equal to the longest forward horizon. We use 10 trading days.&lt;/p&gt;
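&lt;p&gt;The purge-and-embargo rule, sketched with calendar days standing in for trading days for brevity (a real implementation would count against the exchange calendar):&lt;/p&gt;

```python
from datetime import date, timedelta

def purge_embargo(train_rows, split_date, horizon_days=10):
    """Drop training rows whose forward-return window could cross the
    split boundary: a row dated within horizon_days before split_date
    labels itself with post-boundary closes, so it must be purged."""
    cutoff = split_date - timedelta(days=horizon_days)
    return [r for r in train_rows if r["date"] < cutoff]
```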

&lt;h2&gt;
  
  
  What we changed
&lt;/h2&gt;

&lt;p&gt;Going forward, every evaluation on Chart Library's embedding quality uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Symbol-disjoint splits — 70% of tickers in train, 15% in val, 15% in test. No ticker appears in more than one split.&lt;/li&gt;
&lt;li&gt;Purge-embargo of 10 trading days at any remaining date boundary (e.g. walk-forward).&lt;/li&gt;
&lt;li&gt;Sample-size reporting on every reported metric, with confidence intervals.&lt;/li&gt;
&lt;li&gt;Open publication of the baseline so future model updates have to beat an honest number, not an inflated one.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What this implies about the product
&lt;/h2&gt;

&lt;p&gt;Pure shape-similarity direction accuracy on a symbol-disjoint, embargoed holdout is at or near 51% — essentially coin-flip. This isn't a flaw in our embeddings; it's the actual state of the problem. Predicting 5-day direction from a single chart shape is one of the hardest signal-extraction problems in finance, and it's well-documented in the academic literature that pure-price features have very low information ratio before regime/liquidity/volume conditioning.&lt;/p&gt;

&lt;p&gt;The leverage is not in the average. It's in the cohort. When you condition on regime bucket + sector + liquidity + event proximity, the conditional distribution of outcomes is materially different from the unconditional one. That's why we're building toward a Conditional Distribution API — one call, filter by context, get back path percentiles with sample size.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;INFO&lt;/strong&gt; — If you're evaluating a historical-pattern vendor and their baseline looks too good, ask them: (1) how are splits constructed, (2) what's the embargo window, (3) how do you handle same-symbol overlap. If they can't answer, the numbers are probably lying.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Our ongoing commitment
&lt;/h2&gt;

&lt;p&gt;We'll keep publishing what we find, including when our own numbers go the wrong direction. Agent developers should be able to trust the calibration of any base rate we expose. That means honest audits, honest baselines, and honest docs -- even when the honest answer is less impressive than the marketing one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://chartlibrary.io/blog/eval-integrity-leakage-and-baselines" rel="noopener noreferrer"&gt;chartlibrary.io&lt;/a&gt;. Chart Library is the stock-market memory for AI agents — free Sandbox tier at &lt;a href="https://chartlibrary.io/developers" rel="noopener noreferrer"&gt;chartlibrary.io/developers&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Multi-Agent Stock Research with CrewAI + Chart Library</title>
      <dc:creator>grahammccain</dc:creator>
      <pubDate>Mon, 06 Apr 2026 17:23:46 +0000</pubDate>
      <link>https://dev.to/grahammccain/multi-agent-stock-research-with-crewai-chart-library-4jgi</link>
      <guid>https://dev.to/grahammccain/multi-agent-stock-research-with-crewai-chart-library-4jgi</guid>
      <description>&lt;p&gt;When you ask a single AI agent to research a stock, it tries to do everything at once: check the chart pattern, assess the market, evaluate sectors, and synthesize a view. The result is usually shallow.&lt;/p&gt;

&lt;p&gt;Multi-agent systems fix this by splitting work across specialists -- exactly how institutional research desks operate. CrewAI lets you build that structure with AI agents and Chart Library's pattern intelligence API.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Crew Design: Two Specialist Agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pattern Analyst&lt;/strong&gt; -- specializes in individual stock chart analysis. Uses Chart Library's intelligence endpoint to find historically similar charts, reads forward return statistics, and runs scenario stress tests. Always cites specific numbers: "7 of 10 similar patterns went up over 5 days, averaging +2.3%."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regime Analyst&lt;/strong&gt; -- specializes in the market-wide environment. Checks SPY/QQQ regime status, sector rotation, and crowding risk. Frames everything as historical analogy: "similar conditions historically led to..." rather than predictions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agent backstories matter more than you'd expect. A backstory that says "you never make predictions, you present historical context" produces very different output than "you are a bold market forecaster."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Defining the Tools
&lt;/h2&gt;

&lt;p&gt;Each agent gets curated tools from Chart Library's Python SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;crewai crewai-tools chartlibrary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pattern Analyst tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;chart_intelligence&lt;/code&gt; -- calls &lt;code&gt;cl.intelligence(symbol, date)&lt;/code&gt; for 10 most similar historical patterns + forward returns&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scenario_stress_test&lt;/code&gt; -- calls &lt;code&gt;cl.scenario(symbol, market_move_pct)&lt;/code&gt; for "what if SPY drops 5%?"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;daily_top_picks&lt;/code&gt; -- calls &lt;code&gt;cl.discover(limit=10)&lt;/code&gt; for today's top patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Regime Analyst tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;market_regime&lt;/code&gt; -- calls &lt;code&gt;cl.regime(compact=True)&lt;/code&gt; for SPY, QQQ, and 11 sector ETFs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;crowding_detector&lt;/code&gt; -- calls &lt;code&gt;cl.crowding()&lt;/code&gt; for systematic risk signals&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sector_rotation&lt;/code&gt; -- calls &lt;code&gt;cl.sector_rotation(lookback)&lt;/code&gt; for momentum rankings&lt;/li&gt;
&lt;/ul&gt;
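&lt;p&gt;Each tool follows the same shape: call the SDK, then flatten the response into a sentence the agent can cite. A framework-agnostic sketch with a stubbed client (the response fields here are assumptions for illustration, not the SDK's documented schema):&lt;/p&gt;

```python
class StubClient:
    """Stand-in for the chartlibrary SDK client, for illustration."""
    def intelligence(self, symbol):
        return {"matches": 10, "top_similarity": 0.94,
                "win_rate_5d": 0.7, "avg_return_5d": 2.3}

def chart_intelligence(client, symbol):
    """Tool body: fetch pattern intelligence and return a
    citable one-line summary for the agent."""
    d = client.intelligence(symbol)
    return (f"{symbol}: {d['matches']} similar patterns, "
            f"top match {d['top_similarity']:.0%} similarity, "
            f"{d['win_rate_5d']:.0%} positive over 5 days "
            f"(avg {d['avg_return_5d']:+.1f}%)")

print(chart_intelligence(StubClient(), "NVDA"))
# NVDA: 10 similar patterns, top match 94% similarity,
# 70% positive over 5 days (avg +2.3%)
```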

&lt;h2&gt;
  
  
  Building the Tasks
&lt;/h2&gt;

&lt;p&gt;The order matters: regime assessment first (establishes context), then stock analysis (uses that context), then synthesis (combines both).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task 1 - Regime Assessment:&lt;/strong&gt; Check SPY/QQQ, identify leading/lagging sectors, evaluate crowding risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task 2 - Stock Analysis:&lt;/strong&gt; For each symbol in your watchlist, report match count, quality, forward returns. Receives Task 1's output as context -- so findings can be framed like "NVDA's bullish pattern aligns with the current risk-on regime."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task 3 - Synthesis:&lt;/strong&gt; Produces a structured briefing: Market Environment, Stock Highlights, Risk Factors, Bottom Line.&lt;/p&gt;
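&lt;p&gt;The context-passing that makes the ordering matter can be sketched framework-agnostically: each task receives the accumulated output of the tasks before it. CrewAI's sequential process does the real version of this; the lambdas below are toy stand-ins:&lt;/p&gt;

```python
def run_sequential(tasks):
    """Run tasks in order, feeding each one the accumulated
    context from the tasks before it (mirrors Process.sequential)."""
    context = ""
    for task in tasks:
        context = task(context)
    return context

regime_task = lambda ctx: ctx + "Regime: risk-on, tech leading. "
analysis_task = lambda ctx: ctx + "NVDA bullish pattern fits the regime. "
synthesis_task = lambda ctx: ctx + "Bottom line: constructive on NVDA."

briefing = run_sequential([regime_task, analysis_task, synthesis_task])
```

Swap the order and Task 2 loses the regime context it needs, which is exactly why the fixed sequence matters.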

&lt;h2&gt;
  
  
  Running the Crew
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pattern_analyst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regime_analyst&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;regime_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synthesis_task&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example Output
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Regime Analyst&lt;/strong&gt; checks SPY: 7 of 10 historically similar regimes gained over 10 days. Tech (XLK) and Industrials (XLI) leading. Moderate crowding in large-cap tech.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Pattern Analyst&lt;/strong&gt; analyzes each symbol:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NVDA&lt;/strong&gt;: 10 matches, top match 94% similarity, 8/10 positive over 5 days (+3.1% avg)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AAPL&lt;/strong&gt;: More mixed -- 6/10 positive, +0.8% average&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TSLA&lt;/strong&gt;: Historically volatile, wide outcome range&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;synthesis&lt;/strong&gt; combines both: bullish regime supports NVDA, AAPL is neutral, TSLA's wide range means sizing matters more than direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extending the Crew
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Risk Manager agent&lt;/strong&gt;: Run stress tests for -3%, -5%, -10% market moves across the watchlist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio Optimizer agent&lt;/strong&gt;: Suggest position sizes based on conviction and correlation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hierarchical process&lt;/strong&gt;: A Research Director that delegates dynamically instead of fixed sequence&lt;/li&gt;
&lt;/ul&gt;
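&lt;p&gt;The Risk Manager idea reduces to a small matrix sweep over symbols and hypothetical moves. A stub-based sketch (the scenario call and the beta-scaled response are illustrative assumptions modeled on &lt;code&gt;cl.scenario&lt;/code&gt; above, not the real endpoint):&lt;/p&gt;

```python
class StubScenario:
    """Stand-in for the scenario endpoint, for illustration."""
    def scenario(self, symbol, market_move_pct):
        beta = {"NVDA": 1.8, "AAPL": 1.1, "TSLA": 2.0}[symbol]
        return {"expected_move_pct": round(beta * market_move_pct, 1)}

def stress_matrix(client, watchlist, moves=(-3, -5, -10)):
    """Expected move per symbol for each hypothetical market drop."""
    return {sym: {m: client.scenario(sym, m)["expected_move_pct"]
                  for m in moves}
            for sym in watchlist}

matrix = stress_matrix(StubScenario(), ["NVDA", "AAPL", "TSLA"])
```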

&lt;blockquote&gt;
&lt;p&gt;Start with two agents and add complexity only when you hit a real limitation. Multi-agent systems are powerful but harder to debug.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The complete working example is at &lt;a href="https://github.com/grahammccain/chart-library-mcp" rel="noopener noreferrer"&gt;github.com/grahammccain/chart-library-mcp&lt;/a&gt; in examples/crewai_tutorial.py.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Get your free API key at &lt;a href="https://chartlibrary.io/developers" rel="noopener noreferrer"&gt;chartlibrary.io/developers&lt;/a&gt; and build your first research crew today. 24M patterns. 10 years. One API call.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>crewai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Use an MCP Server for Stock Analysis with Claude</title>
      <dc:creator>grahammccain</dc:creator>
      <pubDate>Mon, 06 Apr 2026 17:17:12 +0000</pubDate>
      <link>https://dev.to/grahammccain/how-to-use-an-mcp-server-for-stock-analysis-with-claude-b9c</link>
      <guid>https://dev.to/grahammccain/how-to-use-an-mcp-server-for-stock-analysis-with-claude-b9c</guid>
      <description>&lt;p&gt;Model Context Protocol (MCP) is an open standard that lets AI assistants like Claude connect to external data sources and tools. Instead of copying data into a prompt, you install an MCP server that gives the AI direct access to a service's API.&lt;/p&gt;

&lt;p&gt;Chart Library's MCP server gives Claude (and any MCP-compatible AI) access to 24 million pre-computed chart pattern embeddings, real-time regime data, sector analysis, and forward return statistics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation: Two Commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;chartlibrary-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Get a free API key at &lt;a href="https://chartlibrary.io/developers" rel="noopener noreferrer"&gt;chartlibrary.io/developers&lt;/a&gt; (200 calls/day free)&lt;/li&gt;
&lt;li&gt;Add the server to Claude Desktop: &lt;strong&gt;Settings &amp;gt; Developer &amp;gt; MCP Servers &amp;gt; Add&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Enter the command: &lt;code&gt;chartlibrary-mcp --api-key YOUR_KEY&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Restart Claude Desktop -- Chart Library tools appear in the tools panel&lt;/li&gt;
&lt;/ol&gt;
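&lt;p&gt;If you prefer editing the config file directly, Claude Desktop also reads MCP servers from its &lt;code&gt;claude_desktop_config.json&lt;/code&gt;. An equivalent entry might look like this (treat the exact shape as an assumption if your client version differs):&lt;/p&gt;

```json
{
  "mcpServers": {
    "chartlibrary": {
      "command": "chartlibrary-mcp",
      "args": ["--api-key", "YOUR_KEY"]
    }
  }
}
```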

&lt;blockquote&gt;
&lt;p&gt;The MCP server also works with &lt;strong&gt;Cursor&lt;/strong&gt;, &lt;strong&gt;Windsurf&lt;/strong&gt;, and any editor or agent framework that supports MCP.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What the 19 Tools Do
&lt;/h2&gt;

&lt;p&gt;You never call them directly -- Claude picks the right tool based on your question:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern Intelligence (3 tools)&lt;/strong&gt;: Search for similar chart patterns, analyze ticker+date combos, get full intelligence reports with AI summaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Market Context (4 tools)&lt;/strong&gt;: Market regime for SPY/QQQ, sector rotation rankings, crowding detection, scenario analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery (3 tools)&lt;/strong&gt;: Today's top patterns, trending tickers, interesting setups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio &amp;amp; Alerts (4 tools)&lt;/strong&gt;: Add holdings, portfolio analysis, pattern alerts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data &amp;amp; Utilities (5 tools)&lt;/strong&gt;: Ticker lookup, historical prices, API usage, accuracy stats&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example Conversation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;You:&lt;/strong&gt; "What does NVDA's chart look like right now? Anything interesting?"&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Claude calls: &lt;code&gt;get_intelligence(symbol="NVDA")&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude:&lt;/strong&gt; "NVDA's current intraday pattern closely matches 10 historical setups. The top match is AMD from August 2024 at 93% similarity. The average 5-day return is +1.6% with 7 of 10 matches moving higher."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You:&lt;/strong&gt; "What's the broader market doing? Should I be worried?"&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Claude calls: &lt;code&gt;get_regime()&lt;/code&gt;, &lt;code&gt;get_crowding()&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude:&lt;/strong&gt; "The current SPY regime most closely resembles March 2024 -- a slow grind higher on moderate volume. 8 of 10 similar periods saw gains over 10 days. However, the crowding detector shows elevated correlation across tech stocks, which historically precedes short-term pullbacks in about 40% of cases."&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP vs. REST API: When to Use Which
&lt;/h2&gt;

&lt;p&gt;Both access the same data. The difference is the interface:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Best Choice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Conversational research&lt;/td&gt;
&lt;td&gt;MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automated trading systems&lt;/td&gt;
&lt;td&gt;REST API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Building dashboards&lt;/td&gt;
&lt;td&gt;REST API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ad-hoc market analysis&lt;/td&gt;
&lt;td&gt;MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent workflows&lt;/td&gt;
&lt;td&gt;Either&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Many users keep the MCP server running in Claude Desktop for quick research while also calling the REST API from their trading scripts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;chartlibrary-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The free tier gives you 200 calls per day -- more than enough for a full day of research. For more advanced use cases, check out the &lt;a href="https://chartlibrary.io/blog/build-stock-research-agent-langchain" rel="noopener noreferrer"&gt;LangChain tutorial&lt;/a&gt; and &lt;a href="https://chartlibrary.io/blog/multi-agent-stock-research-crewai" rel="noopener noreferrer"&gt;CrewAI tutorial&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Chart Library: 24 million chart pattern embeddings. 10 years of history. Pattern intelligence for AI agents. &lt;a href="https://chartlibrary.io/developers" rel="noopener noreferrer"&gt;chartlibrary.io/developers&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>claude</category>
      <category>python</category>
    </item>
    <item>
      <title>Build a Stock Research Agent with LangChain + Chart Library in 20 Minutes</title>
      <dc:creator>grahammccain</dc:creator>
      <pubDate>Mon, 06 Apr 2026 17:17:11 +0000</pubDate>
      <link>https://dev.to/grahammccain/build-a-stock-research-agent-with-langchain-chart-library-in-20-minutes-39o5</link>
      <guid>https://dev.to/grahammccain/build-a-stock-research-agent-with-langchain-chart-library-in-20-minutes-39o5</guid>
      <description>&lt;p&gt;Most AI trading agents work with raw price data: OHLCV bars, moving averages, maybe some technical indicators. They can tell you &lt;em&gt;what the price is&lt;/em&gt;, but not &lt;em&gt;what it means&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Chart Library's API gives agents something they can't get anywhere else: pre-computed pattern similarity across 24 million embeddings spanning 10 years and 19,000+ symbols. Instead of asking "what's the price of NVDA?", your agent can ask "find the 10 most similar historical charts to NVDA right now and tell me how those patterns resolved."&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up
&lt;/h2&gt;

&lt;p&gt;You need three packages and two API keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain langchain-openai chartlibrary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Get a free Chart Library API key at &lt;a href="https://chartlibrary.io/developers" rel="noopener noreferrer"&gt;chartlibrary.io/developers&lt;/a&gt; (200 calls/day on the free tier)&lt;/li&gt;
&lt;li&gt;Set your environment variables: &lt;code&gt;CHART_LIBRARY_KEY=cl_...&lt;/code&gt; and &lt;code&gt;OPENAI_API_KEY=sk-...&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; You can swap OpenAI for Anthropic, Groq, Ollama, or any LangChain-compatible model. Just change the LLM initialization -- the tools stay the same.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Creating the Tools
&lt;/h2&gt;

&lt;p&gt;LangChain agents work by selecting from a set of tools based on the user's question. Each tool wraps one or more API calls and returns a formatted string the LLM can reason over. We define five tools that cover Chart Library's core capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  The chart_intelligence Tool
&lt;/h3&gt;

&lt;p&gt;Here's the core tool that powers most agent interactions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chart_intelligence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get full pattern intelligence for a stock ticker.
    Returns the 10 most similar historical chart patterns,
    what happened after those patterns (1/3/5/10-day forward
    returns), and an AI-generated summary.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;intelligence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;symbol&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compact&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Format matches, forward returns, and summary
&lt;/span&gt;    &lt;span class="c1"&gt;# into a readable string; str() is a placeholder
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool extracts the top 3 matches with similarity scores, forward return statistics (average return, win rate) for each horizon, and the AI-generated summary.&lt;/p&gt;
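&lt;p&gt;Computing those per-horizon statistics from raw match returns takes only a few lines. A sketch (the list-of-returns input is an assumption about the payload shape, not the SDK's documented schema):&lt;/p&gt;

```python
from statistics import mean

def horizon_stats(returns_pct):
    """Win rate and average return for one horizon,
    formatted the way the agent cites them."""
    wins = sum(1 for r in returns_pct if r > 0)
    return (f"{wins} of {len(returns_pct)} positive "
            f"(avg {mean(returns_pct):+.1f}%)")

five_day = [3.1, -0.4, 2.2, 1.0, -1.5, 4.0, 0.8, 2.5, -0.2, 1.3]
print(horizon_stats(five_day))  # 7 of 10 positive (avg +1.3%)
```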

&lt;h3&gt;
  
  
  Market Regime and Sector Tools
&lt;/h3&gt;

&lt;p&gt;Beyond individual stocks, agents need market context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;market_regime&lt;/strong&gt; -- calls the &lt;code&gt;/regime&lt;/code&gt; endpoint for SPY, QQQ, and all 11 S&amp;amp;P sector ETFs with historical pattern matching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sector_rotation&lt;/strong&gt; -- sector momentum rankings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;crowding_detector&lt;/strong&gt; -- checks if many stocks are trading the same pattern (signals systematic risk)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;scenario_analysis&lt;/strong&gt; -- stress-tests a stock against a hypothetical market move&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building the Agent
&lt;/h2&gt;

&lt;p&gt;With the tools defined, assembling the agent is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
tools = [chart_intelligence, market_regime, crowding_detector,
         scenario_analysis, sector_rotation]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You present historical context, not predictions."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools,
                         verbose=True, max_iterations=5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example Conversation
&lt;/h2&gt;

&lt;p&gt;Here's what a real session looks like:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User:&lt;/strong&gt; "What does NVDA's chart look like right now? How did similar patterns play out?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent calls:&lt;/strong&gt; &lt;code&gt;chart_intelligence("NVDA")&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent:&lt;/strong&gt; "NVDA's current chart pattern matches 10 historical patterns with high similarity. The top match is AMD from 2023-08-14 at 94% similarity. Looking at what happened after these patterns: over 5 days, the average return was +1.8% with 7 of 10 matches moving higher. Over 10 days, the average was +2.4% with 6 of 10 positive."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User:&lt;/strong&gt; "What about the broader market? Is there anything concerning?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent calls:&lt;/strong&gt; &lt;code&gt;market_regime()&lt;/code&gt;, &lt;code&gt;crowding_detector()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent:&lt;/strong&gt; "SPY's 30-day change is at +3.2%, with 8 of 10 similar historical regimes seeing positive 10-day returns. However, the crowding detector is showing elevated readings -- many stocks are trading in lockstep, which historically signals higher correlation risk."&lt;/p&gt;

&lt;h2&gt;
  
  
  Extending the Agent
&lt;/h2&gt;

&lt;p&gt;The five tools above cover the core use cases, but Chart Library's API has more endpoints you can wrap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Alerts&lt;/strong&gt;: Let the agent set up pattern alerts ("tell me when a chart similar to AAPL 2024-01-15 appears again")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio analysis&lt;/strong&gt;: Connect the agent to a user's portfolio for nightly pattern monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical backtest&lt;/strong&gt;: Run pattern-based backtests to validate strategies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-timeframe&lt;/strong&gt;: Search across 5min, 15min, 30min, 1hr, or multi-day windows&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The full tutorial code is on GitHub at &lt;a href="https://github.com/grahammccain/chart-library-mcp" rel="noopener noreferrer"&gt;github.com/grahammccain/chart-library-mcp&lt;/a&gt; in the examples/ directory.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What's Next: Multi-Agent Workflows
&lt;/h2&gt;

&lt;p&gt;A single agent is useful. The next level is multi-agent workflows where specialized agents collaborate -- a Pattern Analyst, a Regime Strategist, and a Risk Manager that discuss, debate, and produce a unified research brief.&lt;/p&gt;

&lt;p&gt;Chart Library's MCP server (&lt;code&gt;pip install chartlibrary-mcp&lt;/code&gt;) works with any MCP-compatible agent framework -- including Claude Desktop, Cursor, and Windsurf.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Get a free API key at &lt;a href="https://chartlibrary.io/developers" rel="noopener noreferrer"&gt;chartlibrary.io/developers&lt;/a&gt; and start building. 24 million chart pattern embeddings, 10 years of history, one API call.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>langchain</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
