Fatih İlhan
I Built a Trading Signal Engine That Reads Congressional Insider Trades — Here's the Architecture

Members of Congress beat the market by an average of 12%. I built a system to find out why, in real time.


There's a dataset hiding in plain sight. Every time a U.S. senator or congressman buys stock, they're legally required to disclose it within 45 days. Every time a corporate CEO buys their own company's shares, that Form 4 hits the SEC within 2 business days.

Most people scroll past these filings. I built a machine to read all of them, filter out the noise, and surface only the trades worth paying attention to.

This is the architecture behind Insider Signal Engine — a personal trading signal tool I built in a few weeks using Next.js, Supabase, and Cloudflare Workers.


The Problem: Raw Insider Data Is Mostly Noise

The data is public. The problem is the signal-to-noise ratio.

In any given week, you might see 400+ congressional trade disclosures. But most of them are:

  • Sales (informationless — could be divorce, taxes, anything)
  • Filed 38 days after the trade (already priced in)
  • Tiny amounts ($1K–$15K, basically rounding errors)
  • From members with no relevant committee oversight

Running a filter stack on that data is the whole game.


Stack

| Layer | Choice | Why |
| --- | --- | --- |
| Framework | Next.js 14 App Router | Dashboard + API routes in one project |
| Database | Supabase (Postgres) | RLS-ready for multi-tenant SaaS later |
| Hosting | Cloudflare Pages | Edge performance, generous free tier |
| Cron | Cloudflare Workers (scheduled) | Runs the ingestion pipeline every 4 hours |
| Primary data | Quiver Quant API ($10/mo) | Congressional + insider trades, clean REST API |
| Secondary data | Financial Modeling Prep (free tier) | Earnings calendar, corporate Form 4 |

The ingestion pipeline runs as a Cloudflare Worker on a cron schedule, hits an internal protected API route, and writes to Supabase. The Next.js dashboard reads from Supabase and renders signals sorted by score.
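A minimal sketch of that Worker, using module syntax. The env binding names (`APP_URL`, `CRON_SECRET`) and the `/api/cron` path are assumptions for illustration:

```typescript
// Hypothetical scheduled Worker; env names are assumptions, not the
// article's actual config.
type Env = { APP_URL: string; CRON_SECRET: string };
type Ctx = { waitUntil(p: Promise<unknown>): void };

const worker = {
  async scheduled(_event: { cron: string }, env: Env, ctx: Ctx) {
    // Hit the protected ingestion route; waitUntil keeps the Worker
    // alive until the request settles.
    ctx.waitUntil(
      fetch(`${env.APP_URL}/api/cron`, {
        headers: { 'x-cron-secret': env.CRON_SECRET },
      })
    );
  },
};

export default worker;
```

The cron expression itself (every 4 hours) lives in `wrangler.toml` as a trigger, so the handler stays schedule-agnostic.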


Architecture

┌─────────────────────────────────────────────┐
│         CLOUDFLARE WORKER (every 4h)         │
│                                              │
│  1. Fetch congress trades from Quiver Quant  │
│  2. Fetch corporate insider trades (FMP)     │
│  3. Normalize into unified RawTrade schema   │
│  4. Run 7-filter stack                       │
│  5. Score survivors (0-100)                  │
│  6. Upsert into Supabase signals table       │
└──────────────────────┬──────────────────────┘
                       │
                       ▼
             Supabase (Postgres)
             ├── raw_trades  (append-only log)
             ├── signals     (filtered + scored)
             └── politicians (track record)
                       │
                       ▼
          Next.js Dashboard (App Router)
          /              → Signal feed
          /politicians   → Leaderboard by hit rate
          /ticker/[sym]  → Per-stock activity
          /backtest      → Historical performance

The 7-Filter Stack

This is the core of the engine. Every trade must pass all 7 filters to become a signal. Filters are pure functions — simple to test, easy to tune.

type TradeFilter = (trade: RawTrade) => boolean;
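With that shape, composing the synchronous part of the stack is just `Array.prototype.every`. A sketch, with a minimal stand-in `RawTrade`:

```typescript
// Minimal stand-in types; the real RawTrade has more fields.
type RawTrade = { trade_type: string; filing_date: string; trade_date: string };
type TradeFilter = (trade: RawTrade) => boolean;

// A trade survives only if every filter passes.
const passesAll = (filters: TradeFilter[]) => (trade: RawTrade) =>
  filters.every((f) => f(trade));

const syncFilters: TradeFilter[] = [
  (t) => t.trade_type === 'purchase',
  // filters 2-4 slot in here
];

const survivors = (trades: RawTrade[]): RawTrade[] =>
  trades.filter(passesAll(syncFilters));
```

Because each filter is a pure function, tuning a threshold or reordering the stack never touches the pipeline code.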

Filter 1: Purchases Only

const filterPurchaseOnly: TradeFilter = (trade) =>
  trade.trade_type === 'purchase';

Sales have too many non-informative motivations — taxes, diversification, estate planning. Buys are different: there is essentially one reason to put fresh money into a stock, and it's the expectation that the price will rise.

(Exception: unusually large sales >$500K go to a separate "bearish watchlist" — not implemented yet.)

Filter 2: Filing Delay ≤ 7 Days

import { differenceInDays, parseISO } from 'date-fns';

const filterFilingDelay: TradeFilter = (trade) => {
  const delay = differenceInDays(
    parseISO(trade.filing_date),
    parseISO(trade.trade_date)
  );
  return delay <= 7;
};

Congress has 45 days to disclose. Most of them use every day of it. This filter rejects ~80% of congressional trades by design.

The fast-filers are a self-selecting group. When a senator buys $500K of defense stock and files the next day, that's a different animal from someone who files at day 44.

Filter 3: Minimum Size ≥ $50K

const filterMinSize: TradeFilter = (trade) => {
  const amount = trade.amount_high ?? trade.amount_low ?? 0;
  return amount >= 50_000;
};

Congress reports in ranges — $1K–$15K, $15K–$50K, $50K–$100K, $100K–$250K, and up. We use the upper bound. The $50K floor eliminates noise buys and auto-purchase plans.
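The exact range format the API returns is an assumption here, but a defensive parser that extracts the upper bound from whatever range string arrives might look like:

```typescript
// Hypothetical parser: takes a disclosure range string such as
// "$50,001 - $100,000" and returns its upper bound in dollars.
function parseRangeUpperBound(range: string): number {
  const nums =
    range.match(/[\d,]+/g)?.map((n) => Number(n.replace(/,/g, ''))) ?? [];
  // Take the largest number found; 0 if the string had none.
  return nums.length > 0 ? Math.max(...nums) : 0;
}
```

Using the upper bound is the conservative choice for the size filter: a trade only clears the $50K floor if its range could plausibly reach it.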

Filter 4: Relevance — Committee Match or C-Suite

const COMMITTEE_SECTOR_MAP: Record<string, string[]> = {
  'Armed Services': ['defense', 'aerospace'],
  'Energy and Commerce': ['energy', 'utilities', 'healthcare'],
  'Finance': ['banks', 'fintech', 'crypto'],
  'Intelligence': ['defense', 'cybersecurity', 'tech'],
  // ...
};

const C_SUITE_TITLES = ['CEO', 'CFO', 'COO', 'CTO', 'President', 'Chairman'];

A senator on the Senate Intelligence Committee buying a cybersecurity stock is a different signal than a backbencher doing the same. A CEO buying their own stock means something. A Director buying theirs means less.
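The article shows the lookup tables but not the filter itself. A hypothetical `filterRelevance` combining them could look like this — the field names `committees`, `sector`, and `title` are assumptions about the normalized schema:

```typescript
// Stand-in trade shape; field names are assumptions for illustration.
type RawTrade = {
  filer_type: 'congress' | 'corporate_insider';
  committees?: string[]; // congress: committee memberships
  sector?: string;       // normalized sector of the traded ticker
  title?: string;        // corporate insider's role, e.g. 'CEO'
};

const COMMITTEE_SECTOR_MAP: Record<string, string[]> = {
  'Armed Services': ['defense', 'aerospace'],
  'Finance': ['banks', 'fintech', 'crypto'],
};

const C_SUITE_TITLES = ['CEO', 'CFO', 'COO', 'CTO', 'President', 'Chairman'];

const filterRelevance = (trade: RawTrade): boolean => {
  if (trade.filer_type === 'corporate_insider') {
    // C-suite only; a plain Director buy doesn't pass.
    return C_SUITE_TITLES.some((t) => trade.title?.includes(t) ?? false);
  }
  // Congress: pass if any committee oversees the traded sector.
  return (trade.committees ?? []).some((c) =>
    (COMMITTEE_SECTOR_MAP[c] ?? []).includes(trade.sector ?? '')
  );
};
```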

Filter 5: Cluster Detection

Not a per-trade filter — a post-filter enrichment. After filters 1-4, we group surviving trades by ticker in a 30-day sliding window:

import { groupBy } from 'lodash';

function detectClusters(
  filteredTrades: RawTrade[],
  windowDays: number = 30
): ClusterResult[] {
  const clusters: ClusterResult[] = [];
  const byTicker = groupBy(filteredTrades, 'ticker');

  for (const [ticker, trades] of Object.entries(byTicker)) {
    if (trades.length < 2) continue;
    // sliding window: collect groups of 2+ trades that fall within
    // windowDays of each other and push each group into clusters
  }

  return clusters;
}

When 3 different insiders buy the same ticker in the same month, that's a cluster. Clusters get heavily rewarded in the scoring model.

Filter 6: No Earnings Gamble (Async)

const filterNoEarningsGamble = async (trade: RawTrade): Promise<boolean> => {
  const earningsDate = await getNextEarningsDate(trade.ticker); // FMP API
  if (!earningsDate) return true;

  const diffDays = differenceInDays(
    parseISO(earningsDate),
    parseISO(trade.trade_date)
  );

  return diffDays < 0 || diffDays > 5;
};

Insider buys 3 days before an earnings beat look brilliant in hindsight. They're also binary event gambling. This filter runs only on the survivors from 1-4, minimizing API calls to FMP's free tier (250 req/day limit).
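The sync-first ordering can be sketched like this — `filterNoEarningsGamble` is stubbed here, where the real one calls FMP:

```typescript
type RawTrade = { trade_type: string; ticker: string };
type TradeFilter = (t: RawTrade) => boolean;

const syncFilters: TradeFilter[] = [(t) => t.trade_type === 'purchase'];

// Stand-in for the FMP-backed check (hypothetical: the real version
// fetches the next earnings date from the API).
const filterNoEarningsGamble = async (t: RawTrade): Promise<boolean> =>
  t.ticker !== 'EARN_SOON';

// Cheap sync filters run first; the rate-limited async check only runs
// on survivors, so FMP calls stay bounded by the survivor count.
async function runFilterStack(trades: RawTrade[]): Promise<RawTrade[]> {
  const syncSurvivors = trades.filter((t) => syncFilters.every((f) => f(t)));
  const checks = await Promise.all(syncSurvivors.map(filterNoEarningsGamble));
  return syncSurvivors.filter((_, i) => checks[i]);
}
```

On a typical run, filters 1–4 cut 400+ raw trades down to a handful, which keeps the async stage comfortably inside the 250 req/day quota.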

Filter 7: Technical Check (Stub — Phase 2)

Placeholder for a 200-day SMA guardrail: reject trades where the stock is more than 20% below its long-term average. Falling knives are falling knives, even with insider buying.
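Not implemented yet, but the guardrail could take a shape like this — a sketch, assuming an array of daily closing prices is available:

```typescript
// Simple moving average over the last n closes (200-day by default).
const sma = (closes: number[], n = 200): number =>
  closes.slice(-n).reduce((s, c) => s + c, 0) / Math.min(n, closes.length);

// Hypothetical Phase 2 guardrail: reject when the latest close sits
// more than 20% below the long-term average.
const passesTechnicalCheck = (closes: number[]): boolean => {
  const last = closes[closes.length - 1];
  return last >= sma(closes) * 0.8;
};
```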


The Scoring Model (0–100)

Every trade that survives the filters gets a score. The score is a sum of 6 components:

interface ScoreBreakdown {
  size_score: number;         // 0-20
  delay_score: number;        // 0-15
  cluster_score: number;      // 0-25  ← most impactful
  filer_track_record: number; // 0-20
  relevance_score: number;    // 0-10
  recency_score: number;      // 0-10
}

The thresholds:

// SIZE (0-20)
if (amount >= 500_000)      size_score = 20;
else if (amount >= 250_000) size_score = 16;
else if (amount >= 100_000) size_score = 12;
else if (amount >= 50_000)  size_score = 8;

// FILING DELAY (0-15)
if (delayDays <= 1)      delay_score = 15;
else if (delayDays <= 3) delay_score = 12;
else if (delayDays <= 5) delay_score = 8;
else if (delayDays <= 7) delay_score = 4;

// CLUSTER (0-25) — the big one
if (cluster_strength >= 4)      cluster_score = 25;
else if (cluster_strength >= 3) cluster_score = 20;
else if (cluster_strength >= 2) cluster_score = 12;

// TRACK RECORD (0-20) — needs 10+ historical trades to activate
if (hit_rate >= 70) filer_track_record = 20;
else if (hit_rate >= 60) filer_track_record = 15;
else if (hit_rate >= 50) filer_track_record = 10;
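The article doesn't show the summation explicitly; assuming the components are simply added (their caps total exactly 100: 20 + 15 + 25 + 20 + 10 + 10), the composition is trivial:

```typescript
interface ScoreBreakdown {
  size_score: number;         // 0-20
  delay_score: number;        // 0-15
  cluster_score: number;      // 0-25
  filer_track_record: number; // 0-20
  relevance_score: number;    // 0-10
  recency_score: number;      // 0-10
}

// Component caps sum to 100, so no rescaling or clamping is needed.
const totalScore = (b: ScoreBreakdown): number =>
  Object.values(b).reduce((sum, v) => sum + v, 0);
```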

The score is stored alongside the full breakdown as a JSONB column in Postgres:

CREATE TABLE signals (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  ticker TEXT NOT NULL,
  filer_name TEXT NOT NULL,
  filer_type TEXT NOT NULL,         -- 'congress' | 'corporate_insider'
  score INT CHECK (score BETWEEN 0 AND 100),
  score_breakdown JSONB NOT NULL,   -- { size_score, delay_score, ... }
  filters_passed TEXT[] NOT NULL,
  cluster_id UUID,
  filing_delay_days INT NOT NULL,
  -- ...
);

This means I can always explain why a trade scored the way it did — not just the final number.


Data Model: Three Core Tables

raw_trades — append-only ingest log. Every trade from every source lands here first, before filtering. Has a UNIQUE(source, source_id) constraint — upsert only, never blind insert.

signals — filtered and scored trades only. References raw_trades via FK. This is what the dashboard reads.

politicians — filer lookup with a computed hit_rate column (winning trades / total trades × 100, calculated in Postgres):

hit_rate NUMERIC GENERATED ALWAYS AS (
  CASE WHEN total_trades > 0
  THEN (winning_trades::NUMERIC / total_trades) * 100
  ELSE 0 END
) STORED

The Ingestion Pipeline

The whole flow lives in a single orchestrator function:

async function runIngestionPipeline() {
  // 1. Fetch
  const [congressTrades, insiderTrades, fmpTrades] = await Promise.all([
    quiverClient.fetchCongressTrades(7),
    quiverClient.fetchInsiderTrades(7),
    fmpClient.fetchInsiderTrades(7),
  ]);

  const rawTrades = [...congressTrades, ...insiderTrades, ...fmpTrades];

  // 2. Dedup + store
  await upsertRawTrades(rawTrades); // UNIQUE constraint handles dedup

  // 3. Filter (sync first, async only on survivors)
  const { passed, clusters, rejected } = await runFilterStack(rawTrades);

  // 4. Score
  const signals = await Promise.all(
    passed.map(trade => scoreWithContext(trade, clusters))
  );

  // 5. Store
  await upsertSignals(signals);

  // 6. Expire old signals
  await markStaleSignals(30); // is_active = false after 30 days

  console.log(`Pipeline: ${rawTrades.length} ingested → ${passed.length} signals`);
}

The cron endpoint is protected by a secret header — no auth library needed:

// /src/app/api/cron/route.ts
export async function GET(request: Request) {
  const secret = request.headers.get('x-cron-secret');
  if (secret !== process.env.CRON_SECRET) {
    return new Response('Unauthorized', { status: 401 });
  }

  await runIngestionPipeline();
  return new Response('OK');
}

What Makes This Different From Just Using Quiver

Quiver shows you raw data. Capitol Trades shows you raw data. Unusual Whales shows you raw data — with prettier charts.

Nobody gives you a confidence score. Nobody tells you "this specific combination of factors — a cluster of 3 insiders, fast filing, large size, from a senator on the Finance Committee — has historically been worth paying attention to."

The filter stack + scoring model is the IP. The data is commodity.


Phase Roadmap

Phase 1 (done): Personal tool. Use it for 4 weeks. Track accuracy manually.

Phase 2: Backtest engine — calculate 7/30/90-day returns for every historical signal. This turns the hit rate columns from placeholders into real data. Add Telegram alerts for score ≥ 70.

Phase 3: Multi-tenant SaaS via Supabase Auth + RLS. Pricing: free tier (3 signals/day, delayed) → Pro at $15/mo (real-time feed, full history, alerts, backtest). Break-even is literally 1 paying user — infrastructure cost at this scale is basically $10/mo for the Quiver API.

Phase 4: AI-generated trade thesis per signal. Cross-reference with FDA calendar, earnings, legislation schedule. The data is already there — it just needs context.


Competitive Moat

The barrier here isn't data access — it's the model. Quiver, Unusual Whales, and Capitol Trades all show you the same filings. The moat is:

  1. The scoring model accumulates historical calibration over time (the hit_rate column)
  2. Cluster detection catches coordinated buying that raw feeds miss
  3. The filter stack eliminates noise that makes other tools feel overwhelming

Congressional trading platforms all have the same problem: too much data, not enough filtering. Most users end up ignoring them after a few weeks because they don't know which trades to act on. A score from 0-100 solves that UX problem.


Key Technical Decisions

Why Quiver Quant over scraping? Capitol Trades has no API. You could scrape it with Apify, but Quiver gives you normalized data plus corporate insider trades, lobbying data, and WSB sentiment in one REST API. $10/mo vs. scraper maintenance is an easy call.

Why 7-day filing delay and not 45? This deliberately rejects ~80% of congressional trades. The fast-filers are a statistically distinct group. If I'm wrong about this hypothesis, the backtest data will tell me — and I can loosen the filter.

Why Cloudflare Workers for cron? Free tier covers 100K requests/day and unlimited scheduled workers. No Lambda cold starts. The entire infrastructure cost at personal-use scale is $10/mo (Quiver API only).

Why Supabase over plain Postgres? Row-Level Security means multi-tenant is a schema migration away, not an architectural rewrite. The free tier covers ~50K signals, which is years of personal use.


What's Next

Right now this is a personal tool — I use it for my own trading and I'm not ready to open it up yet. I want to run it for a few months, validate the scoring model against real returns, and see if the signals actually hold up before putting it in front of anyone else.

If the backtest data looks good, this becomes a product. The infrastructure is already designed for it — Supabase RLS for multi-tenancy, Cloudflare Pages for edge delivery, Paddle for payments. The jump from personal tool to SaaS is mostly a pricing page and an auth flow.

If you're building something similar or have thoughts on the filter logic, I'd love to hear it in the comments.


Built with Next.js 14, TypeScript strict mode, Supabase, Cloudflare Pages + Workers, and Quiver Quant API. Stack is fully open — the IP is in the filtering logic, not the framework choices.

Not financial advice. Congressional disclosure data is public record.
