TL;DR. A serious SEC-driven research desk pulls at least five EDGAR feeds in parallel: 8-K material events, Form 4 insider transactions, Form 13F institutional holdings, Form D private placements, and SEC enforcement releases. Each has its own schema, latency profile, and parsing quirks. This guide covers the unified event-router pattern: the schema, the dedupe strategy, and a working Python implementation that pulls all five into a single normalized event stream. Useful for L/S hedge funds, event-driven research desks, and family offices building in-house surveillance.
Why a unified event stream
Each of the five feeds tells you something on its own. Together they let you ask cross-feed questions that are otherwise hard to answer:
- "Which 8-K Item 5.02 CEO departures preceded an insider Form 4 sale by the same person?"
- "Which Form D private placements happened within 60 days of a related-party 8-K Item 1.01?"
- "Which 13F filers added to a position the same quarter the company filed an Item 4.02 restatement?"
- "Which SEC enforcement defendants were also recent Form D issuers?"
Each question is a join across two feeds. A unified event-router architecture makes those joins trivial. Run as five independent silos, they're expensive and slow.
The architecture pattern
The router has three layers:
- Ingestion — per-feed pullers (8-K, Form 4, 13F, Form D, Litigation Releases). Each runs on its own schedule appropriate to feed latency: 8-K every 5 minutes during market hours, Form 4 every 15 minutes, 13F daily, Form D hourly, Litigation Releases hourly.
- Normalization — a common event envelope: `{event_id, source, timestamp, cik, ticker, event_type, payload}`. The `payload` stays feed-specific; the envelope is uniform.
- Routing / fan-out — events get tagged with categories (insider-sell, M&A-rumor, enforcement-hit, etc.) and pushed to subscribers (Slack channels, position-monitor jobs, alpha-research notebooks).
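The routing layer can be sketched as a list of tagging rules plus a subscriber map. This is a minimal illustration under stated assumptions, not a reference implementation: `RULES`, `tag_event`, and `route` are hypothetical names, the `exec-departure` tag is added for illustration, and the `event_type` strings follow the envelope convention used in this guide.

```python
from collections import defaultdict

# Hypothetical tagging rules: (tag, predicate over the envelope).
RULES = [
    ("insider-sell", lambda e: e["source"] == "FORM4" and e["event_type"] == "INSIDER_SELL"),
    ("enforcement-hit", lambda e: e["source"] == "LR"),
    ("exec-departure", lambda e: e["source"] == "8K" and e["event_type"] == "ITEM_5_02"),
]

def tag_event(event):
    """Return every tag whose predicate matches the event."""
    return [tag for tag, matches in RULES if matches(event)]

def route(events, subscribers):
    """Deliver each event to every subscriber of each tag it carries.

    `subscribers` maps tag -> list of callables taking (tag, event).
    Returns the number of deliveries per tag.
    """
    delivered = defaultdict(int)
    for event in events:
        for tag in tag_event(event):
            for handler in subscribers.get(tag, []):
                handler(tag, event)
                delivered[tag] += 1
    return dict(delivered)

# Demo with placeholder events; a real handler would post to Slack or a queue.
inbox = []
demo_events = [
    {"source": "FORM4", "event_type": "INSIDER_SELL", "cik": "0000320193"},
    {"source": "13F", "event_type": "13F_FILED", "cik": "0000000000"},
    {"source": "LR", "event_type": "ENFORCEMENT", "cik": None},
]
counts = route(demo_events, {
    "insider-sell": [lambda tag, e: inbox.append((tag, e["cik"]))],
    "enforcement-hit": [lambda tag, e: inbox.append((tag, e["cik"]))],
})
print(counts)
```

Keeping rules as data rather than branching code means a new category is a one-line addition, and an event can carry several tags at once.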
The unified event envelope
```
{
  "event_id": "sha256-of-source-id",
  "source": "8K" | "FORM4" | "13F" | "FORMD" | "LR",
  "timestamp": "2026-05-15T16:32:00Z",  # accepted-by-EDGAR time, UTC
  "cik": "0000320193",
  "ticker": "AAPL",                     # nullable for private issuers
  "event_type": "ITEM_1_05" | "INSIDER_SELL" | ...,
  "payload": { /* feed-specific raw fields */ }
}
```
Dedupe key: (source, source_specific_id). For 8-K it's the accession number, for Form 4 it's the accession number + transaction line, for 13F it's the filer CIK + period-of-report, etc.
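A minimal sketch of that dedupe key, assuming the items arrive as plain dicts. The field names `transactionLine` and `periodOfReport` are hypothetical and should be checked against the actual actor output; the keys chosen for Form D and Litigation Releases are also assumptions, since the rule above only spells out the first three feeds.

```python
import hashlib

def dedupe_key(source, item):
    """Build the (source, source_specific_id) key for one raw item."""
    if source == "FORM4":
        # Accession number + transaction line (field name is an assumption).
        sid = f'{item["accessionNumber"]}:{item.get("transactionLine", 0)}'
    elif source == "13F":
        # Filer CIK + period of report (field name is an assumption).
        sid = f'{item["filerCik"]}:{item["periodOfReport"]}'
    elif source == "LR":
        sid = item["releaseNumber"]
    else:  # 8K and FORMD: the accession number alone
        sid = item["accessionNumber"]
    return source, sid

def event_id(source, item):
    """Full SHA-256 hex digest of the dedupe key."""
    src, sid = dedupe_key(source, item)
    return hashlib.sha256(f"{src}:{sid}".encode()).hexdigest()

def dedupe(pairs, seen=None):
    """Yield (source, item) pairs whose key has not been seen before."""
    seen = set() if seen is None else seen
    for source, item in pairs:
        key = dedupe_key(source, item)
        if key not in seen:
            seen.add(key)
            yield source, item

# Placeholder accession numbers, for illustration only.
raw = [
    ("8K", {"accessionNumber": "0000000000-26-000001"}),
    ("8K", {"accessionNumber": "0000000000-26-000001"}),  # re-pulled duplicate
    ("FORM4", {"accessionNumber": "0000000000-26-000002", "transactionLine": 1}),
    ("FORM4", {"accessionNumber": "0000000000-26-000002", "transactionLine": 2}),
]
unique = list(dedupe(raw))
print(len(unique))
```

Passing a persistent `seen` set (or a database-backed equivalent) across polling runs lets overlapping `since` windows be re-pulled safely.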
Working Python — all five feeds, one envelope
Each Apify actor returns its native schema; the wrapper normalizes into the unified envelope. The actors used:
- SEC EDGAR 8-K Filings
- SEC Form 4 Insider Trading
- SEC Form 13F Holdings
- SEC Form D
- SEC Litigation Releases
Curl pattern (one example feed):
```bash
curl -X POST "https://api.apify.com/v2/acts/nexgendata~sec-edgar-8k-filings/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"since": "2026-05-29", "maxResults": 500}'
```
Python — fan out to all five and normalize:
```python
import os, json, hashlib, urllib.request
from concurrent.futures import ThreadPoolExecutor

APIFY_TOKEN = os.environ["APIFY_TOKEN"]
SINCE = "2026-05-29"

# Source tag -> (Apify actor ID, actor input).
FEEDS = {
    "8K": ("nexgendata~sec-edgar-8k-filings", {"since": SINCE}),
    "FORM4": ("nexgendata~sec-form-4-insider-trading-scraper", {"since": SINCE}),
    "13F": ("nexgendata~sec-form-13f-tracker-pro", {"since": SINCE}),
    "FORMD": ("nexgendata~sec-form-d-scraper", {"since": SINCE}),
    "LR": ("nexgendata~sec-litigation-releases", {"since": SINCE}),
}

def pull(actor, payload):
    """Run one actor synchronously and return its dataset items."""
    url = f"https://api.apify.com/v2/acts/{actor}/run-sync-get-dataset-items?token={APIFY_TOKEN}"
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 method="POST",
                                 headers={"Content-Type": "application/json"})
    # Synchronous runs can take minutes, so the timeout is generous.
    with urllib.request.urlopen(req, timeout=900) as r:
        return json.loads(r.read())

def envelope(source, item):
    """Map one feed-specific item onto the unified envelope."""
    if source == "8K":
        # First reported item only; "5.02" becomes "ITEM_5_02" to match the envelope.
        first = item["items"][0] if item.get("items") else "0"
        sid, ts, cik, tic, et = (item["accessionNumber"], item["filedAt"],
                                 item["cik"], item.get("ticker"),
                                 "ITEM_" + str(first).replace(".", "_"))
    elif source == "FORM4":
        # NOTE: if the actor emits one item per transaction line, append the
        # line identifier to sid so it matches the dedupe key described above.
        sid, ts, cik, tic, et = (item["accessionNumber"], item["filedAt"],
                                 item["issuerCik"], item.get("ticker"),
                                 "INSIDER_" + item["transactionType"])
    elif source == "13F":
        sid, ts, cik, tic, et = (item["accessionNumber"], item["filedAt"],
                                 item["filerCik"], None, "13F_FILED")
    elif source == "FORMD":
        sid, ts, cik, tic, et = (item["accessionNumber"], item["filedAt"],
                                 item["cik"], None, "PRIVATE_PLACEMENT")
    else:  # LR: releases carry no CIK or ticker
        sid, ts, cik, tic, et = (item["releaseNumber"], item["releaseDate"],
                                 None, None, "ENFORCEMENT")
    return {
        # SHA-256 of "source:id", truncated to 16 hex characters.
        "event_id": hashlib.sha256(f"{source}:{sid}".encode()).hexdigest()[:16],
        # CIKs pass through as returned; zero-pad to 10 digits before any join.
        "source": source, "timestamp": ts, "cik": cik, "ticker": tic,
        "event_type": et, "payload": item,
    }

# Fan out: one worker per feed, so all five pulls are in flight at once.
with ThreadPoolExecutor(max_workers=5) as ex:
    futures = {src: ex.submit(pull, actor, payload)
               for src, (actor, payload) in FEEDS.items()}

events = []
for src, fut in futures.items():
    for item in fut.result():
        events.append(envelope(src, item))

# Newest first; a missing timestamp sorts last instead of raising TypeError.
events.sort(key=lambda e: e["timestamp"] or "", reverse=True)
print(f"Total events: {len(events)}")
for e in events[:20]:
    print(f"{e['timestamp']} | {e['source']:5s} | {e['event_type']:25s} | CIK {e['cik']}")
```
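In the script above, `fut.result()` re-raises the first actor failure, which discards the results of the feeds that did succeed. One possible hardening, sketched here with a hypothetical `pull_all` helper and a fake puller in place of the network call, collects results and errors per source:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def pull_all(feeds, pull_fn, max_workers=5):
    """Run pull_fn(actor, payload) for each feed in parallel.

    Returns (results, errors), both keyed by source tag.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = {ex.submit(pull_fn, actor, payload): src
                   for src, (actor, payload) in feeds.items()}
        for fut in as_completed(futures):
            src = futures[fut]
            try:
                results[src] = fut.result()
            except Exception as exc:  # one bad feed should not sink the rest
                errors[src] = exc
    return results, errors

# Stand-in for the real puller so the sketch runs without network access.
def fake_pull(actor, payload):
    if actor == "broken-actor":
        raise TimeoutError("simulated actor timeout")
    return [{"actor": actor, "since": payload["since"]}]

demo_feeds = {
    "8K": ("ok-actor", {"since": "2026-05-29"}),
    "LR": ("broken-actor", {"since": "2026-05-29"}),
}
results, errors = pull_all(demo_feeds, fake_pull)
print(sorted(results), sorted(errors))
```

Swapping `fake_pull` for the real `pull` function keeps the rest of the pipeline unchanged; failed sources can then be logged and retried on the next polling cycle.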
Latency and cost compared to terminals
| Stack | Latency | Coverage | Annual cost (5 seats) |
|---|---|---|---|
| Bloomberg Terminal | Sub-second across all feeds | Full + non-SEC enrichment | ~$120,000 |
| Refinitiv Eikon | Sub-second | Full | ~$110,000 |
| S&P Global Capital IQ Pro | Real-time | Full | ~$80,000–$150,000 (negotiated) |
| FactSet Filings | Real-time | SEC-focused | ~$60,000+ |
| DIY (EDGAR + parsers) | 1–5 min from EDGAR | Whatever you build | Free + eng time |
| Apify actor fleet (above) | 2–10 min from EDGAR | Five feeds, unified | Pay-per-event (PPE), typically < $200/mo per active feed |
For event-driven research desks that are not running HFT, the 2–10 minute latency is not the bottleneck — the analyst's response time is. For HFT-style strategies, you need exchange-co-located feeds and direct EDGAR sub-second polling regardless.
Cross-feed queries to try first
- Insider sell after 8-K Item 5.02 — exec departure followed by a Form 4 sale by the same person within 30 days. Strong tell on internal-knowledge timing.
- Form D close before 8-K Item 1.01 — private placement closing followed by a material definitive agreement. Catches financing-tied M&A.
- 13F new-position + 8-K Item 2.02 surprise — large institutional add the quarter before a positive earnings surprise.
- Litigation release defendant CIK match — SEC charges an entity that is also a recent Form D issuer or 8-K filer.
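The first query in the list can be sketched as an in-memory join over normalized envelopes. This is an illustration under assumptions: it matches on issuer CIK only, because matching "the same person" needs a name field from the Form 4 payload that is not specified here, and it assumes `event_type` values of `ITEM_5_02` and `INSIDER_SELL` with ISO-8601 UTC timestamps. The function names are hypothetical.

```python
from datetime import datetime, timedelta

def parse_ts(ts):
    """Parse an ISO-8601 UTC timestamp such as 2026-05-15T16:32:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def sells_after_departure(events, window_days=30):
    """Pair each Item 5.02 8-K with later insider sells at the same CIK.

    Returns (departure_event_id, sell_event_id, whole_days_between) tuples.
    """
    departures = {}
    for e in events:
        if e["source"] == "8K" and e["event_type"] == "ITEM_5_02":
            departures.setdefault(e["cik"], []).append(e)

    window = timedelta(days=window_days)
    matches = []
    for e in events:
        if e["source"] != "FORM4" or e["event_type"] != "INSIDER_SELL":
            continue
        for dep in departures.get(e["cik"], []):
            gap = parse_ts(e["timestamp"]) - parse_ts(dep["timestamp"])
            if timedelta(0) <= gap <= window:
                matches.append((dep["event_id"], e["event_id"], gap.days))
    return matches

# Placeholder events; CIK "0000000001" is made up for the demo.
demo = [
    {"event_id": "dep-1", "source": "8K", "event_type": "ITEM_5_02",
     "cik": "0000320193", "timestamp": "2026-05-01T20:05:00Z"},
    {"event_id": "sell-1", "source": "FORM4", "event_type": "INSIDER_SELL",
     "cik": "0000320193", "timestamp": "2026-05-15T16:32:00Z"},
    {"event_id": "sell-2", "source": "FORM4", "event_type": "INSIDER_SELL",
     "cik": "0000320193", "timestamp": "2026-07-01T14:00:00Z"},
    {"event_id": "sell-3", "source": "FORM4", "event_type": "INSIDER_SELL",
     "cik": "0000000001", "timestamp": "2026-05-10T14:00:00Z"},
]
matches = sells_after_departure(demo)
print(matches)
```

The other three queries follow the same shape: index one feed by CIK, scan the second feed, and keep pairs that fall inside the time window. At larger volumes the same join is better expressed in SQL over the stored envelopes.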
Operational notes
A few things that bite when you first run this in production:
- 13F is quarterly, not real-time — filings come in waves around the 45-day post-quarter-end deadline. Most arrive in the final week. Your "real-time" event stream will spike on those days. Plan storage and downstream consumer rate limits accordingly.
- Form 4 has a two-business-day filing window — the shortest deadline of the five feeds covered here. The flow during US market hours is steady and high: thousands of transactions per day across all issuers.
- CIK leading-zero canonicalization — EDGAR sometimes returns CIKs as "320193" and sometimes as "0000320193." Always normalize to the 10-digit zero-padded form before any join.
- Time zone hygiene — EDGAR returns timestamps in Eastern Time. Convert to UTC for storage, render to user's local timezone for display.
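The last two notes can be sketched as a pair of small helpers. This assumes Python 3.9+ (for `zoneinfo`, which needs the system tz database or the `tzdata` package) and that the incoming timestamp is a naive ISO-8601 string in Eastern Time; `normalize_cik` and `eastern_to_utc` are hypothetical names.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")

def normalize_cik(cik):
    """Return the 10-digit zero-padded CIK, or None when the CIK is missing."""
    if cik is None:
        return None
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)

def eastern_to_utc(ts):
    """Convert a naive Eastern Time ISO timestamp to a UTC string ending in Z.

    Daylight saving is handled by the tz database, so the offset is
    UTC-4 in summer and UTC-5 in winter.
    """
    local = datetime.fromisoformat(ts).replace(tzinfo=EASTERN)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(normalize_cik("320193"))
print(eastern_to_utc("2026-05-15T12:32:00"))
```

Applying both helpers inside the normalization step, before events are stored, keeps every downstream join and sort working on one canonical form.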
Storage and retention
A full year of 8-K + Form 4 + Form D + 13F + LR raw payloads runs to ~30–50 GB depending on whether you store full filing bodies. Cheap on modern object storage but you want a clear retention policy from day one — most downstream queries only need the last 24 months hot, with older data archived.
Related
Per-feed deep dives: SEC 8-K guide, enforcement / litigation guide, Form D guide. For non-SEC complements: press-release event signals, ticker extraction from release text. For non-US analogues: ASX announcements and SGX announcements.
Get started: The five actors above are all on the NexGenData Apify catalog. Pull a 24-hour slice across all five and run the cross-feed queries — the joins write themselves once the envelopes are normalized.
Related Reading
More from this series:
- SEC 8-K Filings API: Build a Material Events Tracker (2026 Guide)
- SEC Enforcement Actions API: Build a Compliance Watchlist
- Track Private Placements with SEC Form D: API Guide for VCs and M&A Analysts
- How to Vet a Stockbroker with FINRA BrokerCheck: Free API Guide