NexGenData

Posted on • Originally published at thenextgennexus.com

SEC Event Router: One API for 8-K, Litigation, Form 4, Form D, and 13F

TL;DR. A serious SEC-driven research desk pulls at least five SEC feeds in parallel: 8-K material events, Form 4 insider transactions, Form 13F institutional holdings, Form D private placements, and SEC litigation releases. Each has its own schema, latency profile, and parsing quirks. This guide covers the unified event-router pattern: the schema, the dedupe strategy, and a working Python implementation that pulls all five into a single normalized event stream. Useful for L/S hedge funds, event-driven research desks, and family offices building in-house surveillance.

Why a unified event stream

The five feeds individually each tell you something. Together they let you ask cross-feed questions that are otherwise hard to answer:

  • "Which 8-K Item 5.02 CEO departures preceded an insider Form 4 sale by the same person?"
  • "Which Form D private placements happened within 60 days of a related-party 8-K Item 1.01?"
  • "Which 13F filers added to a position the same quarter the company filed an Item 4.02 restatement?"
  • "Which SEC enforcement defendants were also recent Form D issuers?"

Each question is a join across two feeds. Run as five independent silos, those joins are expensive and slow; a unified event-router architecture makes them trivial.

The architecture pattern

The router has three layers:

  1. Ingestion — per-feed pullers (8-K, Form 4, 13F, Form D, Litigation Releases). Each runs on its own schedule appropriate to feed latency: 8-K every 5 minutes during market hours, Form 4 every 15 minutes, 13F daily, Form D hourly, Litigation Releases hourly.
  2. Normalization — a common event envelope: {event_id, source, timestamp, cik, ticker, event_type, payload}. The payload stays feed-specific; the envelope is uniform.
  3. Routing / fan-out — events get tagged with categories (insider-sell, M&A-rumor, enforcement-hit, etc.) and pushed to subscribers (Slack channels, position-monitor jobs, alpha-research notebooks).
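The routing layer can be sketched as a small rule table applied to each envelope. This is a minimal illustration, not the actors' own API: it assumes event types shaped like the examples in the envelope section (`INSIDER_SELL`, `ITEM_5_02`), and `RULES`, `tag_event`, `route`, the `exec-departure` tag, and the subscriber callbacks are hypothetical names introduced here.

```python
from collections import defaultdict

# Hypothetical rule table: (tag, predicate over an envelope dict).
RULES = [
    ("insider-sell", lambda e: e["source"] == "FORM4" and e["event_type"] == "INSIDER_SELL"),
    ("enforcement-hit", lambda e: e["source"] == "LR"),
    ("exec-departure", lambda e: e["source"] == "8K" and e["event_type"] == "ITEM_5_02"),
]

def tag_event(event):
    """Return every tag whose predicate matches the envelope."""
    return [tag for tag, matches in RULES if matches(event)]

def route(events, subscribers):
    """Fan each event out to the callbacks subscribed to its tags.

    `subscribers` maps tag -> list of callables; returns deliveries per tag.
    """
    delivered = defaultdict(int)
    for event in events:
        for tag in tag_event(event):
            for callback in subscribers.get(tag, []):
                callback(event)
                delivered[tag] += 1
    return dict(delivered)

# Usage with synthetic envelopes: collect insider sells in a list
# where a real deployment might post to Slack instead.
inbox = []
sample = [
    {"source": "FORM4", "event_type": "INSIDER_SELL", "cik": "0000000001"},
    {"source": "13F", "event_type": "13F_FILED", "cik": "0000000002"},
]
counts = route(sample, {"insider-sell": [inbox.append]})
print(counts)  # {'insider-sell': 1}
```

Keeping the rules as data rather than code paths means a new category is one more tuple, and a subscriber that only cares about one tag never sees the rest of the stream.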

The unified event envelope


    {
      "event_id": "sha256-of-source-id",
      "source": "8K" | "FORM4" | "13F" | "FORMD" | "LR",
      "timestamp": "2026-05-15T16:32:00Z",  # accepted-by-EDGAR time, UTC
      "cik": "0000320193",
      "ticker": "AAPL",                     # nullable for private issuers
      "event_type": "ITEM_1_05" | "INSIDER_SELL" | ...,
      "payload": { /* feed-specific raw fields */ }
    }


Dedupe key: (source, source_specific_id). For 8-K it's the accession number, for Form 4 it's the accession number + transaction line, for 13F it's the filer CIK + period-of-report, etc.
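A minimal sketch of that dedupe key follows. The field names (`accessionNumber`, `transactionLine`, `filerCik`, `periodOfReport`, `releaseNumber`) are assumptions for illustration and may not match what a given actor returns; the Form D and litigation-release keys are also assumed, since only the first three are spelled out above.

```python
import hashlib

def source_id(source, item):
    """Build the source-specific half of the dedupe key (field names are assumed)."""
    if source == "FORM4":
        # Accession number plus the transaction line within the filing.
        return f'{item["accessionNumber"]}:{item["transactionLine"]}'
    if source == "13F":
        # Filer CIK plus period of report.
        return f'{item["filerCik"]}:{item["periodOfReport"]}'
    if source == "LR":
        # Assumed key for litigation releases.
        return str(item["releaseNumber"])
    # 8-K (and, by assumption, Form D): the accession number alone.
    return str(item["accessionNumber"])

def event_id(source, item):
    """Stable ID: SHA-256 over "source:source_specific_id"."""
    return hashlib.sha256(f"{source}:{source_id(source, item)}".encode()).hexdigest()

def dedupe(pairs, seen=None):
    """Yield (source, item) pairs whose event_id has not been seen yet."""
    seen = set() if seen is None else seen
    for source, item in pairs:
        eid = event_id(source, item)
        if eid not in seen:
            seen.add(eid)
            yield source, item

# Usage with synthetic items: the third entry repeats the first and is dropped.
batch = [
    ("FORM4", {"accessionNumber": "0000000000-26-000001", "transactionLine": 1}),
    ("FORM4", {"accessionNumber": "0000000000-26-000001", "transactionLine": 2}),
    ("FORM4", {"accessionNumber": "0000000000-26-000001", "transactionLine": 1}),
]
unique = list(dedupe(batch))
print(len(unique))  # 2
```

Passing the same `seen` set (or a persistent equivalent) across polling runs is what keeps overlapping `since` windows from producing duplicate events.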

Working Python — all five feeds, one envelope

Each Apify actor returns its native schema; the wrapper normalizes it into the unified envelope. The five actors used are the ones named in the FEEDS mapping in the Python code below.

Curl pattern (one example feed):


    curl -X POST "https://api.apify.com/v2/acts/nexgendata~sec-edgar-8k-filings/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"since": "2026-05-29", "maxResults": 500}'


Python — fan out to all five and normalize:


    import os, json, hashlib, urllib.request
    from concurrent.futures import ThreadPoolExecutor

    APIFY_TOKEN = os.environ["APIFY_TOKEN"]
    SINCE = "2026-05-29"

    FEEDS = {
        "8K":    ("nexgendata~sec-edgar-8k-filings",            {"since": SINCE}),
        "FORM4": ("nexgendata~sec-form-4-insider-trading-scraper", {"since": SINCE}),
        "13F":   ("nexgendata~sec-form-13f-tracker-pro",         {"since": SINCE}),
        "FORMD": ("nexgendata~sec-form-d-scraper",               {"since": SINCE}),
        "LR":    ("nexgendata~sec-litigation-releases",          {"since": SINCE}),
    }

    def pull(actor, payload):
        url = f"https://api.apify.com/v2/acts/{actor}/run-sync-get-dataset-items?token={APIFY_TOKEN}"
        req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                      method="POST",
                                      headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=900) as r:
            return json.loads(r.read())

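    def envelope(source, item):
        """Map one feed-specific item onto the unified envelope."""
        if source == "8K":
            # Use the first reported item; "5.02" becomes "ITEM_5_02".
            first = str(item["items"][0]) if item.get("items") else "0"
            sid, ts, cik, tic, et = (item["accessionNumber"], item["filedAt"],
                                      item["cik"], item.get("ticker"),
                                      "ITEM_" + first.replace(".", "_"))
        elif source == "FORM4":
            # Keyed on the accession number alone: if the actor emits one item per
            # transaction line, append that line number to sid so IDs stay unique.
            sid, ts, cik, tic, et = (item["accessionNumber"], item["filedAt"],
                                      item["issuerCik"], item.get("ticker"),
                                      "INSIDER_" + item["transactionType"])
        elif source == "13F":
            sid, ts, cik, tic, et = (item["accessionNumber"], item["filedAt"],
                                      item["filerCik"], None, "13F_FILED")
        elif source == "FORMD":
            sid, ts, cik, tic, et = (item["accessionNumber"], item["filedAt"],
                                      item["cik"], None, "PRIVATE_PLACEMENT")
        else:  # LR: no CIK is extracted from litigation releases here
            sid, ts, cik, tic, et = (item["releaseNumber"], item["releaseDate"],
                                      None, None, "ENFORCEMENT")
        # Zero-pad CIKs to 10 digits so joins across feeds line up.
        if cik is not None:
            cik = str(cik).zfill(10)
        return {
            # First 16 hex characters of the SHA-256 digest of "source:source_id".
            "event_id": hashlib.sha256(f"{source}:{sid}".encode()).hexdigest()[:16],
            "source": source, "timestamp": ts, "cik": cik, "ticker": tic,
            "event_type": et, "payload": item,
        }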

    with ThreadPoolExecutor(max_workers=5) as ex:
        futures = {src: ex.submit(pull, actor, payload)
                    for src, (actor, payload) in FEEDS.items()}
        events = []
        for src, fut in futures.items():
            for item in fut.result():
                events.append(envelope(src, item))

    # Newest first. A null timestamp sorts last and no longer raises a TypeError.
    events.sort(key=lambda e: e["timestamp"] or "", reverse=True)
    print(f"Total events: {len(events)}")
    for e in events[:20]:
        print(f"{e['timestamp']} | {e['source']:5s} | {e['event_type']:25s} | CIK {e['cik']}")


Latency and cost compared to terminals

| Stack | Latency | Coverage | Annual cost (5 seats) |
| --- | --- | --- | --- |
| Bloomberg Terminal | Sub-second across all feeds | Full + non-SEC enrichment | ~$120,000 |
| Refinitiv Eikon | Sub-second | Full | ~$110,000 |
| S&P Global Capital IQ Pro | Real-time | Full | ~$80,000–$150,000 (negotiated) |
| FactSet Filings | Real-time | SEC-focused | ~$60,000+ |
| DIY (EDGAR + parsers) | 1–5 min from EDGAR | Whatever you build | Free + eng time |
| Apify actor fleet (above) | 2–10 min from EDGAR | Five feeds, unified | PPE, typically < $200/mo per active feed |

For event-driven research desks that are not running HFT, the 2–10 minute latency is not the bottleneck — the analyst's response time is. For HFT-style strategies, you need exchange-co-located feeds and direct EDGAR sub-second polling regardless.

Cross-feed queries to try first

  1. Insider sell after 8-K Item 5.02 — exec departure followed by a Form 4 sale by the same person within 30 days. Strong tell on internal-knowledge timing.
  2. Form D close before 8-K Item 1.01 — private placement closing followed by a material definitive agreement. Catches financing-tied M&A.
  3. 13F new-position + 8-K Item 2.02 surprise — large institutional add the quarter before a positive earnings surprise.
  4. Litigation release defendant CIK match — SEC charges an entity that is also a recent Form D issuer or 8-K filer.
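The first query can be sketched directly over normalized envelopes. This is a simplified illustration under stated assumptions: it matches on issuer CIK only, because matching "the same person" needs name fields from the feed-specific payloads that are not specified here. It also assumes UTC ISO-8601 timestamps and event types shaped like `ITEM_5_02` and `INSIDER_SELL`. The function names are hypothetical.

```python
from datetime import datetime, timedelta

def parse_ts(ts):
    """Parse an ISO-8601 UTC timestamp such as "2026-05-15T16:32:00Z"."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def sells_after_departure(events, window_days=30):
    """Pair each ITEM_5_02 event with INSIDER_SELL events at the same CIK
    that were filed within `window_days` after it."""
    window = timedelta(days=window_days)

    # Index departures by issuer CIK so the join is a dictionary lookup.
    departures = {}
    for e in events:
        if e["source"] == "8K" and e["event_type"] == "ITEM_5_02":
            departures.setdefault(e["cik"], []).append(e)

    matches = []
    for e in events:
        if e["source"] != "FORM4" or e["event_type"] != "INSIDER_SELL":
            continue
        sold_at = parse_ts(e["timestamp"])
        for d in departures.get(e["cik"], []):
            gap = sold_at - parse_ts(d["timestamp"])
            if timedelta(0) <= gap <= window:
                matches.append((d, e))
    return matches

# Usage with synthetic envelopes: one sale 14 days after the departure
# (inside the window) and one 75 days after (outside it).
events = [
    {"source": "8K", "event_type": "ITEM_5_02", "cik": "0000000001",
     "timestamp": "2026-05-01T20:05:00Z"},
    {"source": "FORM4", "event_type": "INSIDER_SELL", "cik": "0000000001",
     "timestamp": "2026-05-15T16:32:00Z"},
    {"source": "FORM4", "event_type": "INSIDER_SELL", "cik": "0000000001",
     "timestamp": "2026-07-15T16:32:00Z"},
]
pairs = sells_after_departure(events)
print(len(pairs))  # 1
```

The other three queries follow the same shape: index one feed by CIK, scan the second, and filter on the time gap between the two timestamps.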

Operational notes

A few things that bite when you first run this in production:

  • 13F is quarterly, not real-time: filings come in waves around the 45-day post-quarter-end deadline. Most arrive in the final week. Your "real-time" event stream will spike on those days. Plan storage and downstream consumer rate limits accordingly.
  • Form 4 has a two-business-day filing window: the shortest deadline of the five feeds covered here. The flow during US market hours is steady and high, with thousands of transactions per day across all issuers.
  • CIK leading-zero canonicalization: EDGAR sometimes returns CIKs as "320193" and sometimes as "0000320193". Always normalize to the 10-digit zero-padded form before any join.
  • Time zone hygiene: EDGAR returns timestamps in Eastern Time. Convert to UTC for storage and render in the user's local time zone for display.
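The last two notes can be handled with a pair of small helpers. This is a minimal sketch: the function names are hypothetical, and the input timestamp format ("YYYY-MM-DD HH:MM:SS", naive Eastern Time) is an assumption that should be checked against what each feed actually returns. It uses the standard-library `zoneinfo` module, available from Python 3.9.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")

def normalize_cik(cik):
    """Return the 10-digit zero-padded CIK string, or None if absent."""
    if cik is None:
        return None
    return str(cik).strip().zfill(10)

def eastern_to_utc(ts):
    """Interpret a naive "YYYY-MM-DD HH:MM:SS" string as Eastern Time and
    return an ISO-8601 UTC string ending in "Z"."""
    local = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=EASTERN)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(normalize_cik("320193"))                 # 0000320193
print(normalize_cik(320193))                   # 0000320193
print(eastern_to_utc("2026-05-15 12:32:00"))   # 2026-05-15T16:32:00Z (EDT, UTC-4)
print(eastern_to_utc("2026-01-15 12:32:00"))   # 2026-01-15T17:32:00Z (EST, UTC-5)
```

Using a named zone rather than a fixed offset matters here: the same wall-clock time maps to a different UTC instant in summer and winter, and a hard-coded offset silently shifts half the year's events by an hour.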

Storage and retention

A full year of 8-K + Form 4 + Form D + 13F + LR raw payloads runs to ~30–50 GB depending on whether you store full filing bodies. Cheap on modern object storage but you want a clear retention policy from day one — most downstream queries only need the last 24 months hot, with older data archived.

Related

Per-feed deep dives: SEC 8-K guide, enforcement / litigation guide, Form D guide. For non-SEC complements: press-release event signals, ticker extraction from release text. For non-US analogues: ASX announcements and SGX announcements.

Get started: The five actors above are all on the NexGenData Apify catalog. Pull a 24-hour slice across all five and run the cross-feed queries — the joins write themselves once the envelopes are normalized.
