I Open-Sourced the SEC 8-K Classifier Powering My Forensic Tool — Here's the Taxonomy
Most of what makes an SEC 8-K interesting is what's not in the item list. Corporate filers routinely bury Item 1.03 bankruptcy-adjacent language inside an Item 8.01 "Other Events" disclosure, or slide an Item 3.01 delisting notice into an otherwise-routine investor update. The item-list header is a filter, not the truth.
I built a classifier that parses the body text of every 8-K and reclassifies mismatches — outputting a buried_json field for every filing that shows what items the body actually references vs. what items the filer marked. It's the engine behind the FilingFirehose forensic scoring tool, and I open-sourced it because the algorithm should be auditable.
Repo: github.com/jaablon/buried-events-parser
MIT-licensed. Pure Python. Zero LLM calls in the classifier itself (the taxonomy runs deterministically; LLMs only enter later in the pipeline for narrative synthesis).
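To make the buried_json idea concrete, here is a minimal sketch of the comparison it encodes. It assumes each filing has already been reduced to two sets of item codes: the ones marked in the header and the ones the body scan detected. The function name build_buried_json and the field names are hypothetical, and the repo's actual schema may differ.

```python
import json


def build_buried_json(declared_items, detected_items):
    """Compare header-declared items with body-detected items (hypothetical schema)."""
    declared = set(declared_items)
    detected = set(detected_items)
    return json.dumps(
        {
            "declared": sorted(declared),
            "detected": sorted(detected),
            # Referenced in the body but never marked in the header.
            "buried": sorted(detected - declared),
            # Marked in the header with no matching body language.
            "header_only": sorted(declared - detected),
        }
    )


print(build_buried_json(["8.01", "9.01"], ["1.03", "8.01"]))
```

In this example, an 8.01 filing whose body reads like a bankruptcy notice comes back with 1.03 under "buried".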
The taxonomy
The 8-K form numbers its items by section, from 1.01 through 9.01. The classifier maintains a per-item lexicon of trigger phrases and sub-item patterns.
Some illustrative examples:
Item 1.03 (bankruptcy / receivership):
- Direct triggers: "voluntary petition", "chapter 11", "chapter 7", "receiver appointed", "trustee appointed"
- Contextual triggers: "restructuring support agreement", "DIP financing", "plan of reorganization"
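One way to represent a per-item lexicon with its direct/contextual split is a dict of compiled, case-insensitive patterns. This is a sketch. LEXICON, PATTERNS, and match_item are hypothetical names, and only the Item 1.03 phrases come from the list above.

```python
import re

# Hypothetical lexicon layout; only the Item 1.03 phrases come from the post.
LEXICON = {
    "1.03": {
        "direct": [
            "voluntary petition",
            "chapter 11",
            "chapter 7",
            "receiver appointed",
            "trustee appointed",
        ],
        "contextual": [
            "restructuring support agreement",
            "DIP financing",
            "plan of reorganization",
        ],
    },
}


def _compile(phrases):
    # Word boundaries keep "chapter 7" from matching inside "chapter 71".
    return [re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE) for p in phrases]


PATTERNS = {
    item: {kind: _compile(phrases) for kind, phrases in kinds.items()}
    for item, kinds in LEXICON.items()
}


def match_item(item, text):
    """Return matched trigger phrases for one item, split into direct and contextual."""
    hits = {}
    for kind, patterns in PATTERNS[item].items():
        found = []
        for pattern in patterns:
            m = pattern.search(text)
            if m:
                found.append(m.group(0).lower())
        hits[kind] = found
    return hits
```

Keeping direct and contextual hits separate lets a scorer weight "voluntary petition" more heavily than "plan of reorganization" without a second pass over the text.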
Item 5.02 (officer / director events):
- Direct triggers: "resignation of", "termination of", "appointment of", specific role words paired with departure verbs
- Cluster detection: multiple 5.02s within a 30-day window flag "departure cluster" separately
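The 30-day cluster rule can be sketched as a single pass over sorted 5.02 filing dates. This assumes one reading of "within a 30-day window": every filing in a cluster falls within 30 days of the cluster's first filing. A sliding window would be an equally reasonable choice. departure_clusters and min_size are hypothetical names.

```python
from datetime import date, timedelta


def departure_clusters(filing_dates, window_days=30, min_size=2):
    """Group 5.02 filing dates anchored on each cluster's first filing."""
    window = timedelta(days=window_days)
    clusters, current = [], []
    for d in sorted(filing_dates):
        # Close the current cluster once a filing lands outside its window.
        if current and d - current[0] > window:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(d)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


print(departure_clusters([date(2024, 1, 5), date(2024, 1, 20), date(2024, 3, 1), date(2024, 1, 30)]))
```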
Item 8.01 (other events):
- The catchall, and where filers can hide negative signals from screens that filter by item code. The classifier is most aggressive here: any 1.03/2.04/3.01/5.02 language in an 8.01 body gets tagged as buried_1_03, buried_2_04, and so on.
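Here is a stripped-down version of the 8.01 rule. It uses plain substring matching against a tiny hypothetical trigger table. The 2.04 phrases in particular are illustrative and not from the repo, and buried_tags is a hypothetical name.

```python
# Hypothetical, heavily trimmed trigger table; the 2.04 phrases are illustrative.
TRIGGERS = {
    "1.03": ["voluntary petition", "chapter 11"],
    "2.04": ["event of default", "acceleration"],
    "3.01": ["continued listing", "delisting"],
    "5.02": ["resignation of", "appointment of"],
}


def buried_tags(declared_items, body):
    """Tag 1.03/2.04/3.01/5.02 language found in an 8.01 body as buried_X_YY."""
    if "8.01" not in declared_items:
        return []
    text = body.lower()
    tags = []
    for item, phrases in TRIGGERS.items():
        if item in declared_items:
            continue  # the filer already marked this item, so nothing is buried
        if any(phrase in text for phrase in phrases):
            tags.append("buried_" + item.replace(".", "_"))
    return tags
```

Skipping items the filer already declared keeps an honest 8.01 + 5.02 filing from being flagged.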
Item 3.01 (listing standard / transfer notice):
- Direct triggers: "notification of failure", "continued listing", "listing standard", "grace period", "delisting"
- Presence of a standalone 3.01 filing (without other items) is itself a high-signal event — most companies file 3.01s alongside placeholder items to reduce visibility.
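The standalone check is small but worth pinning down, because "without other items" requires a decision about Item 9.01 (Financial Statements and Exhibits). This sketch assumes 9.01 doesn't count as another item. is_standalone_3_01 is a hypothetical name.

```python
# Assumption: an exhibits-only Item 9.01 doesn't count as an "other item".
EXHIBIT_ONLY = {"9.01"}


def is_standalone_3_01(declared_items):
    """True when 3.01 is the only substantive item on the filing."""
    substantive = set(declared_items) - EXHIBIT_ONLY
    return substantive == {"3.01"}
```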
The interesting failure modes
Two edge cases you learn about the hard way:
Amendments (10-K/A, 10-Q/A): These aren't 8-Ks, but they often carry the restated financials that follow an Item 4.02 non-reliance disclosure. The classifier handles them by scanning the amendment's cover page for the "restated" keyword pattern before falling through to the body scan.
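The cover-page-first ordering might look like the following. The regex, the classify_amendment name, and passing the body scanner in as a callable are all assumptions for illustration. Tagging a cover-page hit as "4.02" mirrors how amendments are tied to Item 4.02 above.

```python
import re

# Assumed pattern: "restate", "restated", "restatement", "restatements".
RESTATED = re.compile(r"\brestate(?:d|ments?)?\b", re.IGNORECASE)
AMENDMENT_FORMS = {"10-K/A", "10-Q/A"}


def classify_amendment(form_type, cover_page, body, body_scan):
    """Check the cover page for restatement language before scanning the body.

    body_scan is a hypothetical callable standing in for the normal body classifier.
    """
    if form_type in AMENDMENT_FORMS and RESTATED.search(cover_page):
        return ("cover_page", ["4.02"])
    return ("body", body_scan(body))
```

Returning where the decision was made ("cover_page" or "body") keeps the output auditable, which matters for a pipeline that cites its evidence.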
Foreign private issuers (20-F, 6-K): They don't use 8-K item codes at all. A 6-K furnishes whatever the issuer makes public under home-country or exchange rules, with no item taxonomy. The classifier translates 6-K materiality flags into the closest 8-K item equivalent so the downstream scoring pipeline doesn't need to know about the alternate form.
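A keyword-to-item table is one plausible shape for the 6-K translation step. The phrases, mappings, and the translate_6k name below are hypothetical examples, not the repo's table.

```python
# Hypothetical phrase -> 8-K item table; the repo's mapping may look nothing like this.
SIX_K_TO_8K = [
    ("insolvency proceedings", "1.03"),
    ("change of auditor", "4.01"),
    ("resignation of", "5.02"),
    ("delisting", "3.01"),
]


def translate_6k(text):
    """Map 6-K language onto the closest 8-K item codes, sorted and de-duplicated."""
    lowered = text.lower()
    return sorted({item for phrase, item in SIX_K_TO_8K if phrase in lowered})
```

Because the output is ordinary 8-K item codes, everything downstream (buried tagging, clustering, scoring) can consume it unchanged.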
Why open source
Every claim in the paid $49 forensic report is tied to a specific EDGAR accession number and a specific classifier output. Anyone who reads the report (customer, journalist, opposing counsel, whoever) can pull the source filing and run it through the classifier locally. Verifying the scoring requires zero trust in the LLM narrative layer.
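Pulling a source filing by accession number needs nothing beyond the standard library, assuming EDGAR's usual Archives layout: the CIK without leading zeros, then the accession number with its dashes removed as the folder name. filing_url and fetch_filing are hypothetical helper names. The SEC asks automated clients to send a descriptive User-Agent header.

```python
import urllib.request


def filing_url(cik, accession):
    """Full-text submission URL, assuming EDGAR's standard Archives layout."""
    folder = accession.replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{folder}/{accession}.txt"


def fetch_filing(cik, accession, user_agent):
    """Download the raw submission; SEC asks automated clients to identify themselves."""
    request = urllib.request.Request(
        filing_url(cik, accession), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, with a placeholder CIK and accession number: fetch_filing("1234567", "0001234567-24-000001", "Your Name you@example.com"). The returned text is what you would feed to the classifier.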
That's the trade I want to make explicit: the scoring is auditable by anyone. What you pay $49 for is the narrative synthesis, the executive summary, the accession-cited red-flag catalog, and the "what changes next" section. You're paying for the reading, not the counting.
The next thing I want to build
Cross-filing correlation. Right now the classifier operates on one filing at a time, but the most interesting SEC disclosure risks are patterns across filings: a 5.02 cluster, an S-3 followed by a 424B5, a 10-K/A restatement following an auditor change (Item 4.01). The next release will emit a per-issuer time series so downstream tools can detect these patterns programmatically.
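Once a per-issuer time series exists, a pattern such as "S-3 followed by a 424B5" reduces to a windowed pair search. This sketch assumes events are (date, label) tuples for a single issuer and that you choose the window. followed_by and the label strings are hypothetical.

```python
from datetime import date, timedelta


def followed_by(events, first, then, within_days):
    """Find (first, then) date pairs where `then` lands within N days after `first`.

    events: (date, label) tuples for one issuer, e.g. "S-3", "424B5", "8-K:4.01", "10-K/A".
    """
    window = timedelta(days=within_days)
    starts = [d for d, label in events if label == first]
    ends = [d for d, label in events if label == then]
    return [(s, e) for s in starts for e in ends if s <= e <= s + window]


events = [
    (date(2024, 2, 1), "S-3"),
    (date(2024, 3, 10), "424B5"),
    (date(2024, 8, 1), "424B5"),
]
print(followed_by(events, "S-3", "424B5", within_days=90))
```

The same function covers the auditor pattern with followed_by(events, "8-K:4.01", "10-K/A", within_days=180). The window length is a judgment call, not something the filings dictate.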
If you're building anything in the SEC-data or fintech space and want to plug this into your pipeline, the repo is linked above. If you want the tool that consumes it end-to-end, FilingFirehose Forensic is free to try: paste any US ticker, get a risk score in 3 seconds, then optionally buy the $49 full report on the same ticker.
Feedback on the taxonomy welcome. What SEC filing signals do you key on that I should add?
Nothing here is investment advice. The classifier surfaces disclosure-language patterns; deciding what to do with those signals is your job, not the tool's.