NexGenData

Posted on • Originally published at thenextgennexus.com

New: SEC Filings to Markdown for RAG — EDGAR filings as clean, citation-tagged chunks for your LLM

What it does

SEC Filings to Markdown converts EDGAR filings (10-K, 10-Q, 8-K, 13F) into clean, chunked Markdown built for retrieval-augmented generation (RAG). It resolves a ticker to the issuer, pulls filings live from the SEC's official EDGAR system, strips scripts and styles, converts the HTML to ATX-style Markdown (headings marked with leading `#` characters), and splits each document into chunks of a configurable word count. Every chunk carries citation metadata (accession number and source URL) alongside issuer identity, form type, and filing date.
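The post does not describe how the actor splits text, so the following is only a minimal sketch of word-count chunking with citation metadata attached to each chunk. The function name `chunk_markdown`, its parameters, and the 0-based `chunkIndex` are assumptions made for illustration; only the output key names come from the field list in this post.

```python
import re


def chunk_markdown(markdown: str, words_per_chunk: int,
                   accession_number: str, source_url: str) -> list[dict]:
    """Split Markdown into chunks of at most `words_per_chunk` words.

    Hypothetical helper: the actor's real splitting logic is not documented here.
    """
    if words_per_chunk <= 0:
        raise ValueError("words_per_chunk must be positive")

    # Keep each word together with its trailing whitespace so that line
    # breaks inside a chunk (headings, paragraphs) survive the split.
    tokens = re.findall(r"\S+\s*", markdown)
    pieces = [
        "".join(tokens[i:i + words_per_chunk]).strip()
        for i in range(0, len(tokens), words_per_chunk)
    ]

    return [
        {
            "accessionNumber": accession_number,
            "sourceUrl": source_url,
            "chunkIndex": index,          # assumed 0-based
            "totalChunks": len(pieces),
            "markdown": piece,
        }
        for index, piece in enumerate(pieces)
    ]
```

Splitting on a fixed word count is simple and predictable, but it can cut a sentence or table in half; a production splitter would likely prefer heading or paragraph boundaries.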

Who it's for

Teams building financial-research copilots and RAG systems over filings, quants and fundamental analysts loading 10-Ks into vector stores, compliance teams building searchable filing knowledge bases, and fintech products that need LLM-ready text with citations.

Sample fields / output

| Field | Description |
|---|---|
| company / cik / ticker | Issuer identity |
| form | Filing type (10-K, 10-Q, 8-K, 13F) |
| filingDate | Date filed |
| accessionNumber | SEC accession number (citation) |
| sourceUrl | Direct link to the source document |
| chunkIndex / totalChunks | Position within the filing |
| markdown | Clean Markdown chunk, ready for embedding |

Example use cases

  • Build a financial-research copilot or RAG system over SEC filings
  • Load 10-Ks and other filings into a vector store for fundamental analysis
  • Stand up a searchable filing knowledge base for a compliance team

▶ Run SEC Filings to Markdown for RAG on Apify

FAQ

Is the source data official?

Yes. Filings are pulled live from the SEC's official EDGAR system, and no login is required.

Why chunked Markdown instead of raw HTML?

Chunks are sized for embedding and each one is paired with citation metadata, so your LLM can cite the exact filing.
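As a rough illustration of how that citation metadata might be used, the sketch below turns retrieved chunk records into a numbered context block for an LLM prompt. The function `format_cited_context` and the layout of the context block are assumptions, and the sample record holds placeholder values rather than real filing data; only the key names follow the field list above.

```python
def format_cited_context(chunks: list[dict]) -> str:
    """Render chunk records as numbered, source-tagged blocks for a prompt.

    Hypothetical helper: the prompt layout is an illustrative choice.
    """
    blocks = []
    for number, chunk in enumerate(chunks, start=1):
        header = (
            f"[{number}] {chunk['company']} {chunk['form']} "
            f"filed {chunk['filingDate']}, "
            f"accession {chunk['accessionNumber']}\n"
            f"Source: {chunk['sourceUrl']}"
        )
        blocks.append(f"{header}\n\n{chunk['markdown']}")
    # A horizontal rule between blocks keeps sources visually separate.
    return "\n\n---\n\n".join(blocks)


# Placeholder record, not real filing data.
sample = {
    "company": "Example Corp",
    "form": "10-K",
    "filingDate": "2024-01-31",
    "accessionNumber": "0000000000-24-000001",
    "sourceUrl": "https://example.com/filing.htm",
    "markdown": "## Risk Factors\n\nPlaceholder text.",
}

if __name__ == "__main__":
    print(format_cited_context([sample]))
```

Numbering each block lets the model refer to a source as `[1]`, which the application can then map back to the accession number and URL.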

What does it cost?

Pay-per-event: $0.005 per run plus $0.04 per Markdown chunk.
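To make the pricing concrete, here is a small estimator based on the two rates quoted above. It assumes a filing of a given word count yields roughly `ceil(words / words_per_chunk)` chunks, which is a guess about the splitting behavior; the function names and the word counts in the example are hypothetical.

```python
from decimal import Decimal

PRICE_PER_RUN = Decimal("0.005")    # USD, from the pricing above
PRICE_PER_CHUNK = Decimal("0.04")   # USD, from the pricing above


def estimate_chunks(word_count: int, words_per_chunk: int) -> int:
    """Estimate the chunk count by ceiling division (an assumption)."""
    if words_per_chunk <= 0:
        raise ValueError("words_per_chunk must be positive")
    return (word_count + words_per_chunk - 1) // words_per_chunk


def estimate_cost(total_chunks: int, runs: int = 1) -> Decimal:
    """Cost in USD: one run fee per run plus a fee per Markdown chunk."""
    return PRICE_PER_RUN * runs + PRICE_PER_CHUNK * total_chunks


if __name__ == "__main__":
    # Hypothetical filing: 60,000 words split at 500 words per chunk.
    chunks = estimate_chunks(60_000, 500)
    print(chunks, estimate_cost(chunks))
```

Under these assumptions, the per-chunk fee dominates: the run fee is the same whether a run produces one chunk or thousands, so a larger chunk size lowers the cost per filing.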
