<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jethro Kirk</title>
    <description>The latest articles on DEV Community by Jethro Kirk (@jethro_kirk_94cce29b139fa).</description>
    <link>https://dev.to/jethro_kirk_94cce29b139fa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4003688%2Fc1cca1ff-b2ae-4717-9b5d-02e942ff502d.png</url>
      <title>DEV Community: Jethro Kirk</title>
      <link>https://dev.to/jethro_kirk_94cce29b139fa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jethro_kirk_94cce29b139fa"/>
    <language>en</language>
    <item>
      <title>How I Built a Financial Data API Using SEC EDGAR and Python in 2 Days with no prior experience.</title>
      <dc:creator>Jethro Kirk</dc:creator>
      <pubDate>Fri, 26 Jun 2026 09:26:07 +0000</pubDate>
      <link>https://dev.to/jethro_kirk_94cce29b139fa/how-i-built-a-financial-data-api-using-sec-edgar-and-python-in-2-days-with-no-prior-experience-1n29</link>
      <guid>https://dev.to/jethro_kirk_94cce29b139fa/how-i-built-a-financial-data-api-using-sec-edgar-and-python-in-2-days-with-no-prior-experience-1n29</guid>
      <description>&lt;p&gt;I first noticed a gap in the market a couple of weeks ago while scrolling Reddit. Hordes of people were asking how to get structured earnings transcript data without paying the $149/mo that providers charge for it. Those providers also didn't offer any historical data older than 2022.&lt;/p&gt;

&lt;p&gt;So I sat down with Opus and began to study how such tools are structured and built. Two days later I had the finished product (filingapi.dev).&lt;/p&gt;

&lt;p&gt;Here's how I built it:&lt;br&gt;
Disclosure: this is my own project.&lt;br&gt;
The Problem&lt;br&gt;
Financial transcript data is locked behind expensive paywalls. Financial Modeling Prep charges $149/mo for transcript access. Bloomberg requires enterprise contracts. The few free tools on Apify are broken or abandoned.&lt;br&gt;
Meanwhile, SEC EDGAR is completely free, has generous rate limits (SEC's fair access policy allows up to 10 requests per second, provided you send a descriptive User-Agent header), and contains every 8-K filing from every public company. The data is there; it just needs parsing.&lt;br&gt;
Architecture&lt;br&gt;
SEC EDGAR → Scraper → Parser → SQLite → FastAPI → Customer&lt;br&gt;
The stack is simple:&lt;/p&gt;

&lt;p&gt;FastAPI for the API layer&lt;br&gt;
SQLite for storage (good enough for MVP)&lt;br&gt;
httpx for async HTTP requests&lt;br&gt;
BeautifulSoup for HTML parsing&lt;br&gt;
Stripe for billing&lt;/p&gt;

&lt;p&gt;Step 1: EDGAR CIK Resolver&lt;br&gt;
Every company on EDGAR has a CIK number. SEC publishes a JSON file mapping tickers to CIKs:&lt;br&gt;
def get_cik_by_ticker(self, ticker: str) -&amp;gt; str:&lt;br&gt;
    # SEC's ticker map looks like {"0": {"cik_str": ..., "ticker": ..., "title": ...}, ...}&lt;br&gt;
    resp = self.client.get(&lt;br&gt;
        "https://www.sec.gov/files/company_tickers.json"&lt;br&gt;
    )&lt;br&gt;
    data = resp.json()&lt;br&gt;
    for entry in data.values():&lt;br&gt;
        if entry["ticker"].upper() == ticker.upper():&lt;br&gt;
            # The submissions API expects a 10-digit, zero-padded CIK&lt;br&gt;
            return str(entry["cik_str"]).zfill(10)&lt;br&gt;
    # Fail loudly instead of silently returning None for an unknown ticker&lt;br&gt;
    raise ValueError(f"Unknown ticker: {ticker}")&lt;br&gt;
Step 2: Pulling 8-K Filings&lt;br&gt;
With the CIK, you can pull a company's filing history from the submissions API:&lt;br&gt;
resp = self.client.get(&lt;br&gt;
    f"https://data.sec.gov/submissions/CIK{cik}.json"&lt;br&gt;
)&lt;br&gt;
# "recent" holds the newest filings (about 1,000); older ones are listed under filings["files"]&lt;br&gt;
filings = resp.json()["filings"]["recent"]&lt;br&gt;
Filter for form == "8-K" and you have the company's recent material event filings.&lt;br&gt;
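A minimal sketch of that filter, assuming the "recent" block is a set of parallel arrays keyed by column name, as in SEC's submissions JSON. The filter_filings helper and the sample values are hypothetical, not the code behind filingapi.dev.&lt;br&gt;

```python
def filter_filings(recent: dict, form_type: str = "8-K") -> list[dict]:
    """Turn the columnar `recent` block into one dict per matching filing."""
    rows = zip(
        recent["form"],
        recent["accessionNumber"],
        recent["filingDate"],
        recent["primaryDocument"],
    )
    return [
        {"accession": acc, "filed": filed, "document": doc}
        for form, acc, filed, doc in rows
        if form == form_type
    ]


# Hand-written sample in the same shape as the API response (not real filings).
sample_recent = {
    "form": ["10-Q", "8-K", "4", "8-K"],
    "accessionNumber": [
        "0000000000-24-000004",
        "0000000000-24-000003",
        "0000000000-24-000002",
        "0000000000-24-000001",
    ],
    "filingDate": ["2024-08-02", "2024-08-01", "2024-07-15", "2024-05-02"],
    "primaryDocument": ["q.htm", "a.htm", "f4.xml", "b.htm"],
}
eight_ks = filter_filings(sample_recent)
```

Zipping the columns keeps each filing's fields aligned, so the accession number needed for the next step travels with the form type.&lt;br&gt;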
Step 3: The Real Value — Exhibit 99.1&lt;br&gt;
The 8-K itself is just a notice. The actual earnings press release is in Exhibit 99.1, attached to the filing. I built a fetcher that hits the filing index page, finds the 99.1 link, and downloads the full press release.&lt;br&gt;
This is where the guidance language lives: "revenue of $111.2 billion", "raised guidance", "record earnings per share".&lt;br&gt;
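The fetcher described above could look roughly like this. The index URL pattern and the table layout (one row per document, with the exhibit type such as EX-99.1 in its own cell) are assumptions about EDGAR's archive pages, and filing_index_url, ExhibitFinder, and find_exhibit are hypothetical names. The stack above uses BeautifulSoup; this sketch sticks to the standard library's html.parser so it runs without extra installs.&lt;br&gt;

```python
from html.parser import HTMLParser


def filing_index_url(cik: str, accession: str) -> str:
    """Build the index-page URL for one filing (assumed EDGAR archive layout)."""
    folder = accession.replace("-", "")
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{int(cik)}/{folder}/{accession}-index.htm"
    )


class ExhibitFinder(HTMLParser):
    """Collect (cell texts, first href) for every table row on the page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None   # cell texts of the row being read, or None
        self._href = None    # first link seen inside the current row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._href = [], None
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_cell = True
        elif tag == "a" and self._in_cell and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            self.rows.append((self._cells, self._href))
            self._cells = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data.strip()


def find_exhibit(index_html: str, exhibit_type: str = "EX-99.1"):
    """Return the href of the first row whose cells include the exhibit type."""
    finder = ExhibitFinder()
    finder.feed(index_html)
    for cells, href in finder.rows:
        if exhibit_type in cells:
            return href
    return None


# Hypothetical CIK and accession number, for illustration only.
url = filing_index_url("0000012345", "0000012345-24-000007")
```

The href that comes back is usually site-relative, so a real fetcher would join it with https://www.sec.gov before downloading the press release.&lt;br&gt;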
Step 4: Guidance Language Extraction&lt;br&gt;
I scan every sentence in the exhibit for forward-looking keywords:&lt;br&gt;
import re&lt;br&gt;
GUIDANCE_KEYWORDS = [&lt;br&gt;
    "guidance", "outlook", "forecast", "expects",&lt;br&gt;
    "revenue of", "earnings per share", "record revenue",&lt;br&gt;
    "raised guidance", "lowered guidance", "margin pressure",&lt;br&gt;
]&lt;/p&gt;

&lt;p&gt;def extract_guidance(self, text: str) -&amp;gt; list[dict]:&lt;br&gt;
    sentences = re.split(r'(?&amp;lt;=[.!?])\s+', text)&lt;br&gt;
    results = []&lt;br&gt;
    for sent in sentences:&lt;br&gt;
        matched = [kw for kw in GUIDANCE_KEYWORDS &lt;br&gt;
                   if kw in sent.lower()]&lt;br&gt;
        if matched and len(sent) &amp;gt; 30:&lt;br&gt;
            results.append({"text": sent, "keywords": matched})&lt;br&gt;
    return results&lt;br&gt;
Step 5: Earnings Call Transcripts&lt;br&gt;
For transcripts, I scrape Motley Fool's free archive. Each transcript page has a consistent structure: &amp;lt;p&amp;gt; tags with &amp;lt;strong&amp;gt; speaker names. The scraper identifies speakers, separates prepared remarks from Q&amp;amp;A, and tags roles (CEO, CFO, Analyst).&lt;br&gt;
The key discovery: the container with the transcript body is always the element with the most direct &amp;lt;p&amp;gt; children. The more obvious wrapper tag is not it; that one is a related content card.&lt;/p&gt;
&lt;strong&gt;
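One way to express that heuristic, as a sketch rather than the scraper actually used here: walk the page, count how many paragraph tags sit directly under each element, and keep the element with the highest count. The class and function names are hypothetical, the code assumes paragraphs are explicitly closed, and it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra installs.&lt;br&gt;

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not go on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DirectParagraphCounter(HTMLParser):
    """Count, for every element, how many p tags are its direct children."""

    def __init__(self):
        super().__init__()
        self.stack = []     # indexes into self.elements for currently open tags
        self.elements = []  # one [tag, attrs dict, direct p count] per element

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.stack:
            # Credit the paragraph to its immediate parent only.
            self.elements[self.stack[-1]][2] += 1
        if tag not in VOID_TAGS:
            self.elements.append([tag, dict(attrs), 0])
            self.stack.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for depth in range(len(self.stack) - 1, -1, -1):
            if self.elements[self.stack[depth]][0] == tag:
                del self.stack[depth:]
                break


def find_transcript_container(page_html: str):
    """Return (tag, attrs, count) for the element with the most direct p children."""
    counter = DirectParagraphCounter()
    counter.feed(page_html)
    if not counter.elements:
        return None
    tag, attrs, count = max(counter.elements, key=lambda el: el[2])
    return tag, attrs, count
```

Counting only direct children is what separates the transcript body from small cards that wrap a single paragraph each.&lt;br&gt;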
Step 6: On-Demand Fetching&lt;br&gt;
Instead of pre-scraping every company, I built an on-demand system. Client requests a ticker → check database → miss → scrape → store → return. Next request is instant from cache.&lt;br&gt;
This means the API covers any US-listed company without maintaining a massive scraping pipeline.&lt;br&gt;
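The check, miss, scrape, store, return flow can be sketched with the standard library's sqlite3 module. The table name, the get_or_fetch helper, and the stand-in scraper are hypothetical; the real service may store data quite differently.&lt;br&gt;

```python
import json
import sqlite3
from typing import Callable


def get_or_fetch(db: sqlite3.Connection, ticker: str,
                 scrape: Callable[[str], dict]) -> dict:
    """Return cached data for a ticker, scraping and storing it on a miss."""
    ticker = ticker.upper()
    db.execute(
        "CREATE TABLE IF NOT EXISTS filings_cache "
        "(ticker TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    row = db.execute(
        "SELECT payload FROM filings_cache WHERE ticker = ?", (ticker,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # cache hit: no network call

    payload = scrape(ticker)  # cache miss: do the slow work once
    db.execute(
        "INSERT OR REPLACE INTO filings_cache (ticker, payload) VALUES (?, ?)",
        (ticker, json.dumps(payload)),
    )
    db.commit()
    return payload


# Usage with an in-memory database and a stand-in scraper that records its calls.
calls = []


def fake_scrape(ticker: str) -> dict:
    calls.append(ticker)
    return {"ticker": ticker, "guidance": []}


db = sqlite3.connect(":memory:")
first = get_or_fetch(db, "nvda", fake_scrape)
second = get_or_fetch(db, "NVDA", fake_scrape)
```

The second call never reaches the scraper. A cache like this never expires, so a production version would probably also store a fetch time and refresh stale rows when new filings arrive.&lt;br&gt;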
What It Returns&lt;br&gt;
A real request for NVDA guidance:&lt;br&gt;
curl -H "X-API-Key: YOUR_KEY" \&lt;br&gt;
  https://filingapi.dev/v1/filings/NVDA/guidance&lt;br&gt;
Returns every forward-looking sentence from NVIDIA's press releases, with keywords flagged. No manual reading required.&lt;br&gt;
Deployment

&lt;p&gt;VPS: $6/mo Vultr instance (1 vCPU, 1GB RAM)&lt;br&gt;
Nginx reverse proxy&lt;br&gt;
Let's Encrypt for HTTPS&lt;br&gt;
Systemd service that survives reboots&lt;br&gt;
Cron job running daily ingestion at 6am UTC&lt;/p&gt;

&lt;p&gt;Total infrastructure cost: $6/mo + $10/yr for the domain.&lt;br&gt;
What I Learned&lt;/p&gt;

&lt;p&gt;EDGAR is underrated. No logins or CAPTCHAs, free, comprehensive; it only asks for a descriptive User-Agent header and no more than 10 requests per second. Most fintech startups charge hundreds for data that's sitting there for free.&lt;br&gt;
Ship the differentiator first. Transcript APIs already exist. Guidance language extraction from 8-K exhibits doesn't. Lead with what's unique.&lt;br&gt;
On-demand beats batch. Caching on first request is better than trying to pre-scrape 8,000 companies before launch.&lt;/p&gt;

&lt;p&gt;Try It&lt;br&gt;
The API is live at filingapi.dev with a free tier (50 requests/day, instant signup). Currently covering ~460 tickers with 1,400+ parsed filings.&lt;br&gt;
Docs: filingapi.dev/docs&lt;br&gt;
Happy to answer questions about the build or take feature requests.&lt;/p&gt;

&lt;/strong&gt;

</description>
      <category>python</category>
      <category>api</category>
      <category>fastapi</category>
      <category>stocks</category>
    </item>
  </channel>
</rss>
