Setounkpe7

Posted on May 18

Sector-aware threat intel API: stop triaging hundreds of CVEs manually

#cybersecurity #osint #python #devops

A Monday morning in a SOC

A SOC (Security Operations Center) is the team that watches for cyberattacks against a company. They sit between the firewall and the incident report, and one of their daily jobs is to read the day's new software vulnerabilities and decide which ones the company has to patch first.

A vulnerability is published as a CVE (Common Vulnerabilities and Exposures), a kind of barcode the industry uses to refer to a single bug. "CVE-2025-12345" means one specific flaw, in one specific piece of software, at one specific moment in time.

Now picture a typical morning. Three public feeds land in the analyst's inbox:

NVD (National Vulnerability Database, run by the US government). The official catalogue. Around 150 new CVEs a day. Every flaw eventually shows up here, dangerous or not.
CISA KEV (Known Exploited Vulnerabilities). A shortlist of CVEs that attackers are already using in real attacks. Maintained by CISA, the US cybersecurity agency. Smaller, scarier.
GHSA (GitHub Security Advisories). Same idea, but for open-source code. A vulnerability in a Python library or a Node package shows up here.

None of these feeds is wrong. None is targeted either. A bank's SOC and a hospital's IT team subscribe to the same firehose, and a junior analyst spends the first two hours of every shift reading titles and asking the only question that really matters: "is this us?"

Is the bug in software we run? Does it touch a system we care about? Could an attacker use it against our customers? Most of the 150 daily CVEs are noise for any given company. A Minecraft mod RCE is irrelevant to a bank. A SCADA bug in a German PLC is irrelevant to a SaaS startup. But to find the five that matter, the analyst has to skim the headlines of the hundred-plus that don't.

That re-triage happens in every SOC, every morning, against the same feeds, with the rules living in someone's head instead of being written down anywhere. Different person on call, different answers. Different week, different priorities. No memory.

That triage step is the one I wanted to push down into an API.

The solution

threat-intel-api is an open-source (MIT) vulnerability intelligence service. It pulls in NVD, CISA KEV and GHSA, removes the duplicates between them, and scores every CVE against a YAML file you write that describes what your sector and your stack actually care about.

Six starter profiles ship in the box: finance, healthcare, ICS (industrial control systems, the software that runs factories and power grids), government, SaaS and e-commerce. Each profile is a list of keywords, technologies, CWE categories (a CWE is a type of bug, like SQL injection or hardcoded credentials) and weights you can tune.

Every score ships with the per-criterion breakdown that produced it. Paste it into a ticket and the analyst sees exactly why CVE-X scored 78 and CVE-Y scored 35. No hidden model, no opaque weight, no "trust the vendor".

The rest of this post is how to plug it into your stack.

What it does in 30 seconds

The live API runs at threat-intel-api-production.up.railway.app. No auth on the public endpoints, no key required. Three calls cover the demo:

Top 5 finance-relevant threats, last 24h

curl -s 'https://threat-intel-api-production.up.railway.app/api/v1/sectors/finance/dashboard' \
  | jq '.top_24h[:5] | .[] | {cve: .external_id, score, cvss: .cvss_score, title: .title[0:80]}'

{
  "cve": "CVE-2026-7579",
  "score": 35.0,
  "cvss": 7.3,
  "title": "Hard-coded credentials in AstrBot dashboard auth (CWE-798)"
}

Subscribe a SIEM to the finance RSS feed

curl 'https://threat-intel-api-production.up.railway.app/api/v1/sectors/finance/feed.rss?min_score=70'

Inspect the score breakdown of a single CVE

curl -s 'https://threat-intel-api-production.up.railway.app/api/v1/sectors/finance/dashboard' \
  | jq '.top_24h[0].score_breakdown'

{
  "cwe_match":        { "hit": true,  "matched": ["CWE-798"], "points": 20 },
  "cvss_threshold":   { "hit": true,  "threshold": 7.0,       "points": 15 },
  "kev":              { "hit": false, "points": 0 },
  "technology_match": { "hit": false, "matched": [], "points": 0 }
}

That CVE scored 35 for finance: +20 for the CWE-798 match (hardcoded credentials matter in banking), +15 because CVSS 7.3 clears the 7.0 threshold. Not a wake-up call, but it lands in the 7-day digest with a reason attached.
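The arithmetic above can be checked by hand or in a few lines of Python. This is a minimal sketch, not the project's code: `total_from_breakdown` is a hypothetical helper, and it assumes that every rule in the breakdown carries a `points` field and that rules missing from the response contribute nothing.

```python
def total_from_breakdown(breakdown: dict) -> float:
    """Sum the points of every rule, then clamp to the [0, 100] range."""
    raw = sum(rule.get("points", 0) for rule in breakdown.values())
    return float(max(0, min(100, raw)))


# The breakdown returned for the example CVE above.
breakdown = {
    "cwe_match": {"hit": True, "matched": ["CWE-798"], "points": 20},
    "cvss_threshold": {"hit": True, "threshold": 7.0, "points": 15},
    "kev": {"hit": False, "points": 0},
    "technology_match": {"hit": False, "matched": [], "points": 0},
}

print(total_from_breakdown(breakdown))  # 35.0
```

If the recomputed total ever disagrees with the `score` field, that is worth an issue on the repo.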

Swap finance for healthcare, ics, gov, saas, or ecommerce and you get a sector-weighted view of the same underlying CVE corpus.

Why a SOC team should care

This isn't a replacement for Recorded Future or Mandiant. It's the free, self-hosted thing you put next to them.

Sector-relevant by default. The Minecraft-mod RCEs that NVD publishes ten times a week don't surface in your finance feed. The excluded_keywords rule subtracts 30 points at score time, which typically drops them below your min_score.
Auditable scoring. No tuned ML model, no opaque weights. Eleven additive rules, a flat function, the breakdown stored in Postgres next to the score. When the SOC lead asks why CVE-X is a 78 and CVE-Y is a 62, you can answer.
SIEM-ready out of the box. Every sector exposes a JSON dashboard and an RSS feed with a min_score filter. Splunk, Sentinel and Elastic can all ingest RSS, natively or through an add-on or connector depending on the product and version.
Hot-reloadable profiles. Your team adds the keywords that match your stack without touching code or restarting the service.
No vendor lock-in. MIT, Docker, Postgres, one docker compose up.

Plug it into your stack in 5 minutes

SIEM via RSS

In Splunk, Sentinel, or Elastic: add an RSS input pointing at the feed. Done.

# Splunk-style: poll finance threats scoring >= 70
curl 'https://threat-intel-api-production.up.railway.app/api/v1/sectors/finance/feed.rss?min_score=70'

Tune min_score per sector. KEV-flagged threats get +25 automatically, so 70+ is usually the "actively-exploited or stack-relevant" band.
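If your SIEM has no RSS input, the feed can be parsed with the Python standard library. The sketch below assumes the feed is plain RSS 2.0 with `title`, `link` and `description` elements on each `item`. The post does not document the feed's exact fields, so `SAMPLE_FEED` is a made-up placeholder (including the CVE ID and link) and `parse_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder feed body; in practice this would be the text fetched
# from /api/v1/sectors/finance/feed.rss?min_score=70.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>finance threats</title>
    <item>
      <title>CVE-2099-0001</title>
      <link>https://example.invalid/CVE-2099-0001</link>
      <description>Placeholder description</description>
    </item>
  </channel>
</rss>
"""


def parse_items(rss_text: str) -> list:
    """Return one dict per <item>, with empty strings for missing fields."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


for entry in parse_items(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```

For feeds you do not control, a hardened parser such as defusedxml is the safer choice than `xml.etree`.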

SOAR via JSON polling

A short Python poller for any SOAR runbook (Cortex XSOAR, Tines, Shuffle, n8n):

import time

import httpx

SEEN = set()  # CVE IDs already handed off; in-memory only, so it resets on restart
URL = "https://threat-intel-api-production.up.railway.app/api/v1/sectors/finance/dashboard"

while True:
    try:
        resp = httpx.get(URL, timeout=10)
        resp.raise_for_status()
        threats = resp.json()["top_24h"]
    except (httpx.HTTPError, KeyError, ValueError) as exc:
        # a failed poll should not kill the loop; try again on the next cycle
        print(f"poll failed: {exc!r}")
        threats = []
    for t in threats:
        if t["score"] >= 75 and t["external_id"] not in SEEN:
            SEEN.add(t["external_id"])
            # hand off to your SOAR: create incident, enrich, notify
            print(f"NEW {t['external_id']} score={t['score']} cvss={t['cvss_score']}")
            print("  reasons:", [k for k, v in t["score_breakdown"].items() if v.get("hit")])
    time.sleep(900)  # 15 minutes; collectors run hourly

The breakdown gives the SOAR playbook a structured "why" field to include in the auto-created ticket.
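One way to turn that breakdown into ticket text is sketched below. `format_reasons` is a hypothetical helper, not part of the API. It assumes the breakdown has the shape shown earlier: a `hit` flag, a `points` value and an optional `matched` list per rule.

```python
def format_reasons(breakdown: dict) -> str:
    """Render the rules that fired as one line each, e.g. 'cwe_match: +20 (CWE-798)'."""
    lines = []
    for name, rule in breakdown.items():
        if not rule.get("hit"):
            continue  # skip rules that did not contribute
        matched = ", ".join(rule.get("matched", []))
        suffix = f" ({matched})" if matched else ""
        lines.append(f"{name}: {rule.get('points', 0):+g}{suffix}")
    return "\n".join(lines)


breakdown = {
    "cwe_match": {"hit": True, "matched": ["CWE-798"], "points": 20},
    "cvss_threshold": {"hit": True, "threshold": 7.0, "points": 15},
    "kev": {"hit": False, "points": 0},
    "technology_match": {"hit": False, "matched": [], "points": 0},
}

print(format_reasons(breakdown))
# cwe_match: +20 (CWE-798)
# cvss_threshold: +15
```

The returned string can go straight into the description field of whatever incident your playbook creates.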

Slack / Teams alert

Same poll, different sink:

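import os

import httpx

# assumed environment variable holding your Slack incoming-webhook URL
SLACK_WEBHOOK = os.environ["SLACK_WEBHOOK"]

# the leading "..." is left elided: substitute the URL prefix from the earlier examples
threats = httpx.get(".../sectors/ics/dashboard", timeout=10).json()["top_24h"]
for t in threats:
    if t["score"] >= 80:
        httpx.post(SLACK_WEBHOOK, timeout=10, json={
            "text": f":rotating_light: *{t['external_id']}* (score {t['score']}): {t['title']}"
        })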

Webhook alerting from the API itself is on the roadmap as M4, so you won't need the poller forever.

Add your sector profile

This is the part that earns the project its keep. Sector knowledge belongs to the people in the sector — not to me. A payments analyst can ship a better finance profile than I ever will.

Profiles are YAML. Here's the real profiles/public/finance.yaml:

id: finance
name: Finance & Banking
keywords: [banking, payment, swift, sepa, ach, card, pos, atm]
technologies: [Java, Spring, Tomcat, WebSphere, Oracle Database, Apache Struts]
cwe_priorities: [CWE-89, CWE-79, CWE-352, CWE-287, CWE-798, CWE-22]
cvss_threshold: 7.0
priority_boost_keywords: [wire transfer, swift network, card skimming]
excluded_keywords: [minecraft, game, mod]

Want iso20022 weighted as a priority boost? Want mitre-att&ck-t1486 in the keyword set? Edit one file, then:

curl -X POST http://localhost:8000/api/v1/admin/reload-profiles \
     -H "X-Admin-Key: $ADMIN_API_KEY"

No restart. No migration. The reload response lists profiles added, updated, removed, and any parse errors with file paths. Private profiles live in profiles/private/ and require the admin header to read — useful if your internal product list is sensitive.

Full schema reference: profiles/README.md.
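Before hitting the reload endpoint, it can help to sanity-check a profile locally. The sketch below is not the project's validator, and the authoritative schema is the one in profiles/README.md. It assumes the fields seen in finance.yaml, that `id` and `name` are required, that `cvss_threshold` sits on the 0 to 10 CVSS scale, and that the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`). `validate_profile` is a hypothetical name.

```python
import re

LIST_FIELDS = (
    "keywords",
    "technologies",
    "cwe_priorities",
    "priority_boost_keywords",
    "excluded_keywords",
)


def validate_profile(profile: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks fine."""
    errors = []
    for key in ("id", "name"):
        value = profile.get(key)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{key}: must be a non-empty string")
    for key in LIST_FIELDS:
        value = profile.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{key}: must be a list of strings")
    cwes = profile.get("cwe_priorities", [])
    if isinstance(cwes, list):
        for cwe in cwes:
            if isinstance(cwe, str) and not re.fullmatch(r"CWE-\d+", cwe):
                errors.append(f"cwe_priorities: {cwe!r} is not of the form CWE-<number>")
    threshold = profile.get("cvss_threshold", 7.0)
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or not 0.0 <= threshold <= 10.0:
        errors.append("cvss_threshold: must be a number between 0 and 10")
    return errors


finance = {
    "id": "finance",
    "name": "Finance & Banking",
    "keywords": ["banking", "payment", "swift", "sepa", "ach", "card", "pos", "atm"],
    "technologies": ["Java", "Spring", "Tomcat", "WebSphere", "Oracle Database", "Apache Struts"],
    "cwe_priorities": ["CWE-89", "CWE-79", "CWE-352", "CWE-287", "CWE-798", "CWE-22"],
    "cvss_threshold": 7.0,
    "priority_boost_keywords": ["wire transfer", "swift network", "card skimming"],
    "excluded_keywords": ["minecraft", "game", "mod"],
}

print(validate_profile(finance))  # []
```

Running a check like this in CI catches typos before they show up as parse errors in the reload response.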

The scoring algorithm in one table

Eleven rules. Additive. Clamped to [0, 100]. Whole-word, case-insensitive matching, so java doesn't match javascript.

Rule	Condition	Points
Technology match	profile `technology` appears in corpus	+30
Keyword match	profile `keyword` appears in corpus	+25
CWE match	threat CWE listed in `cwe_priorities`	+20
CVSS threshold	`cvss_score >= cvss_threshold`	+15
Priority boost	profile `priority_boost_keyword` appears	+20
Excluded	profile `excluded_keyword` appears	-30
KEV tag	threat tagged `kev`	+25
Actively exploited	threat tagged `actively-exploited`	+15
Ransomware	threat tagged `ransomware`	+15
Multi-source	documented by 3+ sources (+5 if documented by exactly 2)	+10
Package match	`package` indicator matches a `technology`	+20

Because the per-criterion breakdown travels with every score, you can drop it straight into a ticket and the analyst can defend the priority to the SOC lead without re-running anything.
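To make the table concrete, here is a minimal sketch of an additive scorer with whole-word matching and clamping. It is an illustration written from the table, not the project's implementation. The `Profile` and `Threat` classes, the `hits` helper, the choice of title plus description as the corpus, and the rule that each criterion counts at most once are all assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Profile:
    keywords: list = field(default_factory=list)
    technologies: list = field(default_factory=list)
    cwe_priorities: list = field(default_factory=list)
    cvss_threshold: float = 7.0
    priority_boost_keywords: list = field(default_factory=list)
    excluded_keywords: list = field(default_factory=list)


@dataclass
class Threat:
    text: str  # assumed corpus: title + description
    cwes: list = field(default_factory=list)
    cvss_score: Optional[float] = None
    tags: set = field(default_factory=set)
    source_count: int = 1
    packages: list = field(default_factory=list)


def hits(terms, text):
    """Terms found in text as whole words, case-insensitively ('java' != 'javascript')."""
    return [t for t in terms
            if re.search(rf"(?<!\w){re.escape(t)}(?!\w)", text, re.IGNORECASE)]


def score(threat: Threat, profile: Profile):
    rules = {}

    def rule(name, matched, points):
        # each rule contributes its points once if it fired, otherwise zero
        rules[name] = {"hit": bool(matched), "points": points if matched else 0}

    cvss_ok = threat.cvss_score is not None and threat.cvss_score >= profile.cvss_threshold
    rule("technology_match", hits(profile.technologies, threat.text), 30)
    rule("keyword_match", hits(profile.keywords, threat.text), 25)
    rule("cwe_match", set(threat.cwes) & set(profile.cwe_priorities), 20)
    rule("cvss_threshold", cvss_ok, 15)
    rule("priority_boost", hits(profile.priority_boost_keywords, threat.text), 20)
    rule("excluded", hits(profile.excluded_keywords, threat.text), -30)
    rule("kev", "kev" in threat.tags, 25)
    rule("actively_exploited", "actively-exploited" in threat.tags, 15)
    rule("ransomware", "ransomware" in threat.tags, 15)
    multi = 10 if threat.source_count >= 3 else 5 if threat.source_count == 2 else 0
    rules["multi_source"] = {"hit": multi > 0, "points": multi}
    techs = {t.lower() for t in profile.technologies}
    rule("package_match", {p.lower() for p in threat.packages} & techs, 20)

    total = sum(r["points"] for r in rules.values())
    return max(0, min(100, total)), rules  # clamp to [0, 100]


FINANCE = Profile(
    keywords=["banking", "payment", "swift", "sepa", "ach", "card", "pos", "atm"],
    technologies=["Java", "Spring", "Tomcat", "WebSphere", "Oracle Database", "Apache Struts"],
    cwe_priorities=["CWE-89", "CWE-79", "CWE-352", "CWE-287", "CWE-798", "CWE-22"],
    cvss_threshold=7.0,
    priority_boost_keywords=["wire transfer", "swift network", "card skimming"],
    excluded_keywords=["minecraft", "game", "mod"],
)

example = Threat(
    text="Hard-coded credentials in AstrBot dashboard auth",
    cwes=["CWE-798"],
    cvss_score=7.3,
)
total, breakdown = score(example, FINANCE)
print(total)  # 35
```

Note that "card" does not fire on "dashboard" here, which is the whole-word rule doing its job.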

Architecture, briefly

Three collectors → one dedup'd Threat table → a flat scoring function → JSON + RSS.

One Threat row per CVE ID, one ThreatSource row per (CVE, source) pair. NVD wins on CVSS at read time, KEV wins on the actively-exploited and ransomware tags, GHSA wins on package indicators. Provenance stays attached. You can always check what NVD said vs what KEV said about the same CVE.
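The precedence rules can be sketched as a read-time merge over the per-source rows. This is an illustration under assumptions, not the project's schema: `SourceRecord` and `merged_view` are hypothetical names, and the fallbacks used when the preferred source is missing (first available CVSS, union of tags, union of packages) are guesses.

```python
from dataclasses import dataclass, field
from typing import Optional

EXPLOIT_TAGS = {"actively-exploited", "ransomware"}


@dataclass
class SourceRecord:
    source: str  # "nvd", "kev" or "ghsa"
    cvss_score: Optional[float] = None
    tags: set = field(default_factory=set)
    packages: list = field(default_factory=list)


def merged_view(cve_id: str, records: list) -> dict:
    """Combine per-source rows for one CVE, keeping provenance in 'sources'."""
    by_source = {r.source: r for r in records}
    nvd, kev, ghsa = by_source.get("nvd"), by_source.get("kev"), by_source.get("ghsa")

    # NVD wins on CVSS; otherwise take the first source that has a score.
    if nvd is not None and nvd.cvss_score is not None:
        cvss = nvd.cvss_score
    else:
        cvss = next((r.cvss_score for r in records if r.cvss_score is not None), None)

    # KEV wins on exploitation tags; other tags are merged from every source.
    all_tags = set().union(*(r.tags for r in records))
    tags = all_tags - EXPLOIT_TAGS
    tags |= (kev.tags if kev is not None else all_tags) & EXPLOIT_TAGS

    # GHSA wins on package indicators.
    if ghsa is not None:
        packages = list(ghsa.packages)
    else:
        packages = sorted({p for r in records for p in r.packages})

    return {
        "cve_id": cve_id,
        "cvss_score": cvss,
        "tags": tags,
        "packages": packages,
        "sources": sorted(by_source),
    }


records = [
    SourceRecord("nvd", cvss_score=9.8),
    SourceRecord("ghsa", cvss_score=8.1, tags={"ransomware"}, packages=["spring-core"]),
    SourceRecord("kev", tags={"actively-exploited"}),
]
view = merged_view("CVE-2099-0001", records)  # placeholder CVE ID
print(view["cvss_score"], sorted(view["tags"]), view["packages"], view["sources"])
```

Because the per-source rows are never overwritten, a different precedence policy only means changing this function, not re-collecting data.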

Stack: FastAPI, SQLAlchemy 2.0, PostgreSQL, Pydantic v2, httpx + tenacity, structlog, Sentry. Runs in a 195 MB Alpine container, non-root, no pip in the runtime image, signed with cosign. CI gate is 9 blocking jobs: Bandit, Semgrep, Ruff (security ruleset), mypy strict, pip-audit, Hadolint, Trivy (fails on CRITICAL+HIGH > 0), SBOM publish, full pytest with 86% coverage threshold.

What it won't do (yet)

Honesty section:

Not a TIP replacement. No actor or campaign attribution, no malware-family clustering, no IOC enrichment beyond what the source feeds publish.
No STIX 2.1 / TAXII export yet. That's M5 on the roadmap — needed for enterprise TIP ingestion.
GHSA collector is in soft-fail mode in production. It runs, but GraphQL rate-limiting still bites occasionally and I haven't fully stabilised the cursor pagination. NVD and KEV are the reliable spine right now.
No webhook alerting yet. Poll the dashboard endpoint, or subscribe to the RSS feed. M4 will fix this.

If any of those are blockers for your team, the issues board is the right place to say so — priority follows usage.

Try it / contribute

Where help would actually matter:

Sector profiles. A payments analyst can ship a better finance.yaml than I can. Same for a clinical-systems engineer on healthcare, or an OT engineer on ICS. Open a PR with a YAML and a one-paragraph rationale.
GHSA stabilisation. GraphQL pagination and rate-limit handling in collectors/ghsa.py.
M4 webhook alerting. The shape is sketched in the roadmap. Looking for someone who's done webhook delivery at scale (retries, signing, dead-letter handling).

Run it locally:

git clone https://github.com/Setounkpe7/threat-intel-api.git
cd threat-intel-api
docker compose up
# API on :8000, Swagger at /docs

DEV Community