LLMs for Threat Intelligence: Applications, Tools, and Where to Learn

If you want to apply large language models to threat intelligence, the question is not whether they help but where they belong in the pipeline. LLMs are good at the language-heavy parts of cyber threat intelligence (CTI): reading prose reports, extracting structure, mapping narratives to techniques, and drafting summaries. They are bad at the parts that have to be exactly right: indicators, attribution, and anything that drives an automatic block. Build around that split and LLMs remove real analyst toil.

Here is what works, the tools to use, and where the practical skills come from.

What LLMs Are Actually Good At in CTI

Most CTI work is reading. An analyst ingests vendor reports, OSINT, pastes, and feeds, then turns that unstructured text into structured indicators and tactics, techniques, and procedures (TTPs). That is a language task with abundant training data, which is exactly where LLMs perform well.

The reliable uses:

Summarizing long reports into a few sentences an analyst can triage in seconds.
Extracting the narrative: who did what, in what order, against whom.
Mapping prose to MITRE ATT&CK technique IDs with supporting evidence.
Normalizing and deduplicating indicators already pulled by a deterministic pass.

Notice what is missing: pulling indicators out of raw text. That is the one thing you should not trust the model to do alone.

Extract Indicators Deterministically, Then Let the Model Add Context

The instinct to ask the model "list every IOC in this report" is the most common way this goes wrong. The model will occasionally transpose a digit in an IP, drop a character from a hash, or mishandle a defanged domain like evil[.]com. In CTI a single wrong character is not a typo; it is a bad blocklist entry.

Do the extraction with a regex-based pass first. msticpy and iocextract both pull and refang indicators with tested patterns:

from msticpy.transform import IoCExtract

extractor = IoCExtract()
# Given a single string, extract() returns a dict mapping each IoC type to the set of matches.
iocs = extractor.extract(report_text)   # ipv4, sha256, domains, urls, etc.

Then hand the model the deterministically-extracted indicators plus the report text, and ask it to do the language work: which indicators are the actual payload versus incidental, what they relate to, and how to deduplicate them against what you already have.
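
One way to keep the model from retyping indicators at all is to give each extracted value a short ID and ask for labels keyed by that ID. The sketch below assumes `iocs` is a dict mapping an indicator type to a set of strings; the function names, the `i0`-style IDs, and the role vocabulary are hypothetical, and the model call itself is left out.

```python
import json

ALLOWED_ROLES = {"payload", "infrastructure", "incidental"}  # hypothetical label set


def index_iocs(iocs):
    """Assign a stable short ID to every deterministically extracted indicator."""
    pairs = sorted(
        (ioc_type, value) for ioc_type, values in iocs.items() for value in values
    )
    return {
        f"i{n}": {"type": ioc_type, "value": value}
        for n, (ioc_type, value) in enumerate(pairs)
    }


def build_enrichment_prompt(indexed, report_text):
    """Ask for labels by ID so the model never has to reproduce a raw value."""
    return (
        "Indicators extracted from the report, keyed by ID:\n"
        f"{json.dumps(indexed, indent=2)}\n\n"
        "For each ID, give its role in the report. Refer to indicators by ID only.\n\n"
        f"Report:\n{report_text}"
    )


def accept_labels(model_labels, indexed):
    """Keep only labels that point at a known ID and use an allowed role."""
    accepted = []
    for label in model_labels:
        ioc = indexed.get(label.get("id"))
        if ioc is None or label.get("role") not in ALLOWED_ROLES:
            continue  # unknown ID or role: drop it rather than trust it
        # The indicator value comes from the deterministic pass, not from the model.
        accepted.append({**ioc, "role": label["role"]})
    return accepted
```

Because the accepted records copy the value from the indexed set, a model that invents or mistypes an indicator can at worst lose a label; it cannot introduce a new value downstream.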

Map Reports to ATT&CK With Forced Structured Output

The high-value LLM step in CTI is turning a prose report into ATT&CK technique IDs you can pivot on. Force structured output so the result drops straight into your platform. The Anthropic Messages API supports tool use, which doubles as a schema enforcer:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

attck_tool = {
    "name": "record_ttps",
    "description": "Record the MITRE ATT&CK techniques described in a threat report.",
    "input_schema": {
        "type": "object",
        "properties": {
            "techniques": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "technique_id": {"type": "string"},      # e.g. T1566.001
                        "evidence": {"type": "string"},          # quote from the report
                    },
                    "required": ["technique_id", "evidence"],
                },
            }
        },
        "required": ["techniques"],
    },
}

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=[attck_tool],
    tool_choice={"type": "tool", "name": "record_ttps"},
    system=(
        "You map threat reports to MITRE ATT&CK. For each technique, quote the exact "
        "sentence that supports the mapping. Do not assign a technique you cannot quote."
    ),
    messages=[{"role": "user", "content": report_text}],
)

ttps = next(b.input for b in resp.content if b.type == "tool_use")["techniques"]

Two things make this trustworthy. The required evidence field forces the model to ground every mapping in a quote, so a reviewer can confirm it. And you validate every returned technique_id against the real catalog with mitreattack-python before it goes anywhere. A hallucinated T9999 gets dropped at the gate, not investigated by an analyst.
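
The gate can be sketched roughly as follows. This assumes `known_ids` is a set of technique IDs already loaded from the ATT&CK catalog (for example with mitreattack-python; the loading step is not shown). The function name is hypothetical, and the automatic quote check is an extra step that complements human review rather than replacing it.

```python
import re

# Technique IDs look like T1566 or, for sub-techniques, T1566.001.
TECHNIQUE_ID = re.compile(r"T\d{4}(?:\.\d{3})?")


def _squash(text):
    """Collapse whitespace so line wrapping does not break quote matching."""
    return " ".join(text.split())


def gate_ttps(ttps, known_ids, report_text):
    """Split mappings into accepted and rejected, with a reason per rejection."""
    report = _squash(report_text)
    accepted, rejected = [], []
    for ttp in ttps:
        tid = ttp.get("technique_id", "")
        evidence = _squash(ttp.get("evidence", ""))
        if not TECHNIQUE_ID.fullmatch(tid):
            rejected.append((ttp, "malformed id"))
        elif tid not in known_ids:
            rejected.append((ttp, "not in catalog"))
        elif not evidence or evidence not in report:
            rejected.append((ttp, "evidence not found in report"))
        else:
            accepted.append(ttp)
    return accepted, rejected
```

Keeping the rejected mappings with a reason, instead of discarding them silently, gives you a cheap signal for how often the model drifts on a given report source.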

Build a Queryable CTI Knowledge Base

Once reports are structured, the next win is retrieval. Embed your prior reports, incident write-ups, and intel notes into a vector store (pgvector on Postgres is enough for most teams) and retrieve the few relevant snippets when a new indicator or actor name comes in. The model answers "have we seen this infrastructure before, and in what context?" against your own history instead of its training data.
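
The retrieval step reduces to a nearest-neighbor search over embeddings. Below is a minimal in-memory sketch of that idea, assuming each stored snippet already has an embedding vector from whatever embedding model you use (not shown); the function names and the toy three-dimensional vectors in the usage note are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=3):
    """Return the k snippets whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_context(question, snippets):
    """Put retrieved history in front of the question for the model to answer from."""
    history = "\n".join(f"- {s}" for s in snippets)
    return f"Prior intel from our own records:\n{history}\n\nQuestion: {question}"
```

For example, `top_k([1, 0, 0], [("A", [1, 0, 0]), ("B", [0, 1, 0]), ("C", [0.9, 0.1, 0])], k=2)` ranks A and C ahead of B. With pgvector the same ranking would be an `ORDER BY embedding <=> query LIMIT k` clause using its cosine-distance operator, so the Python loop is only a stand-in for the database.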

For the system of record, the structured output should be STIX 2.1 objects built with the OASIS stix2 library, pushed into a platform like MISP (via PyMISP) or OpenCTI through its API. The platform handles deduplication, relationships, and sharing over TAXII. The LLM sits in front as a parser and enricher; it is not the store.
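
To make the target shape concrete, here is a standard-library sketch of the STIX 2.1 JSON for one IP indicator linked to one technique. It is not a substitute for the stix2 library, which builds and validates these objects for you; the helper names are hypothetical, and a real pipeline would most likely reference the existing attack-pattern ID from the ATT&CK STIX data instead of minting a new one.

```python
import json
import uuid
from datetime import datetime, timezone


def _now():
    # STIX timestamps are UTC with a trailing Z; millisecond precision is used here.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def _object(stix_type, ts, **props):
    """Common properties shared by the objects in this sketch."""
    return {
        "type": stix_type,
        "spec_version": "2.1",
        "id": f"{stix_type}--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        **props,
    }


def ip_to_bundle(ip, technique_id, technique_name):
    """Build an indicator, an attack-pattern, and the relationship between them."""
    ts = _now()
    indicator = _object(
        "indicator", ts,
        pattern=f"[ipv4-addr:value = '{ip}']",
        pattern_type="stix",
        valid_from=ts,
    )
    # Assumption: a fresh attack-pattern object, for illustration only.
    attack_pattern = _object(
        "attack-pattern", ts,
        name=technique_name,
        external_references=[
            {"source_name": "mitre-attack", "external_id": technique_id}
        ],
    )
    relationship = _object(
        "relationship", ts,
        relationship_type="indicates",
        source_ref=indicator["id"],
        target_ref=attack_pattern["id"],
    )
    return {
        "type": "bundle",
        "id": f"bundle--{uuid.uuid4()}",
        "objects": [indicator, attack_pattern, relationship],
    }
```

Calling `json.dumps(ip_to_bundle("198.51.100.7", "T1071.001", "Web Protocols"), indent=2)` yields a bundle in the general shape a STIX-aware platform ingests; the IP should come from the deterministic extraction pass, never from model output.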

Where LLMs Fail in Threat Intelligence

Plan for these from the first prototype:

Hallucinated indicators reach enforcement. CTI output drives firewall and EDR blocks. A wrong IP becomes a self-inflicted outage. Validate format and confirm with a lookup before any indicator is promoted.
Attribution is not a language task. A model will confidently name an actor on thin evidence. Attribution is an analytic judgment with confidence levels, not a sentence the model completes.
Ingested reports are attacker-influenced. A crafted report can carry an indirect prompt injection (OWASP LLM01, MITRE ATLAS AML.T0051). Keep the extraction model read-only and gate everything it produces.
LLMs cannot count or aggregate at scale. Counting indicators across a large corpus belongs in SQL, not a prompt.
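
Two of these guards are easy to sketch with the standard library: a format check that runs before any indicator is promoted, and an aggregate computed in SQL instead of in a prompt. The function names and the table layout are hypothetical, and the reputation lookup that should follow the format check is out of scope here.

```python
import ipaddress
import re
import sqlite3

SHA256 = re.compile(r"[0-9a-fA-F]{64}")


def promotable_ip(value):
    """True only for a well-formed, globally routable IP address."""
    try:
        return ipaddress.ip_address(value).is_global
    except ValueError:
        return False  # malformed address, e.g. an octet above 255


def promotable_sha256(value):
    """True only for exactly 64 hexadecimal characters."""
    return SHA256.fullmatch(value) is not None


def count_by_type(rows):
    """Count distinct indicator values per type; rows are (report_id, ioc_type, value)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ioc (report_id TEXT, ioc_type TEXT, value TEXT)")
    con.executemany("INSERT INTO ioc VALUES (?, ?, ?)", rows)
    result = dict(
        con.execute(
            "SELECT ioc_type, COUNT(DISTINCT value) FROM ioc GROUP BY ioc_type"
        )
    )
    con.close()
    return result
```

Rejecting private, loopback, and reserved addresses at this stage is what prevents a mistyped internal IP from turning into a block rule against your own network.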

Where to Learn This

The skills here are not "prompt engineering." They are CTI fundamentals (STIX, ATT&CK, indicator hygiene) plus the engineering judgment to know which step is deterministic and which is a language task. Teams that get value from LLMs in threat intelligence already understood their data flows; the model amplifies that understanding, but it does not supply it.

GTK Cyber's applied AI and data science training is built for security practitioners who want to wire LLMs into real workflows like this one, with the discipline to keep the model where it helps and out of where it does damage. The generative AI in security operations post covers the same split for the SOC side of the house.