I Built an Offline Log Triage CLI That Detects MITRE ATT&CK Techniques
SOC analysts routinely face thousands of events a day, and most of them are noise. If you're on an incident response engagement with no network access, or you're staring at a zip of EVTX files a client dropped in your lap, spinning up a full SIEM isn't realistic.
That's the gap ThreatLens is trying to fill. It's a single Python CLI that parses log formats SOC analysts actually encounter — JSON/NDJSON, EVTX, Syslog (RFC 3164 and 5424), and CEF — and runs a detection engine mapped to MITRE ATT&CK, entirely offline, with only PyYAML as a hard runtime dependency.
This post walks through what it does, why I built it, and a code walkthrough of how the detection engine works.
What It Actually Does
Point it at a log file or a directory, and it runs 12 built-in detection modules covering tactics like brute force, lateral movement, privilege escalation, and persistence. Output is whatever you need for the workflow:
- Color-coded terminal output for quick triage
- Self-contained HTML reports with charts and interactive timelines
- JSON for piping into a SIEM
- CSV for handoffs to analysts who live in spreadsheets
- Elasticsearch bulk API format when you want to ingest the findings
A few representative commands:
```shell
# Basic scan against a JSON log
threatlens scan logs/security.json

# Native EVTX parsing — no conversion step
threatlens scan evidence/security.evtx

# HTML report, filtered to high severity and above
threatlens scan logs/ -o report.html -f html --min-severity high

# Real-time tailing for live triage
threatlens follow /var/log/events.json

# Use community Sigma rules
threatlens scan logs/ --sigma-rules sigma/rules/windows/

# CI/CD gate — exit non-zero on high-severity findings
threatlens scan logs/ --fail-on high
```
Why I Built It
I'm a cybersecurity student, and the gap between "read a blog post about MITRE ATT&CK" and "actually recognize T1110 in a pile of Event ID 4625s" is large. The only way I found to close it was to write the detection logic myself.
Existing tools kept tripping on the same pain points:
- Chainsaw, EvtxHussar, and Hayabusa are excellent, but Windows-centric. I wanted one tool that chews through both EVTX and the Linux syslog a cloud workload dumped.
- Full SIEM stacks are overkill for a one-off engagement. I didn't want to stand up Elastic + Kibana + Filebeat to look at 200MB of logs.
- Custom scripts get rewritten every engagement. I wanted rule-based logic I could extend without touching parsing code.
So the design goal was: one binary, offline, format-agnostic, rule-driven, zero infra.
Code Walkthrough: The Rule Engine
The most interesting piece architecturally is the rule engine, because it has to be expressive enough to write real detections but simple enough that an analyst can author rules in YAML without writing Python.
A rule looks like this:
```yaml
rules:
  - name: "After-Hours Logon"
    severity: medium
    conditions:
      - field: event_id
        operator: equals
        value: "4624"
    threshold: 1
```
That's a toy example. The real power shows up when you combine operators. The engine ships with 12: equals, not_equals, contains, not_contains, startswith, endswith, regex, gt, lt, gte, lte, and in. Conditions AND together by default, and thresholds turn a single-event match into a frequency-based detection (5 failures in 60 seconds, etc.).
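To make that concrete, here's a hypothetical rule combining two operators with a frequency threshold. The `field`/`operator`/`value` structure and the operator names come from the post; the `window_seconds` and `group_by` keys are my assumed spellings for the sliding-window and grouping settings described below, and may differ in the actual schema:

```yaml
rules:
  - name: "Windows Brute Force"
    severity: high
    conditions:
      - field: event_id
        operator: equals
        value: "4625"
      - field: logon_type        # network (3) or RDP (10) logons only
        operator: in
        value: ["3", "10"]
    threshold: 5                 # fire on the 5th match...
    window_seconds: 60           # ...within 60 seconds (assumed key name)
    group_by: source_ip          # counted per source IP (assumed key name)
```

Both conditions must hold for an event to count toward the threshold, since conditions AND together by default.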
The evaluation loop is roughly:
```python
def evaluate_rule(rule, event):
    for condition in rule.conditions:
        field_value = event.get(condition.field)
        if not OPERATORS[condition.operator](field_value, condition.value):
            return False
    return True
```
Where OPERATORS is a dict mapping operator names to small comparison lambdas. Adding a new operator is a one-line change — no parser work, no AST manipulation. This is deliberate. I wanted rule authoring to feel like writing a conditional, not configuring a parser.
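Here's a minimal sketch of what that operator table could look like. The twelve names match the list above; the exact lambdas (and how ThreatLens coerces types) are my assumptions:

```python
import re

# Operator table: name -> predicate(field_value, rule_value).
# A sketch — the real implementation may handle None and types differently.
OPERATORS = {
    "equals":       lambda a, b: a == b,
    "not_equals":   lambda a, b: a != b,
    "contains":     lambda a, b: b in str(a),
    "not_contains": lambda a, b: b not in str(a),
    "startswith":   lambda a, b: str(a).startswith(b),
    "endswith":     lambda a, b: str(a).endswith(b),
    "regex":        lambda a, b: re.search(b, str(a)) is not None,
    "gt":           lambda a, b: float(a) > float(b),
    "lt":           lambda a, b: float(a) < float(b),
    "gte":          lambda a, b: float(a) >= float(b),
    "lte":          lambda a, b: float(a) <= float(b),
    "in":           lambda a, b: a in b,
}
```

Adding a thirteenth operator really is one line in this dict, which is what keeps rule authoring decoupled from engine code.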
Thresholded rules layer on top. The engine keeps a sliding window per (rule, grouping_key) — typically grouped by source IP or username — and fires the alert only when count >= threshold within the window. Grouping keys are pulled from the event by name, which means a rule author can correlate on any field the parser exposes without modifying code.
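The sliding-window mechanism can be sketched in a few lines. Class and method names here are illustrative, not ThreatLens's:

```python
from collections import defaultdict, deque

class ThresholdTracker:
    """Sliding-window counter per (rule, grouping key) — a sketch of the
    thresholding described above, assuming epoch-second timestamps."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self.events = defaultdict(deque)  # (rule_name, group_value) -> timestamps

    def record(self, rule_name, group_value, timestamp):
        """Record one matching event; return True when the threshold fires."""
        q = self.events[(rule_name, group_value)]
        q.append(timestamp)
        # Evict timestamps that have fallen out of the window
        while q and timestamp - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold
```

With `ThresholdTracker(5, 60)`, the fifth 4625 from the same source IP inside a minute fires the alert; a trickle of failures spread over hours never does.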
The parsing phase normalizes every format into a flat dict with a small set of canonical fields (timestamp, source_ip, username, event_id, process, command_line, etc.) plus the raw original. That normalization is what makes the rule engine format-agnostic — the same "Brute Force" rule works against Windows 4625 events, SSH auth.log failures, and a CEF feed from a firewall, because they all expose username and source_ip after parsing.
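To illustrate the idea (not ThreatLens's actual parsers), here are two toy normalizers that map an SSH auth.log failure and a Windows 4625 record onto the same canonical fields:

```python
import re

# Matches OpenSSH failed-password lines, e.g.
# "Failed password for root from 10.0.0.5 port 51111 ssh2"
SSH_FAIL = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+)"
)

def normalize_ssh_line(line):
    """Normalize an auth.log line, or return None if it doesn't match."""
    m = SSH_FAIL.search(line)
    if not m:
        return None
    return {
        "event_id": "ssh_auth_failure",
        "username": m.group("user"),
        "source_ip": m.group("ip"),
        "raw": line,
    }

def normalize_windows_4625(record):
    """Normalize a parsed EVTX record (field names are EVTX-style)."""
    return {
        "event_id": str(record["EventID"]),
        "username": record.get("TargetUserName"),
        "source_ip": record.get("IpAddress"),
        "raw": record,
    }
```

After either function runs, a rule conditioned on `username` and `source_ip` neither knows nor cares which format the event came from.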
Attack Chain Correlation
The feature I'm most proud of is multi-stage correlation. A single brute-force alert is interesting. A brute-force alert followed 30 seconds later by a successful logon from the same IP, followed by a net user /add execution, is an incident. The correlation engine takes a chain definition — an ordered list of rule names with time windows — and fires a higher-severity meta-alert when it sees the full kill chain in order.
That's how low-severity events ladder up to high-severity incidents without drowning analysts in noise.
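The ordered-chain matching can be sketched as a single pass over an entity's time-sorted alerts. A chain here is a list of `(rule_name, max_gap_seconds)` stages; the data shapes are my assumptions, not the engine's actual types:

```python
def chain_matches(alerts, chain):
    """alerts: time-sorted (timestamp, rule_name) pairs for one entity,
    e.g. one source IP. Returns True if every chain stage occurs in order,
    each within max_gap_seconds of the previous matched stage."""
    idx = 0
    last_ts = None
    for ts, rule in alerts:
        stage_rule, max_gap = chain[idx]
        if rule != stage_rule:
            continue
        if last_ts is not None and ts - last_ts > max_gap:
            continue  # arrived too late for this stage; keep scanning
        last_ts = ts
        idx += 1
        if idx == len(chain):
            return True
    return False
```

The brute-force example from above would be `[("Brute Force", 0), ("Successful Logon", 60), ("User Added", 300)]` — the first stage's gap is unused since there is no previous stage.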
Allowlist Suppression
Every detection tool dies on false positives. The allowlist layer suppresses known-good activity by rule name plus any combination of event fields:
```yaml
allowlist:
  - rule_name: "Brute-Force"
    username: "svc_monitor"
    reason: "Expected service account failures"
```
The reason field is required. It forces the author to document why this exception exists, which is the single most important thing about any suppression rule in production.
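The suppression check itself is simple: an entry matches when its rule name matches and every remaining field (everything but `rule_name` and `reason`) equals the corresponding event field. A sketch under that assumption:

```python
def is_allowlisted(finding_rule, event, allowlist):
    """Return True if any allowlist entry suppresses this finding.
    Entries are dicts like the YAML above; reason is documentation only."""
    for entry in allowlist:
        if entry.get("rule_name") != finding_rule:
            continue
        fields = {k: v for k, v in entry.items() if k not in ("rule_name", "reason")}
        if all(event.get(k) == v for k, v in fields.items()):
            return True
    return False
```

Because matching is field-by-field, an entry with only a `rule_name` would suppress that rule everywhere, while adding `username` or `source_ip` narrows it to one account or host.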
Testing
Against the sample datasets in the repo, ThreatLens hit zero false positives on benign activity while detecting 100% of embedded attack techniques with correct MITRE ATT&CK classification. That's on curated data — real-world logs are messier — but it's a useful floor.
Try It
```shell
pip install threatlens
threatlens scan /path/to/your/logs
```
Or clone it and poke at the rule engine: github.com/TiltedLunar123/ThreatLens
If you find it useful, a star helps the repo surface for other analysts. PRs for new detections or log formats are especially welcome — I'd love to see what rules other people write.