TiltedLunar123

Posted on Jun 12

I built an offline threat-hunting CLI in Python because spinning up a SIEM for one log file is overkill

#security #cybersecurity #opensource #python

so here's the situation i kept running into while studying for security+ and messing with sample log sets. i'd have a single evtx export or a json dump from some lab, and i wanted to know "is there anything bad in here" without standing up elastic or splunk or wazuh just to look at one file.

every time the answer was the same. spin up infra, ingest, write a query, wait. for one file. it's a lot.

so i wrote threatlens. it's a python cli that reads a log file (or a folder) and tells you what looks suspicious, mapped to mitre att&ck. no server, no agent, no internet. you point it at a file and it scans.

threatlens scan sample_data/sample_security_log.json

that's it. on a 26-event sample it pulls out 1 critical (sam registry access), 8 high (lateral movement, a certutil download, scheduled task creation, an actual multi-stage chain), and a couple of medium brute-force hits. takes a fraction of a second.

what i tried first

my first version was just a big pile of regex and if-statements. it worked for like three detections and then became unmaintainable instantly. every new rule meant editing the core scanner. bad idea.

so i split it. there's a detection engine, and rules live separately. you get the built-in modules (13 of them right now: brute force, priv esc, defense evasion, persistence, dns tunneling, kerberos stuff, etc) but you can also drop in your own yaml rules or sigma rules or even a python plugin if you need real logic.

a custom rule looks like this:

- id: suspicious-certutil-download
  title: certutil used to download a file
  severity: high
  technique: T1105
  match:
    - field: process.command_line
      contains: "certutil"
    - field: process.command_line
      contains: "-urlcache"

twelve operators total (contains, regex, equals, in, gt, that kind of thing). nothing fancy but it covers most of what i actually needed.
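
to make the rule format concrete, here's a minimal sketch of how a matcher for rules like that could work. this is not threatlens's actual code: the names `OPERATORS`, `get_field`, and `rule_matches` are made up for illustration, only five of the twelve operators are shown, and it assumes that all conditions under `match` must hold (AND) and that `contains` is case-insensitive.

```python
import re
from typing import Any, Callable

# Hypothetical operator table (assumption: not the tool's real one).
# Each operator takes the event's field value and the rule's argument.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "contains": lambda value, arg: isinstance(value, str) and str(arg).lower() in value.lower(),
    "equals": lambda value, arg: value == arg,
    "regex": lambda value, arg: isinstance(value, str) and re.search(arg, value) is not None,
    "in": lambda value, arg: value in arg,
    "gt": lambda value, arg: isinstance(value, (int, float)) and value > arg,
}


def get_field(event: dict, dotted: str) -> Any:
    """Resolve a dotted path like 'process.command_line'; None if any part is missing."""
    current: Any = event
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def rule_matches(rule: dict, event: dict) -> bool:
    """True when every condition in rule['match'] holds (AND semantics, assumed)."""
    for condition in rule["match"]:
        value = get_field(event, condition["field"])
        # Every key other than 'field' is treated as an operator name.
        # An unknown operator raises KeyError here.
        for op, arg in condition.items():
            if op != "field" and not OPERATORS[op](value, arg):
                return False
    return True


# The yaml rule above, written as the dict a yaml loader would produce.
rule = {
    "id": "suspicious-certutil-download",
    "match": [
        {"field": "process.command_line", "contains": "certutil"},
        {"field": "process.command_line", "contains": "-urlcache"},
    ],
}

# Made-up sample event for illustration.
event = {"process": {"command_line": "certutil.exe -urlcache -f http://example.test/a.exe a.exe"}}
print(rule_matches(rule, event))
```

keeping operators in a table like this is one way to add a new operator without touching the matching loop, which is the same reason the rules were split out of the scanner in the first place.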

the part i'm weirdly proud of

it correlates. a single "new service created" event isn't that interesting on its own. but new service, then a privilege escalation, then lateral movement, in order, from the same host? that's a chain. threatlens groups those and flags the chain as its own high-severity finding instead of three disconnected blips. that was the hardest part to get right and it's still kind of naive but it works on the samples i throw at it.
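
here's a rough sketch of what "in order, from the same host" can look like in code. it's an illustration under assumptions, not the real correlation engine: the stage names, the `find_chains` function, the `host`/`time`/`stage` keys, and the one-hour window are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical stage names for the chain described above.
CHAIN = ("service_created", "privilege_escalation", "lateral_movement")


def find_chains(findings, chain=CHAIN, window=timedelta(hours=1)):
    """Group findings by host, sort by time, and look for the stages in order."""
    by_host = defaultdict(list)
    for finding in findings:
        by_host[finding["host"]].append(finding)

    chains = []
    for host, items in by_host.items():
        items.sort(key=lambda f: f["time"])
        matched = []
        for finding in items:
            # Naive reset: drop a partial chain once its first step is too old.
            if matched and finding["time"] - matched[0]["time"] > window:
                matched = []
            # Only advance when this finding is the next expected stage.
            if finding["stage"] == chain[len(matched)]:
                matched.append(finding)
                if len(matched) == len(chain):
                    chains.append({"host": host, "severity": "high", "steps": matched})
                    matched = []
    return chains


# Made-up findings; input order is shuffled on purpose.
t0 = datetime(2024, 1, 1, 12, 0)
findings = [
    {"host": "ws01", "time": t0 + timedelta(minutes=9), "stage": "lateral_movement"},
    {"host": "ws01", "time": t0, "stage": "service_created"},
    {"host": "ws01", "time": t0 + timedelta(minutes=4), "stage": "privilege_escalation"},
    {"host": "ws02", "time": t0, "stage": "lateral_movement"},
    {"host": "ws02", "time": t0 + timedelta(minutes=1), "stage": "service_created"},
]
chains = find_chains(findings)
print([(c["host"], [s["stage"] for s in c["steps"]]) for c in chains])
```

a sketch like this depends entirely on sorting by timestamp, so it has the same weakness as any time-ordered approach: if the timestamps are wrong, the order is wrong and the chain is missed.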

the performance thing i didn't expect

it's pure python. i fully expected it to be slow. it's not great, but it's faster than i thought. single core, python 3.11, on my windows laptop:

9k events: 0.13s
90k events: 1.27s
900k events: ~14s

so roughly 65k to 70k events/sec on json input. for python with no c extensions i'll take it. i'm not going to pretend it competes with the rust tools though. hayabusa and chainsaw do millions of events a minute and i'm not close. if you've got terabytes of logs, use those. threatlens is for when you've got one weird file and you want an answer now.
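
the per-second figure is just events divided by wall-clock seconds. a tiny sketch of that arithmetic, using the three runs listed above (the `throughput` helper is a made-up name, and the last run is approximate because the timing was "~14s"):

```python
def throughput(events: int, seconds: float) -> float:
    """Events processed per second of wall-clock time."""
    return events / seconds


# (event count, seconds) pairs from the benchmark runs above.
benchmarks = [(9_000, 0.13), (90_000, 1.27), (900_000, 14.0)]

for events, seconds in benchmarks:
    print(f"{events:>7} events: {throughput(events, seconds):,.0f} events/sec")
```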

input formats it reads: json/ndjson, evtx (native windows event log via python-evtx), syslog (rfc 3164 and 5424), and cef. output goes to the terminal by default but you can dump json/csv, an html report with a severity donut, an interactive timeline, an att&ck navigator layer, or a stix bundle.

ci/cd was an afterthought that turned out useful

you can run it with --fail-on high and it exits with code 2 if anything high-or-above shows up. so you can drop it in a pipeline as a gate.

threatlens scan logs/ --fail-on high

i didn't plan that one. someone suggested it and it was a 10 minute change.
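
the gate logic is small enough to sketch. this is a guess at the shape, not the real implementation: `SEVERITY_ORDER`, `exit_code`, the "low" level, and returning 0 when nothing trips the gate are all assumptions. only the exit code 2 for high-or-above comes from the behavior described above.

```python
# Assumed severity ladder, lowest to highest ("low" is an assumption).
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def exit_code(finding_severities, fail_on):
    """Return 2 if any finding is at or above the fail_on level, else 0 (0 is assumed)."""
    threshold = SEVERITY_ORDER.index(fail_on)
    tripped = any(SEVERITY_ORDER.index(sev) >= threshold for sev in finding_severities)
    return 2 if tripped else 0


print(exit_code(["medium", "critical"], fail_on="high"))
print(exit_code(["medium", "medium"], fail_on="high"))
```

in a real cli the result would be handed to `sys.exit()`, and the pipeline step fails on any non-zero code.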

what's broken / what i'd do differently

the correlation logic is rule-of-thumb, not a real graph. if events are out of order or timestamps are weird it can miss a chain.
evtx parsing leans on python-evtx and it's slow on big files compared to everything else. that 900k benchmark was json. evtx is noticeably worse.
i tested the false-positive rate on a "mixed enterprise" sample with benign noise mixed in and got zero fp, which sounds great but it's 52 events. that's not a real corpus. i don't actually know how it does on real prod logs yet because i don't have real prod logs (i'm a student lol).
no streaming for huge files yet. it loads things into memory. fine for a few hundred MB, not fine for 10 GB.
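
for the streaming gap, one possible direction (a sketch, not something threatlens does today) is a generator that yields one ndjson event at a time instead of loading the whole file. `iter_ndjson` is a hypothetical name, and silently skipping malformed lines is an assumption about what you'd want.

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one event per non-empty line, without holding the whole file in memory."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: skip bad lines; a real tool would probably count or log them.
            continue
        if isinstance(record, dict):
            yield record


# In-memory stand-in for a large file; with a real file you'd pass open(path).
sample = io.StringIO('{"event_id": 4625}\n\nnot json\n{"event_id": 4624}\n')
print([event["event_id"] for event in iter_ndjson(sample)])
```

this only helps for line-delimited input. a single big json array still has to be parsed as a whole with the standard `json` module, and any correlation that needs events sorted across the full file needs more than a line-by-line pass.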

if i rebuilt it i'd probably do the hot parsing loop in rust and keep the rule engine in python. best of both. maybe later.

it's MIT, it's on pypi as threatlens-cli, source is here: https://github.com/TiltedLunar123/ThreatLens

it works. not perfect but it works. if you try it on real logs i'd genuinely love to know what it misses.

Top comments (2)

Alex Shev • Jun 13

Offline threat hunting tools are underrated. A SIEM is great when the organization already has the pipeline, but a single suspicious log file often needs fast local triage. The best version of this kind of CLI is opinionated enough to surface patterns without pretending it replaces a full detection stack.

Leopard • Jun 13

Great project!😊 I really like the idea of analyzing logs without setting up heavy infrastructure. The correlation feature sounds especially useful.