Jude Hilgendorf

How I taught a log scanner to tell brute force from credential spray

I've been building a CLI tool called ThreatLens. It parses EVTX, JSON, Syslog, and CEF logs offline, runs detection rules, and spits out alerts mapped to MITRE ATT&CK. One of the first detectors I wrote was for failed logons (Event ID 4625). Easy, right? Count the failures, threshold it, alert if it crosses N in a window.

That worked for about ten minutes.

The problem showed up when I fed it a real-looking dataset with both a brute force attack and a password spray happening at the same time. My detector lit up on the brute force (one IP hammering one account) but completely missed the spray (one account name being tried against 30 hosts from different sources, slowly). They're both T1110 in MITRE land, but operationally they look nothing alike, and lumping them together meant tuning the threshold either let real attacks through or buried analysts in noise.

So I split them.

What I tried first

The naive version groups by source_ip and counts failed events. Something like:

from collections import defaultdict

def detect_brute_force(events, threshold=5, window_seconds=60):
    buckets = defaultdict(list)
    for e in events:
        if e.event_id == 4625:  # Windows failed logon
            buckets[e.source_ip].append(e.timestamp)
    alerts = []
    for ip, times in buckets.items():
        times.sort()
        # slide a fixed-size window; alert once per IP on the first burst
        for i in range(len(times) - threshold + 1):
            if (times[i + threshold - 1] - times[i]).total_seconds() <= window_seconds:
                alerts.append(make_alert(ip, times[i:i + threshold]))
                break
    return alerts

Fine for the textbook brute force. Useless for spray.

The thing is, spray doesn't burst from one IP. It rotates targets. A single account hits five hosts, then ten, paced out so no individual host's failed-logon bucket trips. If you're grouping by source, you'll never see it. If you crank the threshold down to compensate, every misconfigured printer in the building becomes a "brute force" alert.

What worked

Two detectors, two grouping keys, two threshold profiles. Brute force groups by (source_ip, target_username) and looks for tight bursts. Spray groups by target_username only and looks for breadth across distinct hosts in a wider window.

# brute-force: tight, narrow, single source
BRUTE_FORCE = {
    "group_by": ("source_ip", "target_username"),
    "threshold": 5,
    "window_seconds": 60,
}

# spray: wide, slow, single target
SPRAY = {
    "group_by": ("target_username",),
    "distinct_field": "computer",  # how many hosts did this account hit?
    "threshold": 5,
    "window_seconds": 300,
}

The spray detector is basically asking a different question: "did anyone try this account against an unusual number of hosts in a short window?" Counting raw events doesn't help here. What helps is counting distinct hosts touched.
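Here's a rough sketch of that distinct-host counting. It's not the shipped code, just the shape of it; the field names (`target_username`, `computer`) come from the config above, and the alert format is simplified to a tuple:

```python
from collections import defaultdict
from datetime import timedelta

def detect_spray(events, threshold=5, window_seconds=300):
    """Flag accounts tried against many distinct hosts inside one window."""
    buckets = defaultdict(list)  # target_username -> [(timestamp, host)]
    for e in events:
        if e.event_id == 4625:
            buckets[e.target_username].append((e.timestamp, e.computer))
    alerts = []
    for user, hits in buckets.items():
        hits.sort()
        for i, (start, _) in enumerate(hits):
            window_end = start + timedelta(seconds=window_seconds)
            # breadth, not volume: count distinct hosts, not raw failures
            hosts = {host for ts, host in hits[i:] if ts <= window_end}
            if len(hosts) >= threshold:
                alerts.append((user, sorted(hosts)))
                break
    return alerts
```

The set comprehension is the whole trick. Ten failures against one host is a locked-out user; five failures against five hosts is someone walking a target list.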

Once I had both, I added a small ranking step. If both fire on the same dataset, the brute force usually fires first (smaller window) and the spray fires on the same account but against different hosts. The output now flags both with separate severity, separate evidence, separate recommendations. An analyst can tell at a glance which one to chase.

The code path that actually ships

It ended up living in threatlens/detections/brute_force.py. Two classes, both subclassing the same DetectionRule base, both producing Alert objects with the same shape. The base class handles time-window grouping so each detector only writes its own correlation logic. You can tune both via rules/default_rules.yaml without editing Python. Service accounts get suppressed with an allowlist that takes a reason field, which sounds dumb but it's been useful for remembering why the line is there six months later:

allowlist:
  - rule_name: "Brute-Force"
    username: "svc_monitor"
    reason: "service account, expected failed auths from health check"
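The suppression check itself is nothing clever. Something along these lines, assuming alerts carry `rule_name` and `username` fields (the real alert shape may differ):

```python
def is_allowlisted(alert, allowlist):
    """Suppress an alert if a matching allowlist entry exists.

    Entries match on rule name plus username. The `reason` field is
    never consulted here; it exists purely for the humans reading
    the YAML later.
    """
    for entry in allowlist:
        if (entry.get("rule_name") == alert["rule_name"]
                and entry.get("username") == alert["username"]):
            return True
    return False
```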

What's still broken

A few things that bug me.

The detection is window-based, not session-aware. If a spray runs slowly enough to fall outside the 5-minute window but completes over 30 minutes, I miss it. I've thought about a sliding global account-watch list with decay, but I haven't written it yet.
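For what it's worth, the decay idea might look something like this. This is a napkin sketch of an approach I haven't built, not anything in the repo, and every name in it is made up:

```python
class AccountWatchList:
    """Per-account suspicion score that decays exponentially over time.

    Each *new* host an account fails against adds to the score; time
    between observations halves it every `half_life_seconds`. A slow
    spray keeps adding new hosts faster than the score can decay.
    """

    def __init__(self, half_life_seconds=900.0):
        self.half_life = half_life_seconds
        self.state = {}  # username -> (score, last_seen, hosts)

    def observe(self, username, host, timestamp):
        score, last_seen, hosts = self.state.get(
            username, (0.0, timestamp, set()))
        elapsed = (timestamp - last_seen).total_seconds()
        score *= 0.5 ** (elapsed / self.half_life)  # exponential decay
        if host not in hosts:
            score += 1.0  # only previously-unseen hosts raise suspicion
            hosts.add(host)
        self.state[username] = (score, timestamp, hosts)
        return score
```

The host set never expires here, which is one of the reasons it's still a napkin sketch and not a detector.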

The MITRE mapping is correct but coarse. Both detectors emit T1110 with no sub-technique. I want to add T1110.001 (password guessing) and T1110.003 (password spraying) explicitly so downstream tools can tell them apart without parsing my description string.
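The fix is probably as boring as a static table each detector reads its tags from. A hypothetical shape:

```python
# planned, not shipped: per-detector ATT&CK sub-technique tags
MITRE_MAPPING = {
    "Brute-Force":    {"technique": "T1110", "sub_technique": "T1110.001"},  # Password Guessing
    "Password-Spray": {"technique": "T1110", "sub_technique": "T1110.003"},  # Password Spraying
}
```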

There's no enrichment for source IP. I don't know if the spray is coming from a known-bad ASN, a Tor exit, or just the marketing intern who forgot a password. The architecture supports a plugin step for this, I just haven't written that plugin.

What I'd do differently

If I started over, I'd build the time-window grouping primitive first and then write detectors on top, instead of writing each detector with its own buckets. I refactored to that shape after the fourth or fifth detector and it would have saved me a lot of duplicated correlation code.
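The primitive I'm describing is small. Not the actual base class, but the idea is a generator that does the bucketing and window-sliding once, so each detector reduces to a key function plus a check on the chunk:

```python
from collections import defaultdict

def group_in_windows(events, key_fn, window_seconds):
    """Bucket events by key, then yield (key, chunk) where chunk is the
    run of events ending at each event that still fits in one window."""
    buckets = defaultdict(list)
    for e in events:
        buckets[key_fn(e)].append(e)
    for key, group in buckets.items():
        group.sort(key=lambda e: e.timestamp)
        start = 0
        for end in range(len(group)):
            # shrink from the left until the span fits the window
            while (group[end].timestamp - group[start].timestamp).total_seconds() > window_seconds:
                start += 1
            yield key, group[start:end + 1]
```

On top of that, brute force is `key_fn=lambda e: (e.source_ip, e.target_username)` plus a `len(chunk) >= threshold` check, and spray is `key_fn=lambda e: e.target_username` plus a distinct-host count. The correlation loop is written once.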

The repo is at github.com/TiltedLunar123/ThreatLens. It's MIT, runs on Python 3.10+, only runtime dep is PyYAML, and there's a Docker image if you don't want to deal with venvs. Sample logs are included so you can run it and see what the output looks like in about 30 seconds.

It works. Not perfect but it works.
