I'm 18, taking cybersecurity at a community college in Michigan, and most of my detection engineering knowledge comes from reading other people's Sigma rules at like 1am. So when I started building SIEMForge, an open source toolkit that bundles Sigma rules with a Sysmon config and Wazuh custom rules mapped to MITRE ATT&CK, I figured the rules part would be the easy bit. Wrong.
Here's the bug that took me two evenings to actually understand.
The setup
SIEMForge ships with 10 detections in rules/sigma/. They cover the usual suspects: PowerShell download cradles, LSASS dumps, mshta and rundll32 abuse, Run key persistence, scheduled tasks, that kind of thing. There's also a CLI scanner I wrote that loads every rule and runs it against a log file (JSON, JSONL, syslog, or CSV). The point is to test rules locally before you ship them into Splunk or Elastic, so you don't write a rule, deploy it, and then realize three weeks later that it's never fired.
I had one extra rule sitting in a branch called ssh_bruteforce_burst.yml. The intent was simple: if you see N failed ssh logins from the same source IP inside a short window, fire an alert.
I dropped it into the rules folder, ran my sample syslog through the scanner, and got zero alerts. Which would've been fine, except I had also dropped 50 fake "Failed password for root from 198.51.100.4" lines into the syslog sample. Should've been a wall of red.
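The fake lines came from a throwaway helper along these lines (the timestamp, hostname, PID, and ports here are made up):

```python
def make_failed_logins(n=50, ip="198.51.100.4"):
    """Generate n RFC 3164-style sshd failure lines from one source IP.
    Illustrative stand-in for the sample data; all values are fabricated."""
    lines = []
    for i in range(n):
        lines.append(
            f"Jan 12 03:14:{i % 60:02d} lab-vm sshd[4321]: "
            f"Failed password for root from {ip} port {51000 + i} ssh2"
        )
    return lines
```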
What I checked first
The dumb stuff first, like always.
- Was the rule actually loaded? Yes. The scanner logged `[*] Loaded 11 Sigma rules` instead of 10.
- Was the syslog parser reading the right fields? I added a `--verbose` flag and dumped the parsed events. Confirmed `program=sshd`, `message="Failed password for root from 198.51.100.4 port 51234 ssh2"`. Looked fine.
- Was the rule selectable in isolation? I ran the converter against just this rule and it spat out valid Splunk SPL. No errors.
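For context, the parse behind that `--verbose` dump is roughly this (a simplified sketch of the RFC 3164 path, not the exact parser in the repo):

```python
import re

# Rough RFC 3164 shape: "MMM dd HH:MM:SS host program[pid]: message"
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s"
    r"(?P<program>[\w./-]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)

def parse_syslog_line(line: str):
    """Parse one RFC 3164-style line into the fields the rule matches on,
    or return None if the line doesn't fit the shape."""
    m = SYSLOG_RE.match(line)
    return m.groupdict() if m else None
```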
So the rule loaded, the events parsed, the conversion worked. And nothing matched.
The actual problem
I went and re-read the Sigma spec for like the third time and noticed something. My detection block looked like this:
```yaml
detection:
  selection:
    program: sshd
    message: 'Failed password'
  condition: selection | count() by source_ip > 10
```
The issue is that `condition` doesn't take an aggregation expression in the form I had written. The correct form uses a separate `timeframe` field and a condition that references the count. Mine was a Frankenstein of old Sigma syntax I'd half-remembered from a blog post.
Worse, my scanner code technically did the right thing. It looked at `condition: selection`, found the events that matched the selection, and then tried to evaluate `| count() by source_ip > 10` as a literal pipeline. My pipeline parser saw something it didn't recognize and silently bailed with a `False` result, which is exactly the wrong default.
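Boiled down, the old behavior was something like this (a simplified reconstruction, not the actual code):

```python
def evaluate_condition_old(condition: str, matches: list) -> bool:
    """Simplified reconstruction of the buggy behavior: a pipeline stage
    the parser didn't recognize quietly collapsed the condition to False,
    so "malformed rule" was indistinguishable from "no hits"."""
    known_ops = {"count"}  # bare ops only; the aggregation form wasn't parsed
    if "|" in condition:
        _, *pipeline = [p.strip() for p in condition.split("|")]
        for op in pipeline:
            if op not in known_ops:
                return False  # the silent lie
    return len(matches) > 0
```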
That's the real bug, by the way. The rule file was wrong, but the scanner not telling me it was wrong is what cost me the two evenings.
The fix
Two changes. First, I rewrote the rule with proper field-condition mapping:
```yaml
detection:
  selection:
    program: 'sshd'
    message|contains: 'Failed password for'
  timeframe: 5m
  condition: selection | count(source_ip) by source_ip > 10
```
Second, and more importantly, the scanner now raises on unknown pipeline operators instead of returning `False`. Here's the relevant chunk in `siemforge/scanner.py`:
```python
def evaluate_condition(self, condition: str, matches: list) -> bool:
    if "|" in condition:
        base, *pipeline = [p.strip() for p in condition.split("|")]
        result = self._evaluate_base(base, matches)
        for op in pipeline:
            handler = self._pipeline_handlers.get(self._op_name(op))
            if handler is None:
                raise UnknownPipelineOperator(
                    f"unknown pipeline operator '{op}' in condition '{condition}'"
                )
            result = handler(result, op)
        return result
    return self._evaluate_base(condition, matches)
```
`UnknownPipelineOperator` is a custom exception that gets caught one level up and surfaced as a load-time error with the rule filename attached. So now if a rule is malformed in a way the scanner can't handle, you find out when you load it, not when you wonder why nothing's firing.
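The catch-one-level-up looks roughly like this (names other than `UnknownPipelineOperator` are illustrative, not the repo's actual API):

```python
class UnknownPipelineOperator(Exception):
    """Raised when a rule's condition uses a pipeline stage the scanner
    doesn't implement."""

class RuleLoadError(Exception):
    """Hypothetical load-time wrapper that carries the rule filename."""

def load_rule_checked(filename, load_fn, validate_fn):
    """Sketch of surfacing pipeline errors at load time with the rule's
    filename attached. `load_fn` and `validate_fn` stand in for the real
    YAML loader and condition validator."""
    rule = load_fn(filename)
    try:
        validate_fn(rule)
    except UnknownPipelineOperator as exc:
        raise RuleLoadError(f"{filename}: {exc}") from exc
    return rule
```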
Test count went from 137 to 138 because I added one specifically for this case: feed a rule with a garbage pipeline op, assert that loading raises.
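The test is, in spirit, the following (a sketch that inlines a tiny stand-in validator; the real suite's helpers differ):

```python
KNOWN_OPS = {"count"}

class UnknownPipelineOperator(Exception):
    pass

def validate_condition(condition: str) -> None:
    """Tiny stand-in validator: reject any pipeline stage whose leading
    token isn't a known aggregation. Sketch only; the real scanner's
    parsing is richer."""
    if "|" not in condition:
        return
    _, *pipeline = [p.strip() for p in condition.split("|")]
    for op in pipeline:
        name = op.split("(")[0].strip()
        if name not in KNOWN_OPS:
            raise UnknownPipelineOperator(f"unknown pipeline operator '{op}'")

def test_garbage_pipeline_op_raises():
    """A garbage pipeline op must fail loudly, not evaluate to False."""
    try:
        validate_condition("selection | frobnicate() by source_ip > 10")
    except UnknownPipelineOperator:
        return
    raise AssertionError("malformed rule loaded without error")
```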
What's still broken
Some honest disclosure since this is a portfolio project, not a real product.
- The aggregation pipeline in the scanner only supports `count() by <field> > N` right now. Any other aggregation isn't implemented; it just raises the same exception. That's better than silently passing, but it's not actually doing detection on those rules yet.
- The syslog parser is RFC 3164 plus a fallback for RFC 5424. It does not handle vendor-specific formats like Cisco ASA out of the box. You'd need to preprocess.
- I don't have a real corpus of malicious vs. benign logs to measure false positive rates. The CI just smoke-tests that rules load and the converters produce non-empty output. That's not the same as knowing if a rule is good.
What I'd tell past me
Two things.
One, if you're writing a rule and the test data should match and it doesn't, the bug is in either the rule, the parser, or the matcher. Add print statements to all three before you start theorizing. I wasted an hour assuming it was the syslog parser when it wasn't.
Two, if your detection engine returns False on something it doesn't understand, you've built a system that will lie to you. Always raise. Always.
If you want to look at the code or fork it for your own home lab, the repo is at github.com/TiltedLunar123/SIEMForge. v3.1 has the fix and the expanded test suite. PRs welcome, especially if you've got rules you've tested in the wild.