DEV Community

Jude Hilgendorf

Testing Sigma Rules Against Local Logs Without a SIEM

I'd written a few Sigma rules for my home lab and wanted to know if they actually fired on real Sysmon events. The standard answer is "deploy to Wazuh and replay logs". That's a lot of overhead when I just want to confirm a regex matches.

So I built SIEMForge. It's a Python CLI that loads Sigma YAML files, parses the detection logic, and matches it against JSON, JSONL, syslog, or CSV log files locally. No SIEM required.

This post is the messy version of how it came together. The final code is on GitHub at github.com/TiltedLunar123/SIEMForge.

The problem

I had ten Sigma rules covering things like LSASS dumps, suspicious PowerShell, and registry persistence. To validate them I'd been:

  1. starting Wazuh in a VM
  2. shipping a Sysmon JSONL file via Filebeat
  3. SSHing to the manager and tailing alerts.log
  4. realizing the rule didn't fire because I had a typo in the field name

Round trip on a single rule edit was about 4 minutes. With ten rules, each iterating through false-positive checks, the math gets bad.

What I actually wanted: siemforge --scan events.json and a list of which rules fired with which event ID.

First attempt: just regex everything

Naive plan. Sigma rules look simple. They have a detection block with selection, filter, and condition keys. The condition usually says selection and not filter. How hard can it be?

Hard. The condition language allows:

  • selection
  • selection and filter
  • selection or filter
  • selection and not filter
  • 1 of selection*
  • all of them

I started with a bare boolean parser that handled and, or, not. About 70% of my rules worked. The 1 of selection* ones broke. So did the wildcards in field values like CommandLine|contains: '*-ep bypass*'.

I rewrote the matcher around a small expression tree instead of regex. Each detection block compiles to a callable: given an event dict, return bool. The condition is parsed once at load and evaluated per event.

from typing import Callable

def compile_selection(selection: dict) -> Callable[[dict], bool]:
    matchers = []
    for key, value in selection.items():
        # Sigma keys look like "CommandLine|contains"; split field from modifier
        field, _, modifier = key.partition("|")
        matcher = build_field_matcher(field, modifier or "equals", value)
        matchers.append(matcher)
    # An event matches the selection only if every field matcher passes
    return lambda event: all(m(event) for m in matchers)
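The condition combination step isn't in the snippet above, so here's a minimal sketch of how something like selection and not filter might compile to a callable. compile_condition is a hypothetical name, not SIEMForge's actual API, and this only handles the common shapes; the full grammar (1 of selection*, all of them) needs a real parser.

```python
from typing import Callable, Dict

def compile_condition(condition: str,
                      groups: Dict[str, Callable[[dict], bool]]) -> Callable[[dict], bool]:
    """Compile a simple Sigma-style condition over named selection matchers."""
    tokens = condition.split()
    if len(tokens) == 1:
        # Bare "selection"
        return groups[tokens[0]]
    if tokens[1:3] == ["and", "not"]:
        # "selection and not filter"
        sel, flt = groups[tokens[0]], groups[tokens[3]]
        return lambda e: sel(e) and not flt(e)
    if tokens[1] == "and":
        a, b = groups[tokens[0]], groups[tokens[2]]
        return lambda e: a(e) and b(e)
    if tokens[1] == "or":
        a, b = groups[tokens[0]], groups[tokens[2]]
        return lambda e: a(e) or b(e)
    raise ValueError(f"unsupported condition: {condition}")
```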

build_field_matcher handles contains, startswith, endswith, and re modifiers. Wildcards in raw values (*-ep bypass*) get translated to a contains check at compile time.
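Here's roughly what that could look like. The signature and helper name come from the snippet above, but the body is my sketch under those assumptions, not the actual SIEMForge implementation:

```python
import re
from typing import Callable

def build_field_matcher(field: str, modifier: str, value) -> Callable[[dict], bool]:
    """Sketch: compile one field/modifier/value triple into a predicate."""
    text = str(value)
    if modifier == "equals" and "*" in text:
        # Wildcards like '*-ep bypass*' become a contains check at compile time
        modifier, text = "contains", text.strip("*")
    checks = {
        "equals":     lambda v: v == text,
        "contains":   lambda v: text in v,
        "startswith": lambda v: v.startswith(text),
        "endswith":   lambda v: v.endswith(text),
        "re":         lambda v: re.search(text, v) is not None,
    }
    check = checks[modifier]
    # Missing field means no match; coerce values to str before comparing
    return lambda event: field in event and check(str(event[field]))
```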

The field name problem

Sysmon JSON via Wazuh ships fields as process.command_line. Raw Sysmon EVTX via EvtxECmd ships them as CommandLine. Splunk ships them as CommandLine too, but inside a _raw blob.

My rules were written against the EVTX naming. When I tested against the Wazuh-formatted JSON, nothing matched. It took me an hour to figure out why.

Two options: rewrite all the rules, or normalize the events. I went with normalize. There's a field_aliases.yml that maps common variants:

CommandLine:
  - process.command_line
  - data.win.eventdata.commandLine
  - winlog.event_data.CommandLine

The scanner tries each alias when the canonical field is missing. Not pretty, but it stopped me from maintaining two copies of every rule.
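The fallback lookup could be sketched like this, assuming the YAML above has been loaded into a dict (get_field is a hypothetical helper name, and aliases are treated as dotted paths into nested JSON):

```python
# Assumed shape after loading field_aliases.yml with yaml.safe_load
FIELD_ALIASES = {
    "CommandLine": [
        "process.command_line",
        "data.win.eventdata.commandLine",
        "winlog.event_data.CommandLine",
    ],
}

def get_field(event: dict, field: str):
    """Return the field value, trying aliases when the canonical name is missing."""
    if field in event:
        return event[field]
    for alias in FIELD_ALIASES.get(field, []):
        # Aliases can be dotted paths into nested JSON objects
        node = event
        for part in alias.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        if node is not None:
            return node
    return None
```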

What actually works

After three rewrites the scanner runs on a 4823-event Sysmon dump in under a second. Output looks like:

[*] Scanning /var/log/sysmon/events.jsonl (jsonl, 4823 events)

[ALERT] Rule: Suspicious PowerShell Download Cradle
        Technique: T1059.001
        Event #312 | 2026-03-14T08:41:02Z
        CommandLine: powershell -ep bypass -c "IEX(New-Object Net.WebClient).DownloadString(...)"

[*] Scan complete: 2 alerts across 4823 events

That's the round trip I wanted. Edit a rule, rerun, see if it fires. About 2 seconds end to end now.

The Sigma to Splunk/Elastic/Kibana converter is a side benefit. Same compiled tree, different emit step.

What's broken

Plenty.

The syslog parser is held together with tape. RFC 3164 vs RFC 5424 timestamp detection works on the obvious cases, but if the host writes a non-standard date format (looking at you, pfSense) fields end up mis-split. I have a TODO to switch to pyparsing instead of my hand-rolled tokenizer.
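For reference, the obvious-case detection amounts to something like this sketch (not SIEMForge's tokenizer): RFC 5424 puts a version digit and an RFC 3339 timestamp right after the <PRI> field, while RFC 3164 uses a bare "Mmm dd hh:mm:ss".

```python
import re

# "<34>1 2026-03-14T08:41:02Z ..." -> RFC 5424 (version digit + ISO timestamp)
RFC5424_RE = re.compile(r"^<\d{1,3}>\d\s+\d{4}-\d{2}-\d{2}T")
# "<34>Oct 11 22:14:15 ..." -> RFC 3164 (month name, no version digit)
RFC3164_RE = re.compile(r"^<\d{1,3}>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}")

def detect_syslog_format(line: str) -> str:
    if RFC5424_RE.match(line):
        return "rfc5424"
    if RFC3164_RE.match(line):
        return "rfc3164"
    return "unknown"  # non-standard date formats (hello pfSense) land here
```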

The MITRE coverage matrix is per-rule, not per-technique. It tells you which rule covers which technique, but not which techniques you have zero coverage on. That's the actually useful direction; it's slated for v3.2.

Sigma's 1 of operator is implemented for selection groups but not for conditions like 1 of selection_*. About 10% of public Sigma rules use that pattern, so the rule loader currently warns and skips them.

What I'd do differently

I should have started with the reference Sigma backend (pySigma) instead of writing a parser from scratch. By the time I realized that, I'd already shipped two converters and ripping it out felt worse than maintaining the homegrown one. If I started today I'd wrap pySigma and add my own scanner on top.

Test data. I waited too long to build sample log files for each technique. Now there's a samples/ directory with process injection, service installation, user creation, CSV examples, and a clean baseline for false positive checks. Should have been there from commit one.

CI. I added it at v3.0 after a regression broke the Splunk converter on Windows. 138 tests run on every push now. Worth the upfront cost.

Use it

git clone https://github.com/TiltedLunar123/SIEMForge.git
cd SIEMForge
pip install pyyaml
python -m siemforge --scan samples/events.jsonl

If you write Sigma rules and hate the deploy-test loop, give it a try. Issues and PRs welcome. There's a CONTRIBUTING file with rule submission guidelines.

Repo: https://github.com/TiltedLunar123/SIEMForge
