DEV Community

MaxHagl

🚨 Building an IOC Triage Pipeline with Suricata + ML + Docker

Honeypots generate tons of noisy logs. The challenge: how do you quickly tell which IPs deserve your attention and which are just background noise?
In this post, I'll walk through how I built an IOC triage pipeline that ingests Suricata/Zeek telemetry, scores suspicious IPs, applies unsupervised ML, and outputs actionable blocklists.

🌍 The Problem

If you've ever run a honeypot like T-Pot, you know the drill:

  • Gigabytes of Suricata/Zeek alerts
  • Thousands of unique source IPs
  • Endless false positives

Manually sorting through all this doesn't scale.
I wanted a pipeline that could automatically:

  1. Aggregate activity per IP
  2. Score each IP on suspicious behavior
  3. Use ML to flag anomalies
  4. Output human-readable casefiles + blocklists

🛠️ The IOC Triage Pipeline

I built a Python tool (ioc_triage.py) that takes NDJSON logs and produces structured outputs.

Key Features

  • Ingest Suricata/Zeek/T-Pot logs
  • Aggregate features like flows/min, unique ports, entropy, burstiness
  • Rule-based scoring (customizable via config.yaml)
  • Unsupervised ML (IsolationForest + LOF + OCSVM, optional PyOD HBOS+COPOD)
  • Fusion of rules + ML → combined tier (observe, investigate, block_candidate)

  • Outputs:

    • Enriched per-IP CSVs
    • JSON casefiles
    • Blocklists (per-IP and prefix)

⚙️ How It Works

1. Ingest

Reads Suricata NDJSON logs:
```bash
python scripts/ioc_triage.py \
  --input data/samples/raw.ndjson \
  --hours 72 -vv
```
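Here's a minimal sketch of what the ingest step could look like. It is not the actual code from ioc_triage.py: `load_events` is a hypothetical helper, and it assumes Suricata EVE-style records with `timestamp` and `src_ip` fields, keeping only events inside the `--hours` window.

```python
import json
from datetime import datetime, timedelta, timezone

# Suricata EVE timestamps look like 2024-05-03T12:00:00.000000+0000.
EVE_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def load_events(lines, hours, now=None):
    """Yield parsed NDJSON events from the last `hours` hours (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            event = json.loads(line)
            ts = datetime.strptime(event["timestamp"], EVE_TS_FORMAT)
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed records instead of aborting the run
        if ts >= cutoff and "src_ip" in event:
            yield event
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `load_events(open("data/samples/raw.ndjson"), hours=72)`.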

2. Aggregate

Per source IP, it computes:

  • Flows/minute
  • Unique src/dst ports
  • Burstiness (variance of activity)
  • Port entropy
  • Signature counts

3. Score
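As a rough illustration of the step 2 aggregates that feed the scoring, here's one way to compute a few of them per source IP. The exact definitions in the tool may differ: this sketch counts every event as a flow, treats burstiness as the variance of events per active minute, and assumes EVE-style `timestamp`, `src_ip`, and `dest_port` fields. The function names are hypothetical.

```python
import math
import statistics
from collections import Counter, defaultdict
from datetime import datetime

EVE_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def shannon_entropy(counts):
    """Shannon entropy in bits of a Counter of observations."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def aggregate(events):
    """Group events by src_ip and derive per-IP features (illustrative only)."""
    per_ip = defaultdict(list)
    for ev in events:
        per_ip[ev["src_ip"]].append(ev)

    features = {}
    for ip, evs in per_ip.items():
        minutes, ports = Counter(), Counter()
        for ev in evs:
            ts = datetime.strptime(ev["timestamp"], EVE_TS_FORMAT)
            minutes[ts.replace(second=0, microsecond=0)] += 1
            if "dest_port" in ev:
                ports[ev["dest_port"]] += 1
        # Observed span in minutes; the +1 avoids dividing by zero.
        span = (max(minutes) - min(minutes)).total_seconds() / 60 + 1
        features[ip] = {
            "flows_per_min": len(evs) / span,
            "unique_dst_ports": len(ports),
            "port_entropy": shannon_entropy(ports),
            # Assumption: burstiness = variance of events per active minute.
            "burstiness": statistics.pvariance(list(minutes.values())),
        }
    return features
```

A scanner that touches many ports evenly gets a high port entropy, while a brute-forcer hammering one port stays near zero, which is why entropy complements the plain unique-port count.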

Configurable rule weights in scripts/config.yaml:
```yaml
score:
  weights:
    flows_per_min: 2.0
    unique_dst_ports: 1.6
    unique_src_ports: 1.3
    alert_count: 0.8
    max_severity: 0.6
  thresholds:
    block: 7.0
    investigate: 3.5
```
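To make the config concrete, here's a small sketch of how weights and thresholds like these could turn features into a tier. The weights and thresholds are copied from the YAML above, but the scoring formula is an assumption: it damps each feature with `log1p` before weighting, and the real tool may normalize differently. `rule_score` and `tier_for` are hypothetical names.

```python
import math

# Mirrors the score section of scripts/config.yaml shown above.
WEIGHTS = {
    "flows_per_min": 2.0,
    "unique_dst_ports": 1.6,
    "unique_src_ports": 1.3,
    "alert_count": 0.8,
    "max_severity": 0.6,
}
THRESHOLDS = {"block": 7.0, "investigate": 3.5}


def rule_score(features, weights=WEIGHTS):
    """Weighted sum of log-damped features (assumed formula, not the tool's)."""
    return sum(w * math.log1p(features.get(name, 0.0)) for name, w in weights.items())


def tier_for(score, thresholds=THRESHOLDS):
    """Map a numeric score onto observe / investigate / block_candidate."""
    if score >= thresholds["block"]:
        return "block_candidate"
    if score >= thresholds["investigate"]:
        return "investigate"
    return "observe"
```

The log damping keeps one extreme feature, such as a huge flow rate, from swamping the rest of the sum. That is a design choice of this sketch rather than something the config dictates.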

4. Machine Learning

Uses unsupervised anomaly detection:

  • IsolationForest
  • LocalOutlierFactor
  • OneClassSVM
  • Optionally, PyOD's HBOS and COPOD detectors

Each detector's scores are normalized, then combined into two values: ml_score and ml_confidence.
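Here's a hedged sketch of that ensemble using scikit-learn. The three detectors are the ones named above, but everything else is an assumption made for illustration: min-max normalization, averaging the detectors into `ml_score`, and defining `ml_confidence` as one minus the spread between detectors. `ensemble_scores` is a hypothetical name.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def minmax(values):
    """Rescale to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo) if hi > lo else np.zeros_like(values)


def ensemble_scores(X, contamination=0.05, random_state=0):
    """Return (ml_score, ml_confidence) per row; higher score = more anomalous."""
    X = StandardScaler().fit_transform(X)
    n_neighbors = max(1, min(20, len(X) - 1))

    iso = IsolationForest(contamination=contamination, random_state=random_state).fit(X)
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=contamination).fit(X)
    svm = OneClassSVM(nu=contamination, gamma="scale").fit(X)

    # Each raw score is "higher = more normal", so negate before normalizing.
    raw = [
        -iso.score_samples(X),
        -lof.negative_outlier_factor_,
        -svm.score_samples(X),
    ]
    normed = np.vstack([minmax(r) for r in raw])

    ml_score = normed.mean(axis=0)
    # Assumption: confidence is high when the detectors agree with each other.
    ml_confidence = 1.0 - normed.std(axis=0)
    return ml_score, ml_confidence
```

In this sketch the `contamination` value only shifts each model's internal outlier cutoff. The continuous scores used here come from the fitted models directly, so tuning it mainly matters once hard labels are needed.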

5. Fusion

Rules + ML = tier_combined
→ final decision: observe, investigate, or block_candidate.
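The fusion policy can take several shapes, so treat the following as one possible sketch rather than the tool's actual logic. It assumes the ML tier uses the labels observe, investigate, and block (the example output only confirms block), and it only emits block_candidate when rules and ML agree. `fuse_tiers` is a hypothetical name.

```python
TIER_RANK = {"observe": 0, "investigate": 1, "block_candidate": 2}
# Assumption: the ML side labels its top tier "block".
ML_TIER_ALIASES = {"block": "block_candidate"}


def fuse_tiers(rule_tier, ml_tier):
    """Combine rule and ML tiers conservatively (illustrative policy)."""
    rule_rank = TIER_RANK[rule_tier]
    ml_rank = TIER_RANK[ML_TIER_ALIASES.get(ml_tier, ml_tier)]
    if min(rule_rank, ml_rank) == 2:
        return "block_candidate"  # both sides want to block
    if max(rule_rank, ml_rank) >= 1:
        return "investigate"  # at least one side is suspicious
    return "observe"
```

Requiring agreement before blocking trades some recall for fewer false positives, which is usually the safer default for anything that feeds a firewall.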

📦 Setup

Clone the repo:
```bash
git clone https://github.com/YOUR-USERNAME/ioc-triage-pipeline.git
cd ioc-triage-pipeline
```

Install requirements:
```bash
pip install -r requirements.txt
```

(Optional ML extras):
```bash
pip install pyod scikit-learn
```

Or run via Docker:
```bash
docker build -t ioc-triage .
# Quote the mount path so directories containing spaces still work.
docker run -it --rm -v "$(pwd)":/app ioc-triage \
    python scripts/ioc_triage.py --input data/samples/raw.ndjson --hours 72 -vv
```

🔍 Example Output

| ip | score | ml_score | tier | ml_tier | tier_combined | reason |
| --- | --- | --- | --- | --- | --- | --- |
| 61.184.87.135 | 9.455 | 0.944 | block_candidate | block | block_candidate | flows/min high, burstiness high, multiple ports |

Outputs:

  • data/outputs/enriched.csv → per-IP features
  • cases/.json → casefiles
  • outputs/blocklist_combined.tsv → fused blocklist
  • outputs/blocklist_combined_prefix.tsv → aggregated /24 + /48 prefixes
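For the prefix blocklist, here's a sketch of how per-IP entries could be rolled up using only the standard library. It assumes /24 applies to IPv4 and /48 to IPv6, and that a prefix is only listed once several flagged IPs fall inside it. The `min_hosts` parameter and both function names are hypothetical.

```python
import ipaddress
from collections import Counter

# Assumed mapping: /24 for IPv4 addresses, /48 for IPv6 addresses.
PREFIX_LEN = {4: 24, 6: 48}


def to_prefix(ip):
    """Return the enclosing /24 or /48 network for an address string."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_network(f"{addr}/{PREFIX_LEN[addr.version]}", strict=False)


def aggregate_prefixes(ips, min_hosts=2):
    """List prefixes that contain at least `min_hosts` flagged IPs."""
    counts = Counter(to_prefix(ip) for ip in ips)
    return sorted(str(net) for net, n in counts.items() if n >= min_hosts)
```

Passing `strict=False` lets `ip_network` accept a host address and mask off the host bits, so no manual bit arithmetic is needed.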

🙌 Why This Matters

This project turns raw honeypot noise into actionable intelligence:

  • Analysts can focus on high-confidence threats
  • Blocklists update automatically
  • You can tune thresholds & ML contamination rates

It's also great for students (like me!) to showcase ML + cybersecurity skills in a practical, portfolio-ready way.

📚 What's Next?

  • Try deep learning models (autoencoders, transformers)
  • Add active enrichment (WHOIS, VirusTotal, AbuseIPDB)
  • Build dashboards for live triage

👉 GitHub Repository

If you're into honeypots, ML, or threat intelligence, give it a ⭐ on GitHub and let me know what features you'd like to see next!