From scripting around a 10,000-line OpenSearch export cap to a log analysis tool

#opensource #python #selfhosted #security

How this started

A while back at work — I'm a senior developer by day — I needed to pull log
reports out of OpenSearch. The catch: each export was capped at 10,000
lines. So I'd pull a batch, anonymize it (the logs had PII that wasn't
allowed in the report), then run it through a quick script that grouped
errors by signature and counted how often each one fired — which classes
were spiking, which were noise, which were genuinely new.
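For illustration, here is roughly what that kind of script looks like. This is a minimal sketch, not the original script and not Logatory's code: the regexes, the placeholder tokens and the function names (redact, fingerprint, count_signatures) are all assumptions made up for this example, and real PII detection needs far broader patterns.

```python
import re
from collections import Counter

# Illustrative patterns only (assumptions); real PII coverage must be broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HEX_ID = re.compile(r"\b[0-9a-f]{8,}\b", re.IGNORECASE)
NUMBER = re.compile(r"\b\d+\b")


def redact(line: str) -> str:
    """Replace emails and IPv4 addresses with fixed placeholders."""
    line = EMAIL.sub("<email>", line)
    return IPV4.sub("<ip>", line)


def fingerprint(line: str) -> str:
    """Collapse variable parts so similar errors share one signature."""
    line = HEX_ID.sub("<id>", redact(line))
    return NUMBER.sub("<n>", line)


def count_signatures(lines):
    """Count how often each error signature occurs (ERROR lines only)."""
    return Counter(fingerprint(line) for line in lines if "ERROR" in line)


if __name__ == "__main__":
    sample = [
        "2024-05-01 12:00:01 ERROR login failed for alice@example.com from 10.0.0.7",
        "2024-05-01 12:00:09 ERROR login failed for bob@example.org from 10.0.0.9",
        "2024-05-01 12:01:00 INFO healthcheck ok",
    ]
    for signature, hits in count_signatures(sample).most_common():
        print(hits, signature)
```

Redacting before fingerprinting means the grouped output never contains the raw values, which is the property the report needed.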

Doing that a few times, half by hand and half with scripts, made the shape of a tool
obvious: pull logs from where they live → strip PII reliably → group by
error fingerprint → flag the things that matter. Once I started building it
as a real tool, the scope grew the way these things do. What about syslog
and journald, not just OpenSearch? What about rules for known bad patterns
(auth failures, SSH brute force, 5xx spikes), not only anomalies? What about
a dashboard so I'm not staring at terminal output?

Now it's Logatory.

What it does

A Python CLI plus an optional FastAPI/HTMX dashboard. You point it at your
logs and it:

auto-detects the format — syslog, Nginx, JSON lines, journald, Windows EVTX, plaintext;
redacts PII — emails, IPs, tokens, card numbers — before anything is written to disk;
runs detection rules (its own YAML format + Sigma) plus statistical anomaly detection (Z-score baseline);
stores findings in a local SQLite DB;
optionally explains findings in plain language via an LLM (local Ollama by default, so that stays local too).
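To make "Z-score baseline" concrete, here is a minimal sketch of the general technique, assuming per-interval event counts and a trailing window as the baseline. The function name, the window size, the threshold and the min_sigma floor are illustrative assumptions, not Logatory's actual parameters.

```python
from statistics import mean, stdev


def zscore_anomalies(counts, baseline_size=6, threshold=3.0, min_sigma=1.0):
    """Return (index, count, z) for buckets far above their trailing baseline."""
    anomalies = []
    for i in range(baseline_size, len(counts)):
        window = counts[i - baseline_size:i]
        mu = mean(window)
        # Floor the deviation so a perfectly flat baseline cannot divide by zero.
        sigma = max(stdev(window), min_sigma)
        z = (counts[i] - mu) / sigma
        if z > threshold:
            anomalies.append((i, counts[i], round(z, 2)))
    return anomalies


if __name__ == "__main__":
    errors_per_minute = [4, 5, 6, 5, 4, 6, 5, 40, 5]
    for index, count, z in zscore_anomalies(errors_per_minute):
        print(f"bucket {index}: {count} errors (z={z})")
```

One caveat with this simple form: once a spike enters the trailing window it inflates the baseline for the next few buckets, so a production version would probably exclude or dampen flagged buckets.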

Quick try:

pip install logatory
logatory scan /var/log/auth.log --track-errors

For the dashboard:

pip install 'logatory[web]'
logatory serve
# open http://127.0.0.1:8080

Where logs come from

OpenSearch was the original source — but most log tools assume your logs
already arrive in their store. Logatory inverts that: it reads logs from
wherever they already live.

files, globs, gzipped archives
the systemd journal (journalctl -o json)
Docker container logs (straight from the daemon)
remote hosts over SSH (no agent on the remote box)
an existing OpenSearch / Loki / Graylog

If you already run one of those stacks, Logatory layers detection and PII
redaction on top — it doesn't try to replace them.
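As one example of reading logs in place, the journal source from the list above can be consumed line by line, since journalctl -o json prints one JSON object per entry. The sketch below shows the idea under a few assumptions: the function names and the shape of the returned dict are invented for this example, and the default priority of 6 (info) for entries without a PRIORITY field is my own choice.

```python
import json
import subprocess
from datetime import datetime, timezone


def parse_journal_line(raw: str) -> dict:
    """Turn one line of `journalctl -o json` output into a small event dict."""
    entry = json.loads(raw)
    # __REALTIME_TIMESTAMP is microseconds since the Unix epoch, as a string.
    micros = int(entry["__REALTIME_TIMESTAMP"])
    message = entry.get("MESSAGE", "")
    if not isinstance(message, str):
        # journald encodes non-UTF-8 messages as a byte array (null if too large).
        message = bytes(message or []).decode("utf-8", errors="replace")
    return {
        "time": datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc),
        "unit": entry.get("_SYSTEMD_UNIT"),
        "priority": int(entry.get("PRIORITY", 6)),  # assumed default: info
        "message": message,
    }


def read_journal(unit: str):
    """Yield parsed events for one systemd unit (requires journalctl)."""
    cmd = ["journalctl", "--no-pager", "-o", "json", "-u", unit]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for raw in proc.stdout:
            if raw.strip():
                yield parse_journal_line(raw)
```

On a systemd host, `for event in read_journal("nginx.service"): print(event["message"])` would stream that unit's entries. The same parser works over SSH if the command is run remotely and its output piped back.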

Fleet mode

The piece I'm happiest with: declare your sources in one targets.yaml…

targets:
  - name: web01
    type: ssh
    host: web01.example
    journald: true
    unit: nginx.service
    groups: [web, prod]
  - name: prod-loki
    type: loki
    url: http://loki:3100
    query: '{namespace="prod"}'
    token: ${LOKI_TOKEN}

…and work the whole fleet at once:

logatory fleet scan          # every target once, concurrently
logatory fleet tail          # follow them all live, findings-only by default
logatory fleet list --check  # who's reachable?

A dead target is reported without aborting the run. fleet tail polls each
target in its own thread and merges everything into one stream, with a
periodic heartbeat line so silence isn't ambiguous. There's an interactive
logatory fleet init wizard for the config, and the web dashboard ships a
config editor for it too.
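The thread-per-target merge can be sketched with nothing but the standard library. This is an illustration of the pattern, not Logatory's implementation: merge_targets, the event tuples and the reader callables are hypothetical, and this simplified version emits a heartbeat only when the stream has been idle rather than on a fixed schedule.

```python
import queue
import threading
import time


def merge_targets(targets, heartbeat_every=1.0):
    """Yield (kind, target, payload) events from all targets as one stream.

    `targets` maps a target name to a zero-argument callable returning an
    iterable of lines (a hypothetical stand-in for SSH, Loki and so on).
    """
    events = queue.Queue()

    def worker(name, read):
        try:
            for line in read():
                events.put(("line", name, line))
        except Exception as exc:  # a dead target must not stop the others
            events.put(("error", name, str(exc)))
        finally:
            events.put(("done", name, None))

    for name, read in targets.items():
        threading.Thread(target=worker, args=(name, read), daemon=True).start()

    remaining = len(targets)
    while remaining:
        try:
            kind, name, payload = events.get(timeout=heartbeat_every)
        except queue.Empty:
            # Nothing arrived in time: emit a heartbeat so silence is explicit.
            yield ("heartbeat", None, time.time())
            continue
        if kind == "done":
            remaining -= 1
        else:
            yield (kind, name, payload)
```

A single queue is what keeps the consumer simple: every worker writes to it, failures arrive as ordinary events, and the timeout on get() doubles as the heartbeat trigger.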

Honest about scope

It's at v0.4.0 (beta): a single-maintainer, SQLite-backed tool. Built for
one person analysing their own systems — not a multi-tenant SIEM. If you
need petabyte-scale real-time analytics, this is the wrong tool. If you want
detection rules on your auth.log, your Nginx, your homelab's journald, with
PII redaction baked in and no infrastructure to babysit, that's the niche.

On how it was built

Full disclosure: the build leaned heavily on an AI coding assistant. The
product, the architecture and the iteration are mine; the assistant handled
the implementation under my direction. Flagging it openly — happy to talk
about the workflow.