<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: T0nd3</title>
    <description>The latest articles on DEV Community by T0nd3 (@t0nd3).</description>
    <link>https://dev.to/t0nd3</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F259880%2Fb78e9059-1102-49af-8273-2af46d4ee56e.png</url>
      <title>DEV Community: T0nd3</title>
      <link>https://dev.to/t0nd3</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/t0nd3"/>
    <language>en</language>
    <item>
      <title>From a 10,000-line OpenSearch export script to a log analysis tool</title>
      <dc:creator>T0nd3</dc:creator>
      <pubDate>Fri, 22 May 2026 23:29:15 +0000</pubDate>
      <link>https://dev.to/t0nd3/from-a-10000-line-opensearch-export-script-to-a-log-analysis-tool-2b65</link>
      <guid>https://dev.to/t0nd3/from-a-10000-line-opensearch-export-script-to-a-log-analysis-tool-2b65</guid>
      <description>&lt;h2&gt;
  
  
  How this started
&lt;/h2&gt;

&lt;p&gt;A while back at work — I'm a senior developer by day — I needed to pull log&lt;br&gt;
reports out of OpenSearch. The catch: each export was capped at &lt;strong&gt;10,000&lt;br&gt;
lines&lt;/strong&gt;. So I'd pull a batch, anonymize it (the logs had PII that wasn't&lt;br&gt;
allowed in the report), then run it through a quick script that grouped&lt;br&gt;
errors by signature and counted how often each one fired — which classes&lt;br&gt;
were spiking, which were noise, which were genuinely new.&lt;/p&gt;
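&lt;p&gt;A minimal sketch of that grouping step, not the original script: the &lt;code&gt;fingerprint&lt;/code&gt; helper and its masking rules are assumptions made up for illustration. It masks the variable parts of a message (hex values, long ids, numbers, quoted strings) so repeats of the same error collapse into one signature, then counts the signatures.&lt;/p&gt;

```python
import re
from collections import Counter

# Hypothetical masking rules; order matters (hex and long ids before plain numbers).
_MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "HEX"),
    (re.compile(r"\b[0-9a-fA-F]{8,}\b"), "ID"),
    (re.compile(r"\d+"), "N"),
    (re.compile(r"'[^']*'"), "STR"),
]


def fingerprint(message):
    """Reduce a log message to a signature by masking its variable parts."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def count_signatures(lines):
    """Count how often each error signature occurs, skipping blank lines."""
    return Counter(fingerprint(line.strip()) for line in lines if line.strip())
```

&lt;p&gt;With these rules, &lt;code&gt;Timeout after 3000 ms calling order 4711&lt;/code&gt; and &lt;code&gt;Timeout after 5000 ms calling order 4712&lt;/code&gt; both reduce to &lt;code&gt;Timeout after N ms calling order N&lt;/code&gt;, so &lt;code&gt;count_signatures(lines).most_common()&lt;/code&gt; shows which classes are spiking.&lt;/p&gt;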

&lt;p&gt;Doing that a few times by hand, with throwaway scripts, made the shape of a tool&lt;br&gt;
obvious: pull logs from where they live → strip PII reliably → group by&lt;br&gt;
error fingerprint → flag the things that matter. Once I started building it&lt;br&gt;
as a real tool, the scope grew the way these things do. What about syslog&lt;br&gt;
and journald, not just OpenSearch? What about rules for &lt;em&gt;known&lt;/em&gt; bad patterns&lt;br&gt;
(auth failures, SSH brute force, 5xx spikes), not only anomalies? What about&lt;br&gt;
a dashboard so I'm not staring at terminal output?&lt;/p&gt;

&lt;p&gt;Now it's Logatory.&lt;/p&gt;
&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;A Python CLI plus an optional FastAPI/HTMX dashboard. You point it at your&lt;br&gt;
logs and it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;auto-detects the format — syslog, Nginx, JSON lines, journald, Windows
EVTX, plaintext;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;redacts PII&lt;/strong&gt; — emails, IPs, tokens, card numbers — &lt;em&gt;before&lt;/em&gt; anything is
written to disk;&lt;/li&gt;
&lt;li&gt;runs detection rules (its own YAML format plus Sigma) and statistical anomaly
detection (Z-score baseline);&lt;/li&gt;
&lt;li&gt;stores findings in a local SQLite DB;&lt;/li&gt;
&lt;li&gt;optionally explains findings in plain language via an LLM (local Ollama by
default, so that stays local too).&lt;/li&gt;
&lt;/ul&gt;
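&lt;p&gt;To make the Z-score baseline from the list above concrete, here is a rough sketch. How Logatory actually builds its baseline isn't covered in this post, so the per-interval event counts, the &lt;code&gt;threshold&lt;/code&gt; of 3 and the &lt;code&gt;min_history&lt;/code&gt; parameter are all assumptions.&lt;/p&gt;

```python
import math
from statistics import mean, pstdev


def zscore_anomalies(counts, threshold=3.0, min_history=5):
    """Return (index, count, z) for counts that deviate from the preceding baseline.

    The baseline for each point is every count before it, so a spike does not
    inflate the mean and standard deviation it is judged against.
    """
    anomalies = []
    for i in range(min_history, len(counts)):
        history = counts[:i]
        mu = mean(history)
        sigma = pstdev(history)
        diff = counts[i] - mu
        if sigma == 0:
            # Flat history: the z-score is undefined, so treat any change
            # as an infinite deviation in the direction of the change.
            if diff != 0:
                anomalies.append((i, counts[i], math.copysign(math.inf, diff)))
            continue
        z = diff / sigma
        if abs(z) >= threshold:
            anomalies.append((i, counts[i], z))
    return anomalies
```

&lt;p&gt;Fed the per-minute error counts &lt;code&gt;[10, 12, 11, 9, 10, 11, 60, 10]&lt;/code&gt;, this flags only the 60 at index 6; the following 10 is judged against a baseline that already contains the spike, which is one of the trade-offs of such a simple baseline.&lt;/p&gt;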

&lt;p&gt;Quick try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;logatory
logatory scan /var/log/auth.log &lt;span class="nt"&gt;--track-errors&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'logatory[web]'&lt;/span&gt;
logatory serve
&lt;span class="c"&gt;# open http://127.0.0.1:8080&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvn1gay30jvdup67wb32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvn1gay30jvdup67wb32.png" alt="Logatory web dashboard" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where logs come from
&lt;/h2&gt;

&lt;p&gt;OpenSearch was the original source, but it is no longer the only one. Most log tools assume your logs&lt;br&gt;
already arrive in &lt;em&gt;their&lt;/em&gt; store. Logatory inverts that: it reads logs from&lt;br&gt;
wherever they already live.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;files, globs, gzipped archives&lt;/li&gt;
&lt;li&gt;the systemd journal (&lt;code&gt;journalctl -o json&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Docker container logs (straight from the daemon)&lt;/li&gt;
&lt;li&gt;remote hosts over SSH (no agent on the remote box)&lt;/li&gt;
&lt;li&gt;an existing OpenSearch / Loki / Graylog&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you already run one of those stacks, Logatory layers detection and PII&lt;br&gt;
redaction on top — it doesn't try to replace them.&lt;/p&gt;
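&lt;p&gt;As an illustration of what a redaction layer like that involves, here is a small sketch. It is not Logatory's implementation: the patterns, the placeholder strings and the &lt;code&gt;redact&lt;/code&gt; function are assumptions, and real PII detection needs far broader coverage. The one idea worth copying is that each line is masked in memory before anything is stored.&lt;/p&gt;

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive PII catalogue).
_EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
_BEARER = re.compile(r"\bbearer\s+[a-z0-9._~+/=-]+", re.IGNORECASE)
_CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def _luhn_ok(digits):
    """Luhn checksum, used to avoid masking arbitrary long numbers as cards."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _mask_card(match):
    """Replace a digit run with a placeholder only if it passes the Luhn check."""
    digits = re.sub(r"\D", "", match.group())
    return "[CARD]" if _luhn_ok(digits) else match.group()


def redact(line):
    """Mask PII in one log line; call this before the line is stored anywhere."""
    line = _EMAIL.sub("[EMAIL]", line)
    line = _BEARER.sub("Bearer [TOKEN]", line)
    line = _CARD.sub(_mask_card, line)
    line = _IPV4.sub("[IP]", line)
    return line
```

&lt;p&gt;The Luhn check keeps order numbers and other long digit runs readable while still masking anything that looks like a real card number.&lt;/p&gt;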
&lt;h2&gt;
  
  
  Fleet mode
&lt;/h2&gt;

&lt;p&gt;The piece I'm happiest with: declare your sources in one &lt;code&gt;targets.yaml&lt;/code&gt;…&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;targets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web01&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ssh&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web01.example&lt;/span&gt;
    &lt;span class="na"&gt;journald&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;unit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx.service&lt;/span&gt;
    &lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;web&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;prod&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prod-loki&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;loki&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://loki:3100&lt;/span&gt;
    &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{namespace="prod"}'&lt;/span&gt;
    &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${LOKI_TOKEN}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…and work the whole fleet at once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;logatory fleet scan          &lt;span class="c"&gt;# every target once, concurrently&lt;/span&gt;
logatory fleet &lt;span class="nb"&gt;tail&lt;/span&gt;          &lt;span class="c"&gt;# follow them all live, findings-only by default&lt;/span&gt;
logatory fleet list &lt;span class="nt"&gt;--check&lt;/span&gt;  &lt;span class="c"&gt;# who's reachable?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A dead target is reported without aborting the run. &lt;code&gt;fleet tail&lt;/code&gt; polls each&lt;br&gt;
target in its own thread and merges everything into one stream, with a&lt;br&gt;
periodic heartbeat line so silence isn't ambiguous. There's an interactive&lt;br&gt;
&lt;code&gt;logatory fleet init&lt;/code&gt; wizard for the config, and the web dashboard ships a&lt;br&gt;
config editor for it too.&lt;/p&gt;
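&lt;p&gt;The thread-per-target merge with a heartbeat can be sketched with the standard library alone. This is a simplified model, not Logatory's code: &lt;code&gt;poll_target&lt;/code&gt;, &lt;code&gt;merged_stream&lt;/code&gt; and the &lt;code&gt;fetch&lt;/code&gt; callables are hypothetical names, and a real tail would probably retry a dead target rather than give up on it.&lt;/p&gt;

```python
import queue
import threading
import time


def poll_target(name, fetch, out, stop, interval=0.05):
    """Poll one target in its own thread and push findings onto the shared queue."""
    while not stop.is_set():
        try:
            for finding in fetch():
                out.put((name, finding))
        except Exception as exc:
            # A dead target is reported once; the other threads keep running.
            out.put((name, f"unreachable: {exc}"))
            return
        stop.wait(interval)


def merged_stream(targets, stop, heartbeat=1.0):
    """Yield (target, finding) pairs from all targets, plus a heartbeat when idle."""
    out = queue.Queue()
    for name, fetch in targets.items():
        threading.Thread(
            target=poll_target, args=(name, fetch, out, stop), daemon=True
        ).start()
    while not stop.is_set():
        try:
            yield out.get(timeout=heartbeat)
        except queue.Empty:
            # Nothing arrived within the window: emit a sign of life instead.
            yield ("heartbeat", time.strftime("%H:%M:%S"))
```

&lt;p&gt;A single queue gives the merge for free, and the timeout on &lt;code&gt;out.get&lt;/code&gt; doubles as the heartbeat timer, so a quiet fleet and a hung process look different on screen.&lt;/p&gt;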

&lt;h2&gt;
  
  
  Honest about scope
&lt;/h2&gt;

&lt;p&gt;It's a &lt;strong&gt;v0.4.0 (beta), single-maintainer, SQLite-backed tool&lt;/strong&gt;. Built for&lt;br&gt;
&lt;em&gt;one person&lt;/em&gt; analysing &lt;em&gt;their own systems&lt;/em&gt; — not a multi-tenant SIEM. If you&lt;br&gt;
need petabyte-scale real-time analytics, this is the wrong tool. If you want&lt;br&gt;
detection rules on your auth.log, your Nginx, your homelab's journald, with&lt;br&gt;
PII redaction baked in and no infrastructure to babysit, that's the niche.&lt;/p&gt;

&lt;h2&gt;
  
  
  On how it was built
&lt;/h2&gt;

&lt;p&gt;Full disclosure: the build leaned heavily on an AI coding assistant. The&lt;br&gt;
product, the architecture and the iteration are mine; the assistant handled&lt;br&gt;
the implementation under my direction. Flagging it openly — happy to talk&lt;br&gt;
about the workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/T0nd3/logatory" rel="noopener noreferrer"&gt;https://github.com/T0nd3/logatory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;a href="https://pypi.org/project/logatory/" rel="noopener noreferrer"&gt;https://pypi.org/project/logatory/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Apache-2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feedback very welcome — especially on the detection rules and the source&lt;br&gt;
adapters.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>selfhosted</category>
      <category>security</category>
    </item>
  </channel>
</rss>
