<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: platinum2high</title>
    <description>The latest articles on DEV Community by platinum2high (@platinum2high).</description>
    <link>https://dev.to/platinum2high</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3983103%2F95719c47-ae27-410f-a6d7-897dc93663d6.png</url>
      <title>DEV Community: platinum2high</title>
      <link>https://dev.to/platinum2high</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/platinum2high"/>
    <language>en</language>
    <item>
      <title>Building a Multi-Source Threat Intelligence Correlation Engine in Python</title>
      <dc:creator>platinum2high</dc:creator>
      <pubDate>Sat, 13 Jun 2026 19:21:41 +0000</pubDate>
      <link>https://dev.to/platinum2high/building-a-multi-source-threat-intelligence-correlation-engine-in-python-4e2g</link>
      <guid>https://dev.to/platinum2high/building-a-multi-source-threat-intelligence-correlation-engine-in-python-4e2g</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A SOC analyst's notes on going from "I want to learn async" to a working tool that other analysts can clone and use.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I'm a SOC analyst learning Python and built &lt;strong&gt;&lt;a href="https://github.com/platinum2high/ioc-hunter" rel="noopener noreferrer"&gt;IOC Hunter&lt;/a&gt;&lt;/strong&gt; — an async tool that takes a chunk of text (phishing report, log dump, Slack export), extracts every indicator inside, queries six threat-intel sources in parallel, and produces a verdict you can drop into a ticket or a SIEM.&lt;/p&gt;

&lt;p&gt;This article is the &lt;em&gt;why&lt;/em&gt; and the &lt;em&gt;how&lt;/em&gt; — the architectural decisions I had to think through, the things that bit me, and a small dose of "what I learned about myself as an engineer."&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/platinum2high" rel="noopener noreferrer"&gt;
        platinum2high
      &lt;/a&gt; / &lt;a href="https://github.com/platinum2high/ioc-hunter" rel="noopener noreferrer"&gt;
        ioc-hunter
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Async threat intelligence correlation engine. Auto-parses IOCs from raw text, enriches them across 6 TI feeds in parallel, exports STIX/MISP/Sigma/Suricata. Works keyless out of the box.
    &lt;/h3&gt;
  &lt;/div&gt;
&lt;/div&gt;





&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I sit in a SOC. The shape of my day is: alert fires → triage → mostly boring → occasionally interesting → write a ticket.&lt;/p&gt;

&lt;p&gt;The "occasionally interesting" part is where I noticed the same workflow repeating. Someone forwards me a phishing email. The body has IPs, URLs, hashes, an email address. Half of them are defanged (&lt;code&gt;evil[.]com&lt;/code&gt;, &lt;code&gt;hxxps://&lt;/code&gt;, &lt;code&gt;bad[at]evil[.]com&lt;/code&gt;). Some are encoded — base64 in the headers, hex in the payload.&lt;/p&gt;

&lt;p&gt;To triage, I do roughly this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Refang each indicator by hand&lt;/li&gt;
&lt;li&gt;Open VirusTotal, paste&lt;/li&gt;
&lt;li&gt;Open AbuseIPDB, paste&lt;/li&gt;
&lt;li&gt;Open URLhaus, paste&lt;/li&gt;
&lt;li&gt;Mentally aggregate "VT says X, AbuseIPDB says Y, URLhaus has it as Z"&lt;/li&gt;
&lt;li&gt;Decide&lt;/li&gt;
&lt;li&gt;Write the ticket, paraphrasing the sources&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a &lt;strong&gt;30-minute manual process&lt;/strong&gt; for what should be &lt;strong&gt;30 seconds&lt;/strong&gt;. And most existing IOC checkers I found on GitHub were 1:1: one IOC in, one source out. They didn't solve the workflow problem — they just slightly automated step 2.&lt;/p&gt;

&lt;p&gt;So I wrote one that solves the whole thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it actually does
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ioc-hunter check &lt;span class="s2"&gt;"185[.]220[.]101[.]42"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;╭─────── IOC Hunter ────────╮
│ 185[.]220[.]101[.]42      │
│ type: ipv4                │
│                           │
│ MALICIOUS  confidence 46% │
╰───────────────────────────╯

Source       Verdict      Score   Notes
─────────────────────────────────────────────
tor_exit     SUSPICIOUS    0.50   tor, anonymizer
abuseipdb    MALICIOUS     1.00   country:DE, isp:Tor-Exit traffic
otx          MALICIOUS     1.00   Bruteforce, SSH, Honeypot
virustotal   MALICIOUS     0.15   suspicious-udp, tor
urlhaus      UNKNOWN       0.00
threatfox    UNKNOWN       0.00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Six sources are queried in parallel. Defanged input is accepted &lt;em&gt;and&lt;/em&gt; the output is defanged again (so you can paste the result into a chat without anyone clicking it). The weighted verdict comes with the &lt;strong&gt;per-source contribution shown explicitly&lt;/strong&gt;, so you can defend the call in a ticket.&lt;/p&gt;

&lt;p&gt;But the real feature is &lt;code&gt;scan-file&lt;/code&gt; — drop in a 200-line incident report, get back every indicator inside, each enriched, sorted by confidence. And &lt;code&gt;correlate&lt;/code&gt; finds the pivots: shared infrastructure, shared malware tags, URL-to-host relationships across the batch.&lt;/p&gt;
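
&lt;p&gt;The refang-on-input, defang-on-output behavior can be sketched roughly as follows. This is a minimal illustration rather than the tool's actual code: the rule list and both function bodies are assumptions, and a real parser would likely recognise more defang variants.&lt;/p&gt;

```python
import re

# Hypothetical defang conventions; a real parser would likely cover more variants.
_REFANG_RULES = [
    (re.compile(r"\[\.\]|\(\.\)"), "."),                       # evil[.]com
    (re.compile(r"\[(?:at|@)\]", re.IGNORECASE), "@"),         # bad[at]evil.com
    (re.compile(r"\bhxxp(s?)://", re.IGNORECASE), r"http\1://"),  # hxxps://
]


def refang(text: str) -> str:
    """Turn defanged indicators back into their live form for lookups."""
    for pattern, replacement in _REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


def defang(value: str) -> str:
    """Make an indicator non-clickable before printing or pasting it."""
    value = re.sub(r"^http(s?)://", r"hxxp\1://", value, flags=re.IGNORECASE)
    return value.replace(".", "[.]").replace("@", "[@]")


print(refang("hxxps://evil[.]com/login"))
print(defang("bad@evil.com"))
```

&lt;p&gt;Keeping the two directions as separate functions means lookups always see the live value, while anything rendered for a human goes through &lt;code&gt;defang&lt;/code&gt; first.&lt;/p&gt;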




&lt;h2&gt;
  
  
  Architectural Decisions That Took Thought
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The plugin pattern for sources
&lt;/h3&gt;

&lt;p&gt;I wanted adding a new TI feed to take &lt;strong&gt;one file, with no other changes anywhere&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ABC&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;supported_types&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;frozenset&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;IOCType&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;requires_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

    &lt;span class="nd"&gt;@abstractmethod&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ioc_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IOCType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ioc_value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SourceResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each source is a class with class-level metadata (&lt;code&gt;weight&lt;/code&gt;, &lt;code&gt;supported_types&lt;/code&gt;, &lt;code&gt;requires_key&lt;/code&gt;) and one method. The orchestrator introspects the metadata to pick which sources to query for each IOC and to skip ones whose key isn't configured.&lt;/p&gt;

&lt;p&gt;This means I can drop in a Shodan source tomorrow and not touch the engine, scorer, or CLI.&lt;/p&gt;
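
&lt;p&gt;To make the "one file" claim concrete, here is a hedged sketch of what a new source could look like. &lt;code&gt;StaticBlocklist&lt;/code&gt;, &lt;code&gt;sources_for&lt;/code&gt;, and the simplified &lt;code&gt;IOCType&lt;/code&gt; and &lt;code&gt;SourceResult&lt;/code&gt; definitions are hypothetical stand-ins written for this example, not code from the repo.&lt;/p&gt;

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class IOCType(Enum):  # simplified stand-in for the real enum
    IPV4 = "ipv4"
    DOMAIN = "domain"


@dataclass(frozen=True)
class SourceResult:  # simplified stand-in for the real result type
    source: str
    verdict: str
    score: float


class Source(ABC):
    name: str
    weight: float
    supported_types: frozenset[IOCType]
    requires_key: bool = False

    @abstractmethod
    async def lookup(self, ioc_type: IOCType, ioc_value: str) -> SourceResult:
        ...


class StaticBlocklist(Source):
    """Hypothetical source: a hard-coded list standing in for a real feed."""

    name = "static_blocklist"
    weight = 0.5
    supported_types = frozenset({IOCType.IPV4})
    _KNOWN_BAD = frozenset({"203.0.113.7"})  # documentation-range address

    async def lookup(self, ioc_type: IOCType, ioc_value: str) -> SourceResult:
        hit = ioc_value in self._KNOWN_BAD
        return SourceResult(self.name, "MALICIOUS" if hit else "UNKNOWN", 1.0 if hit else 0.0)


def sources_for(ioc_type: IOCType, sources: list[Source]) -> list[Source]:
    """Pick the sources whose class-level metadata says they handle this type."""
    return [s for s in sources if ioc_type in s.supported_types]


registry = [StaticBlocklist()]
result = asyncio.run(registry[0].lookup(IOCType.IPV4, "203.0.113.7"))
print(result.verdict, [s.name for s in sources_for(IOCType.DOMAIN, registry)])
```

&lt;p&gt;The selection step only reads class attributes, which is why a new subclass can be added without changing the code that picks sources.&lt;/p&gt;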

&lt;h3&gt;
  
  
  2. Graceful degradation &amp;gt; opinionated requirements
&lt;/h3&gt;

&lt;p&gt;A naive design: "no API keys → tool doesn't work." A user-friendly design: &lt;strong&gt;every source short-circuits to UNKNOWN if its key is missing, with an explanatory error message; the rest run normally&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@property&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_configured&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_key&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator skips unconfigured sources before they ever fire a request. So if you clone my repo and run it without registering for anything, you still get a verdict — just from the one truly-keyless source (Tor exit list). Five API keys unlock the rest.&lt;/p&gt;

&lt;p&gt;This is the difference between "demo project" and "tool people actually try." Anyone cloning it sees output in 30 seconds.&lt;/p&gt;
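
&lt;p&gt;A minimal sketch of the skip step, assuming each source exposes &lt;code&gt;is_configured&lt;/code&gt; as above. &lt;code&gt;SourceStub&lt;/code&gt;, &lt;code&gt;plan_lookups&lt;/code&gt;, the &lt;code&gt;signup_url&lt;/code&gt; field, and the placeholder URL are all hypothetical names invented for this illustration.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class SourceStub:
    """Hypothetical stand-in for a real source; only the fields used here."""

    name: str
    requires_key: bool = False
    api_key: str = ""
    signup_url: str = ""  # placeholder value, not a real registration link

    @property
    def is_configured(self) -> bool:
        return not self.requires_key or bool(self.api_key)


def plan_lookups(sources):
    """Split sources into ones to query and ones to report as UNKNOWN."""
    runnable, skipped = [], {}
    for src in sources:
        if src.is_configured:
            runnable.append(src)
        else:
            # No request is made; the user gets a reason and a place to fix it.
            skipped[src.name] = f"UNKNOWN: no API key set, register at {src.signup_url}"
    return runnable, skipped


sources = [
    SourceStub("tor_exit"),
    SourceStub("abuseipdb", requires_key=True, signup_url="https://example.com/signup"),
    SourceStub("otx", requires_key=True, api_key="dummy-key"),
]
runnable, skipped = plan_lookups(sources)
print([s.name for s in runnable])
print(skipped)
```

&lt;p&gt;With no keys at all, only the keyless stub ends up in &lt;code&gt;runnable&lt;/code&gt;, and every other source still appears in the report with an explanation instead of failing the whole run.&lt;/p&gt;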

&lt;h3&gt;
  
  
  3. Transparent weighted scoring, not a black box
&lt;/h3&gt;

&lt;p&gt;Every verdict comes with the per-source contribution. The scoring formula is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;weighted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Verdict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromkeys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Verdict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;valid_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sources_by_name&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MALICIOUS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUSPICIOUS&lt;/span&gt;&lt;span class="p"&gt;}:&lt;/span&gt;
        &lt;span class="n"&gt;weighted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MIN_PRESENCE_SCORE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verdict&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="n"&gt;BENIGN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;weighted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Severity-prioritized thresholds then pick the final verdict (a malicious share ≥ 25% wins, etc.).&lt;/p&gt;

&lt;p&gt;The whole function is 30 lines. An analyst can read it and reproduce the verdict on paper. That matters when defending a finding in an incident review.&lt;/p&gt;
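
&lt;p&gt;The threshold step could look something like the sketch below. Only the 25% malicious threshold comes from this article. The suspicious threshold, the way confidence is reported, and the &lt;code&gt;decide&lt;/code&gt; function itself are assumptions made to keep the example complete.&lt;/p&gt;

```python
from enum import Enum


class Verdict(Enum):  # simplified stand-in for the real enum
    MALICIOUS = "malicious"
    SUSPICIOUS = "suspicious"
    BENIGN = "benign"
    UNKNOWN = "unknown"


MALICIOUS_SHARE = 0.25   # stated in the article
SUSPICIOUS_SHARE = 0.25  # assumption, chosen only for illustration


def decide(weighted: dict) -> tuple:
    """Map per-verdict weights to (verdict, share), most severe verdict first."""
    total = sum(weighted.values())
    if total == 0:
        return Verdict.UNKNOWN, 0.0  # nobody had an opinion
    mal = weighted.get(Verdict.MALICIOUS, 0.0) / total
    sus = weighted.get(Verdict.SUSPICIOUS, 0.0) / total
    if mal >= MALICIOUS_SHARE:
        return Verdict.MALICIOUS, mal
    if mal + sus >= SUSPICIOUS_SHARE:
        return Verdict.SUSPICIOUS, mal + sus
    return Verdict.BENIGN, weighted.get(Verdict.BENIGN, 0.0) / total


print(decide({Verdict.MALICIOUS: 1.0, Verdict.BENIGN: 3.0}))
```

&lt;p&gt;Because the checks run from most to least severe, one strong malicious signal is not drowned out by several benign ones, and every number involved can be recomputed by hand from the per-source table.&lt;/p&gt;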

&lt;h3&gt;
  
  
  4. Async concurrency with a global cap
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_concurrency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_sem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Semaphore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_concurrency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_lookup_cached&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ioc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_cache&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(...)):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_sem&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The semaphore is &lt;strong&gt;shared across all sources and all IOCs&lt;/strong&gt;. So when the analyst feeds in 100 IOCs, the engine doesn't slam every source with 100 simultaneous requests — it pipelines them through the cap.&lt;/p&gt;

&lt;p&gt;The free tiers of these APIs are rate-limited (VirusTotal's free tier allows 4 requests per minute). Without the cap, a large batch would hit 429s almost immediately.&lt;/p&gt;
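
&lt;p&gt;The effect of a shared semaphore is easy to observe in isolation. The sketch below is a standalone toy, not the engine itself: &lt;code&gt;run_capped&lt;/code&gt; and &lt;code&gt;fake_lookup&lt;/code&gt; are hypothetical, and a short sleep stands in for the network call.&lt;/p&gt;

```python
import asyncio


async def run_capped(n_tasks: int, max_concurrency: int):
    """Run n_tasks fake lookups and record the peak number in flight."""
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def fake_lookup(i: int) -> int:
        nonlocal in_flight, peak
        async with sem:  # every task competes for the same permits
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for an HTTP request
            in_flight -= 1
            return i

    # gather() schedules everything at once; the semaphore does the pacing.
    results = await asyncio.gather(*(fake_lookup(i) for i in range(n_tasks)))
    return results, peak


results, peak = asyncio.run(run_capped(20, 8))
print(len(results), peak)
```

&lt;p&gt;Note that a semaphore caps how many requests are in flight at once, not how many are sent per minute, so a strict per-minute quota may still need extra pacing or backoff on top.&lt;/p&gt;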




&lt;h2&gt;
  
  
  Things That Bit Me
&lt;/h2&gt;

&lt;h3&gt;
  
  
  URLhaus and ThreatFox now require auth
&lt;/h3&gt;

&lt;p&gt;Until mid-2024 they were truly keyless. The abuse.ch team then added an &lt;code&gt;Auth-Key&lt;/code&gt; requirement to fight scraper abuse. The key is free and registration is instant, but my "everything-keyless" pitch had to become "Tor-keyless, everything else free signup."&lt;/p&gt;

&lt;p&gt;This is fine, but it taught me to &lt;strong&gt;always link to the registration URL from the error message&lt;/strong&gt; when a source short-circuits. Don't make the user dig.&lt;/p&gt;

&lt;h3&gt;
  
  
  VirusTotal URL IDs are not URLs
&lt;/h3&gt;

&lt;p&gt;VT's v3 API expects URLs as &lt;code&gt;urlsafe-base64(url)&lt;/code&gt; with padding stripped. I lost an hour to this before reading their docs carefully:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_vt_url_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlsafe_b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;rstrip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
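
&lt;p&gt;When debugging, it can help to turn an ID back into the URL it encodes. The sketch below repeats the encoder so it runs on its own and adds &lt;code&gt;vt_url_from_id&lt;/code&gt;, a hypothetical inverse helper that is not part of the tool. The example URL is made up.&lt;/p&gt;

```python
import base64


def vt_url_id(url: str) -> str:
    """URL-safe base64 of the URL with '=' padding stripped."""
    return base64.urlsafe_b64encode(url.encode()).rstrip(b"=").decode()


def vt_url_from_id(url_id: str) -> str:
    """Hypothetical inverse: restore the stripped padding, then decode."""
    padding = "=" * (-len(url_id) % 4)
    return base64.urlsafe_b64decode(url_id + padding).decode()


url = "https://evil.example/login?next=/a"
url_id = vt_url_id(url)
print(url_id)
print(vt_url_from_id(url_id))
```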



&lt;h3&gt;
  
  
  Rich's markup parser eats &lt;code&gt;[@]&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;I render defanged values in the CLI: &lt;code&gt;bad@evil.com&lt;/code&gt; → &lt;code&gt;bad[@]evil[.]com&lt;/code&gt;. Rich's table renderer interpreted &lt;code&gt;[@]&lt;/code&gt; as a (nonexistent) markup tag and silently stripped it. Output became &lt;code&gt;badevil[.]com&lt;/code&gt; — completely broken.&lt;/p&gt;

&lt;p&gt;The fix is &lt;code&gt;rich.markup.escape()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_safe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rich&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;markup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;defang&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I now wrap every IOC value in &lt;code&gt;_safe()&lt;/code&gt; before passing it to a Rich component. The tests never caught this; I only noticed it once I started writing the README, because the tests verified the verdict, not the rendered string.&lt;/p&gt;

&lt;h3&gt;
  
  
  STIX 2.1 patterns need apostrophe-escaping
&lt;/h3&gt;

&lt;p&gt;A domain IOC with an apostrophe (&lt;code&gt;it's.example.com&lt;/code&gt; — weird but possible) breaks the STIX pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[domain-name:value = 'it's.example.com']  ← invalid
[domain-name:value = 'it\'s.example.com'] ← valid
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pattern values are single-quoted in STIX, so embedded apostrophes need escaping. It took a test written specifically for this case to catch it.&lt;/p&gt;
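
&lt;p&gt;A minimal sketch of the escaping step. &lt;code&gt;stix_string_literal&lt;/code&gt; and &lt;code&gt;domain_pattern&lt;/code&gt; are hypothetical helper names. The sketch also escapes backslashes, on the understanding that STIX patterning uses the backslash as its escape character, and it does so first so the backslash added for the apostrophe is not doubled.&lt;/p&gt;

```python
def stix_string_literal(value: str) -> str:
    """Quote a value for use inside a STIX pattern comparison."""
    # Escape backslashes first, then apostrophes, so nothing is escaped twice.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def domain_pattern(domain: str) -> str:
    """Build a domain-name pattern with a safely quoted value."""
    return f"[domain-name:value = {stix_string_literal(domain)}]"


print(domain_pattern("it's.example.com"))
print(domain_pattern("evil.example"))
```

&lt;p&gt;Routing every pattern value through one quoting helper keeps the escaping rule in a single place instead of in each exporter.&lt;/p&gt;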




&lt;h2&gt;
  
  
  The Boring Parts That Matter
&lt;/h2&gt;

&lt;p&gt;If you read GitHub-shaped engineering posts, the "boring parts" — tests, CI, lint, secret scanning, Docker hygiene — get one sentence at the end. They probably deserve half the post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;217 unit tests.&lt;/strong&gt; Every regex pattern, every source, every exporter, every scorer threshold has a test. Network is mocked via &lt;code&gt;respx&lt;/code&gt;. The test suite runs in 0.7 seconds. I can refactor anything and know within a second if I broke something.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CI matrix.&lt;/strong&gt; Tests run on Python 3.11 and 3.12. Ruff lints and format-checks. Docker image builds. Gitleaks scans the diff for accidentally-committed secrets. Every PR has to pass all of this before merging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-stage Docker.&lt;/strong&gt; The runtime image is non-root, ~120 MB, doesn't include test files or the wheel-builder layer. The cache directory is a mounted volume so it survives container restarts.&lt;/p&gt;

&lt;p&gt;None of this is impressive on its own. These are the table stakes that separate "code I'd hire someone for" from "code I'd ask them to explain in an interview."&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned About Myself
&lt;/h2&gt;

&lt;p&gt;I started this thinking "I'll learn asyncio." I finished thinking "asyncio was the easy part — the hard part was deciding what &lt;em&gt;not&lt;/em&gt; to build."&lt;/p&gt;

&lt;p&gt;Half the work was saying no:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No PyYAML for Sigma generation. Hand-write the YAML, save a dependency.&lt;/li&gt;
&lt;li&gt;No SQLAlchemy for the cache. Stdlib &lt;code&gt;sqlite3&lt;/code&gt; is enough.&lt;/li&gt;
&lt;li&gt;No "agent framework" for plugin sources. An ABC and a list is enough.&lt;/li&gt;
&lt;li&gt;No background daemon. A CLI is enough.&lt;/li&gt;
&lt;li&gt;No web UI. The Rich TUI is enough.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every "is enough" is a thing I didn't have to test, document, maintain, or explain to a hiring manager. The project is &lt;strong&gt;6,000 lines of code and 4 runtime dependencies&lt;/strong&gt; because of that discipline.&lt;/p&gt;

&lt;p&gt;I think this is the real seniority signal. Anyone can add a dep. Not everyone can leave one out.&lt;/p&gt;
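
&lt;p&gt;As one illustration of "stdlib is enough", a TTL cache on top of &lt;code&gt;sqlite3&lt;/code&gt; fits in a few dozen lines. This is a sketch under assumptions, not the project's cache: the &lt;code&gt;SqliteCache&lt;/code&gt; class, its table layout, and the JSON serialization are all invented for the example.&lt;/p&gt;

```python
import json
import sqlite3
import time


class SqliteCache:
    """Hypothetical TTL cache backed by the stdlib sqlite3 module."""

    def __init__(self, path=":memory:", ttl_seconds=3600.0):
        self._db = sqlite3.connect(path)
        self._ttl = ttl_seconds
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL NOT NULL)"
        )

    def get(self, key, now=None):
        """Return the cached value, or None if missing or expired."""
        now = time.time() if now is None else now
        row = self._db.execute(
            "SELECT value FROM cache WHERE key = ? AND expires_at > ?", (key, now)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def set(self, key, value, now=None):
        """Store a JSON-serializable value with a fresh expiry time."""
        now = time.time() if now is None else now
        self._db.execute(
            "INSERT OR REPLACE INTO cache (key, value, expires_at) VALUES (?, ?, ?)",
            (key, json.dumps(value), now + self._ttl),
        )
        self._db.commit()


cache = SqliteCache(ttl_seconds=10)
cache.set("virustotal:ipv4:203.0.113.7", {"verdict": "UNKNOWN"}, now=100.0)
print(cache.get("virustotal:ipv4:203.0.113.7", now=105.0))
```

&lt;p&gt;Passing &lt;code&gt;now&lt;/code&gt; explicitly is only there to make expiry testable without sleeping; callers would normally leave it out.&lt;/p&gt;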




&lt;h2&gt;
  
  
  If You Want to Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/platinum2high/ioc-hunter
&lt;span class="nb"&gt;cd &lt;/span&gt;ioc-hunter
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

ioc-hunter check &lt;span class="s2"&gt;"185[.]220[.]101[.]42"&lt;/span&gt;   &lt;span class="c"&gt;# works keyless&lt;/span&gt;
ioc-hunter configure                       &lt;span class="c"&gt;# walks through optional API keys&lt;/span&gt;
ioc-hunter scan-file examples/sample-incident.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
docker compose run &lt;span class="nt"&gt;--rm&lt;/span&gt; ioc-hunter check &lt;span class="s2"&gt;"evil[.]com"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo is MIT, the issue tracker is open, and I'd genuinely love feedback from SOC analysts on the scoring model, defang patterns, and sources I should add. (I'm thinking abuse.ch MalwareBazaar and GreyNoise next.)&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Code: &lt;a href="https://github.com/platinum2high/ioc-hunter" rel="noopener noreferrer"&gt;github.com/platinum2high/ioc-hunter&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Reach me on &lt;a href="https://www.linkedin.com/in/shymko-artem-39ba502a7/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; if you want to chat about SOC tooling, threat intel, or detection engineering.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>cybersecurity</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
