<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: NOX Project</title>
    <description>The latest articles on DEV Community by NOX Project (@nox-framework).</description>
    <link>https://dev.to/nox-framework</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3867691%2Fd3c707b9-bd86-4a78-9d25-c0cca10e7bbf.png</url>
      <title>DEV Community: NOX Project</title>
      <link>https://dev.to/nox-framework</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nox-framework"/>
    <language>en</language>
    <item>
      <title>Why I built an async recursive engine for OSINT (and why I nearly went insane doing it)</title>
      <dc:creator>NOX Project</dc:creator>
      <pubDate>Wed, 08 Apr 2026 11:55:18 +0000</pubDate>
      <link>https://dev.to/nox-framework/why-i-built-an-async-recursive-engine-for-osint-and-why-i-nearly-went-insane-doing-it-3bn8</link>
      <guid>https://dev.to/nox-framework/why-i-built-an-async-recursive-engine-for-osint-and-why-i-nearly-went-insane-doing-it-3bn8</guid>
      <description>&lt;p&gt;A few months ago, I found myself stuck in a massive OSINT rabbit hole. The routine was always the same: find an email, check a breach database, find a handle, search that handle elsewhere, find a hash, try to crack it... and repeat for three hours.&lt;/p&gt;

&lt;p&gt;I realized I wasn't doing "investigation" anymore; I was just acting like a manual script. My brain was melting. So, I decided to automate that "avalanche effect" and ended up building NOX.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Core Idea: The "Avalanche" Effect&lt;/strong&gt;&lt;br&gt;
Most OSINT tools are linear. You give them a seed, they give you a list.&lt;/p&gt;

&lt;p&gt;I wanted NOX to be recursive. In the &lt;code&gt;nox.py&lt;/code&gt; core, I implemented what I call Recursive Reinjection. Every time the engine discovers a new unique asset—a different email, a specific handle, or a high-fidelity hash—it doesn't just log it. It automatically re-injects it as a new search seed.&lt;/p&gt;

&lt;p&gt;It maps out the entire "identity blast radius" in seconds. But, as you can imagine, managing recursion depth so you don't accidentally try to map the entire internet starting from a handle like "admin" was... interesting.&lt;/p&gt;
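&lt;p&gt;The reinjection loop itself boils down to a bounded graph walk. A minimal sketch — the names and the tiny fake asset graph are mine, with a hypothetical &lt;code&gt;pivot_search&lt;/code&gt; standing in for the real sources:&lt;/p&gt;

```python
# Hypothetical sketch of depth-bounded reinjection; the names and the tiny
# fake asset graph are illustrative, not the actual NOX internals.
from collections import deque

def pivot_search(asset):
    """Stand-in for a real lookup that returns newly discovered assets."""
    fake_graph = {
        "alice@example.com": ["@alice", "5f4dcc3b"],
        "@alice": ["alice@example.com", "alice.dev"],
    }
    return fake_graph.get(asset, [])

def avalanche(seed, max_depth=3):
    explored = {seed}
    queue = deque([(seed, 0)])            # (asset, depth)
    while queue:
        asset, depth = queue.popleft()
        if depth >= max_depth:            # hard stop on the blast radius
            continue
        for found in pivot_search(asset):
            if found not in explored:     # dedup before reinjection
                explored.add(found)
                queue.append((found, depth + 1))
    return explored

print(sorted(avalanche("alice@example.com")))
```

&lt;p&gt;The dedup set plus the depth cap is what keeps a seed like "admin" from snowballing into mapping the entire internet.&lt;/p&gt;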

&lt;p&gt;&lt;strong&gt;Why Asyncio?&lt;/strong&gt;&lt;br&gt;
Interrogating 120+ sources (APIs, scrapers, search dorks) synchronously is a suicide mission for performance. I went with Python/Asyncio because I needed to handle hundreds of concurrent requests without the overhead of a massive thread pool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The simplified logic behind the pivot
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;discovered_assets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;discovered_assets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;asset&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;explored_set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pivot_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
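
&lt;p&gt;That snippet awaits pivots one at a time for clarity; the real win is fanning a single seed out across all the sources concurrently. A sketch with plain &lt;code&gt;asyncio&lt;/code&gt; and a semaphore (names invented, &lt;code&gt;query_source&lt;/code&gt; is a stand-in for a real API call or scraper):&lt;/p&gt;

```python
# Sketch of concurrent fan-out: one seed queried against many sources at once,
# with a semaphore so 120 tasks don't open 120 sockets simultaneously.
# query_source() is a stand-in for a real API call or scraper.
import asyncio

async def query_source(source_id, seed, sem):
    async with sem:                    # cap in-flight requests
        await asyncio.sleep(0.01)      # pretend network I/O
        return f"{source_id}:{seed}"

async def fan_out(seed, n_sources=120, limit=20):
    sem = asyncio.Semaphore(limit)
    tasks = [query_source(i, seed, sem) for i in range(n_sources)]
    # return_exceptions=True: one dead source doesn't kill the whole sweep
    return await asyncio.gather(*tasks, return_exceptions=True)

results = asyncio.run(fan_out("target@example.com"))
print(len(results))
```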



&lt;p&gt;The speed is great, but speed is useless if you get blocked by the first firewall you hit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The WAF War: JA3 and TLS Fingerprinting&lt;/strong&gt;&lt;br&gt;
This was the biggest "hidden" cost of the project. Modern WAFs like Cloudflare and Akamai are incredibly good at spotting standard Python libraries. Even if you rotate User-Agents, they’ll catch you during the TLS handshake (JA3 fingerprinting).&lt;/p&gt;

&lt;p&gt;To keep NOX alive, I had to implement randomized TLS signatures and custom headers to mimic real browser behavior. It’s a constant game of cat-and-mouse. Every time a major source moves their goalposts, I’m back in the code updating fingerprints.&lt;/p&gt;
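
&lt;p&gt;The TLS half has to happen at the client-library level (tools like &lt;code&gt;curl_cffi&lt;/code&gt; can impersonate a real browser's handshake, since JA3 is computed from the ClientHello itself). The header half can be sketched as rotating coherent browser profiles rather than mixing, say, a Chrome User-Agent with Firefox headers. The profiles below are illustrative:&lt;/p&gt;

```python
# Hypothetical sketch of the header half of the disguise: rotate coherent
# browser profiles instead of mismatched pieces. JA3 itself lives in the TLS
# handshake, so header rotation alone won't beat a modern WAF; the TLS
# signature has to be swapped at the client-library level.
import random

BROWSER_PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
        "Sec-Ch-Ua-Platform": '"Windows"',
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:121.0) "
                      "Gecko/20100101 Firefox/121.0",
        "Accept-Language": "en-US,en;q=0.5",
    },
]

def pick_profile(rng=random):
    # return a copy so callers can add per-request headers
    # without mutating the shared pool
    return dict(rng.choice(BROWSER_PROFILES))

headers = pick_profile()
print(headers["User-Agent"])
```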

&lt;p&gt;&lt;strong&gt;Solving the "Signal vs. Noise" Problem&lt;/strong&gt;&lt;br&gt;
OSINT generates a lot of garbage. If a leak is from 2012, how much do I actually care about that password?&lt;/p&gt;

&lt;p&gt;I added a &lt;code&gt;risk_score&lt;/code&gt; logic that weights results based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Recency:&lt;/strong&gt; How old is the leak?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Uniqueness:&lt;/strong&gt; A bcrypt hash or a unique email is a "high-fidelity bridge," while a common handle is just noise.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Source Reliability:&lt;/strong&gt; Not all databases are created equal.&lt;/li&gt;
&lt;/ul&gt;
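
&lt;p&gt;As a rough sketch of how those three axes can combine into one number — the weights, decay curve, and lookup tables here are invented, not the actual NOX scoring:&lt;/p&gt;

```python
# Hypothetical scoring sketch along the three axes above.
# Weights, decay curve, and tables are invented; the real logic may differ.
from datetime import date

SOURCE_RELIABILITY = {"breach_db_a": 0.9, "paste_scrape": 0.4}
ASSET_UNIQUENESS = {"bcrypt_hash": 1.0, "email": 0.8, "handle": 0.3}

def risk_score(asset_type, leak_year, source, today=None):
    today = today or date.today()
    age = max(today.year - leak_year, 0)
    recency = 1.0 / (1.0 + age / 2.0)    # decays as the leak gets older
    uniqueness = ASSET_UNIQUENESS.get(asset_type, 0.1)
    reliability = SOURCE_RELIABILITY.get(source, 0.5)
    return round(recency * uniqueness * reliability, 3)

# A fresh bcrypt hash from a solid source outranks a 2012 handle from a paste
fresh = risk_score("bcrypt_hash", 2026, "breach_db_a", today=date(2026, 4, 8))
stale = risk_score("handle", 2012, "paste_scrape", today=date(2026, 4, 8))
print(fresh, stale)
```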

&lt;p&gt;&lt;strong&gt;Why Open Source?&lt;/strong&gt;&lt;br&gt;
Honestly? Because maintaining 120+ scrapers alone is impossible. Sites change their DOM, APIs move to v2, and WAF rules evolve every week.&lt;/p&gt;

&lt;p&gt;I’m hoping the community can help me keep the "signatures" and scrapers updated while I focus on improving the relational graphing and the HVT (High Value Target) detection logic.&lt;/p&gt;

&lt;p&gt;If you’re into Red Teaming, Bug Bounty, or just like seeing how far you can push Asyncio, give it a spin. The code is probably a bit "cursed" in some places, but it’s saved me a ton of time on initial recon.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check the repo here&lt;/strong&gt;: &lt;a href="https://github.com/nox-project/nox-framework" rel="noopener noreferrer"&gt;https://github.com/nox-project/nox-framework&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’m curious—how are you guys handling the noise-to-signal ratio when you’re dealing with massive relational datasets like this? Do you prefer manual pruning or do you trust the automated scoring?&lt;/p&gt;

</description>
      <category>automation</category>
      <category>infosec</category>
      <category>python</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
