<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 404Saint</title>
    <description>The latest articles on DEV Community by 404Saint (@null_saint).</description>
    <link>https://dev.to/null_saint</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3798991%2Ffb16aced-10b3-480d-97d0-fbfd213a8c44.jpg</url>
      <title>DEV Community: 404Saint</title>
      <link>https://dev.to/null_saint</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/null_saint"/>
    <language>en</language>
    <item>
      <title>Recon Methodology in Practice: From a Single Credential to Full Schema Reconstruction</title>
      <dc:creator>404Saint</dc:creator>
      <pubDate>Sun, 03 May 2026 00:43:26 +0000</pubDate>
      <link>https://dev.to/null_saint/recon-methodology-in-practice-from-a-single-credential-to-full-schema-reconstruction-27b7</link>
      <guid>https://dev.to/null_saint/recon-methodology-in-practice-from-a-single-credential-to-full-schema-reconstruction-27b7</guid>
      <description>&lt;p&gt;&lt;em&gt;By RUGERO Tesla (&lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;@404Saint&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The methodology matters more than the target
&lt;/h2&gt;

&lt;p&gt;Most recon write-ups focus on the finding. This one focuses on the process.&lt;/p&gt;

&lt;p&gt;The target here is a Supabase project I own. Controlled lab, no real user data. I gave myself only what an attacker would realistically have: the project URL and the anon key sitting in the frontend bundle. No dashboard access. No schema knowledge. No tools beyond curl and a small Python script.&lt;/p&gt;

&lt;p&gt;The goal wasn't to find a vulnerability. It was to document what passive enumeration and error-based inference actually look like when you execute them methodically, step by step. The same reasoning drives this walkthrough as drives my ICS/OT reconnaissance work: observe first, infer from behavior, reconstruct what you can't see directly, never touch what you don't have to.&lt;/p&gt;

&lt;p&gt;The target is different. The methodology is the same.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 0: What you start with
&lt;/h2&gt;

&lt;p&gt;Every Supabase project exposes two things in the frontend by default: the project URL and the anon key. The anon key is a JWT. Before making a single network request, decoding it already tells you something:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iss"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"supabase"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ref"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;project-ref&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anon"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1771624280&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2087200280&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two observations worth making before you do anything else. The role is &lt;code&gt;anon&lt;/code&gt;, which means this key authenticates as the anonymous PostgreSQL role and inherits whatever permissions the developer explicitly granted it. And the expiry is ten years out. If this key appears in a public repository or gets scraped from a frontend bundle, an attacker has a decade of access with no forced rotation.&lt;/p&gt;

&lt;p&gt;Passive intelligence gathering before active enumeration. Know what you're working with.&lt;/p&gt;
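&lt;p&gt;As a sketch, decoding the payload takes a few lines of standard-library Python. No signature verification is needed for recon, since you only want to read the claims:&lt;/p&gt;

```python
import base64
import json

def decode_jwt_payload(token):
    """Decode a JWT's payload segment without verifying the signature.
    Enough for passive recon: no network traffic, no secret needed."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```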




&lt;h2&gt;
  
  
  Step 1: Try the obvious path first
&lt;/h2&gt;

&lt;p&gt;The first probe is always the most direct one. PostgREST exposes an OpenAPI endpoint that would hand you the entire schema immediately if it responds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"https://&amp;lt;project&amp;gt;.supabase.co/rest/v1/"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"apikey: &amp;lt;anon_key&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response: &lt;code&gt;{"message":"Invalid API key","hint":"Only the service_role API key can be used for this endpoint."}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Locked. The obvious path is closed.&lt;/p&gt;

&lt;p&gt;This is where a lot of recon stops. It shouldn't. A failed probe isn't a dead end, it's information. You now know that schema discovery via OpenAPI requires elevated credentials, which means the developer at least configured that part correctly. It raises the bar from immediate to wordlist-dependent. That's a meaningful distinction, not a wall.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Wordlist enumeration and what response codes tell you
&lt;/h2&gt;

&lt;p&gt;With no schema available directly, you fall back to inferring structure through behavior. Common table names, systematic probing, reading the response codes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;table &lt;span class="k"&gt;in &lt;/span&gt;&lt;span class="nb"&gt;users &lt;/span&gt;profiles accounts orders assignments messages disputes notifications user_roles&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;STATUS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"https://&amp;lt;project&amp;gt;.supabase.co/rest/v1/&lt;/span&gt;&lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="s2"&gt;?select=*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"apikey: &amp;lt;anon_key&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;anon_key&amp;gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="s2"&gt; -&amp;gt; &lt;/span&gt;&lt;span class="nv"&gt;$STATUS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response codes are the signal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;200&lt;/code&gt; means the table exists and the anon role can query it (note that with RLS enabled you can still get &lt;code&gt;200&lt;/code&gt; with an empty array)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;403&lt;/code&gt; means the table exists but a permission check is blocking you&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;404&lt;/code&gt; means the table doesn't exist or isn't exposed through the API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results from my project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;profiles       -&amp;gt; 200
user_roles     -&amp;gt; 200
assignments    -&amp;gt; 200
messages       -&amp;gt; 200
disputes       -&amp;gt; 200
notifications  -&amp;gt; 200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Six tables. All accessible. This isn't because I disabled access controls. It's because I never enabled them. That distinction matters and I'll come back to it.&lt;/p&gt;

&lt;p&gt;The pattern here is worth internalizing. You're not looking for a vulnerability in the traditional sense. You're observing how the system responds to different inputs and reading what those responses imply about underlying structure. This is the same logic that drives behavioral fingerprinting in MEA: real devices and simulated ones respond differently under observation, and those differences tell you things you couldn't get by asking directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Schema reconstruction through error-based inference
&lt;/h2&gt;

&lt;p&gt;The OpenAPI spec is locked. But PostgREST's error messages are not, and that asymmetry is exploitable.&lt;/p&gt;

&lt;p&gt;POSTing a request that references a nonexistent column returns &lt;code&gt;PGRST204&lt;/code&gt;. POSTing with a real column returns something different: a constraint error, a type mismatch, a permission failure. The distinction leaks column existence without requiring any elevated access.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;col &lt;span class="k"&gt;in &lt;/span&gt;&lt;span class="nb"&gt;id &lt;/span&gt;user_id email nickname university department level banned created_at&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;RESP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;".../rest/v1/profiles"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"apikey: &amp;lt;key&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$col&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;probe&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$col&lt;/span&gt;&lt;span class="s2"&gt; -&amp;gt; &lt;/span&gt;&lt;span class="nv"&gt;$RESP&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Confirmed columns in &lt;code&gt;profiles&lt;/code&gt;: &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;nickname&lt;/code&gt;, &lt;code&gt;university&lt;/code&gt;, &lt;code&gt;department&lt;/code&gt;, &lt;code&gt;level&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;, &lt;code&gt;updated_at&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Not found: &lt;code&gt;email&lt;/code&gt;, &lt;code&gt;banned&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Full schema reconstruction. No OpenAPI access. No elevated credentials. Just systematic probing and reading what the error responses imply.&lt;/p&gt;

&lt;p&gt;This is error-based inference, and it appears across disciplines. In network recon, you read ICMP responses to infer firewall rules. In ICS environments, you observe register behavior to distinguish real devices from simulators. The underlying pattern is always the same: systems communicate their internal state through their responses, even when they're trying not to.&lt;/p&gt;
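&lt;p&gt;The classification step is small enough to sketch in a few lines, assuming you capture the JSON error body from each probe (the helper name here is mine, not part of any tool):&lt;/p&gt;

```python
import json

def column_exists(error_body):
    """Classify a PostgREST error response from a column probe.
    PGRST204 means the schema cache has no such column; any other
    error (constraint, type, permission) implies the column is real."""
    err = json.loads(error_body)
    return err.get("code") != "PGRST204"
```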




&lt;h2&gt;
  
  
  Step 4: Confirming access with a direct read
&lt;/h2&gt;

&lt;p&gt;With table names and column structure mapped, the final step is confirming what's actually readable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;".../rest/v1/assignments?select=*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"apikey: &amp;lt;key&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;key&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0155e342-..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"student_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"74aae5f9-..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"design"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"deadline"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-09T06:30:00+00:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"budget"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2500.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"open"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sla_tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"payment_status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"none"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"escrow_status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"none"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a production environment with real users, that's financial data, user identifiers, and status information, all readable by anyone holding a frontend key that is public by design.&lt;/p&gt;

&lt;p&gt;Total time from zero knowledge to reading data: under ten minutes. One credential. A wordlist of ten common table names. Standard curl.&lt;/p&gt;




&lt;h2&gt;
  
  
  The methodology, extracted
&lt;/h2&gt;

&lt;p&gt;The four-step pattern here generalizes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start passive.&lt;/strong&gt; Decode what you already have before sending a single packet. The JWT alone told me the role, the project reference, and the key lifetime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try the direct path first.&lt;/strong&gt; The OpenAPI endpoint would have given everything immediately. It failed, but the failure was informative. Never skip the obvious probe: if it works you're done early, if it fails you know something.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infer from behavior when direct access fails.&lt;/strong&gt; Response codes, error messages, timing differences. Systems leak information about their internal state constantly. Read it systematically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reconstruct before you read.&lt;/strong&gt; Map the structure first, then confirm access. Going straight to data reads without understanding the schema means you'll miss things and make noise you didn't need to make.&lt;/p&gt;

&lt;p&gt;This is the same sequence whether the target is a web API, a network perimeter, or an industrial protocol implementation. The tools change. The thinking doesn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Supabase-specific finding
&lt;/h2&gt;

&lt;p&gt;For anyone building on Supabase: Row Level Security is not enabled by default on tables created through SQL. Every table you create that way is immediately readable by the anon role through the PostgREST API until you explicitly enable RLS and write policies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;profiles&lt;/span&gt; &lt;span class="n"&gt;ENABLE&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="nv"&gt;"users can view own profile"&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;profiles&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, your anon key lives in your frontend bundle, is always public, and acts as a read key for your entire database. Enable RLS before you write application logic, not after.&lt;/p&gt;
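&lt;p&gt;To audit an existing project for tables that slipped through, the standard &lt;code&gt;pg_tables&lt;/code&gt; view shows which public tables still have RLS disabled (a sketch to run in the SQL editor):&lt;/p&gt;

```sql
-- List public tables where row level security is still off
SELECT schemaname, tablename
FROM pg_tables
WHERE schemaname = 'public'
  AND rowsecurity = false;
```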




&lt;p&gt;&lt;em&gt;Conducted against a project I own. No real user data involved. The record in &lt;code&gt;assignments&lt;/code&gt; was seeded during development.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;All my projects: &lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;github.com/404saint&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;strong&gt;RUGERO Tesla&lt;/strong&gt; · GitHub: &lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;@404Saint&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Offensive security researcher focused on ICS/OT, infrastructure security, and attack surface analysis.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>devops</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Built a Tool That Detects SEO Poisoning Across Multiple Search Engines</title>
      <dc:creator>404Saint</dc:creator>
      <pubDate>Sat, 02 May 2026 23:58:53 +0000</pubDate>
      <link>https://dev.to/null_saint/i-built-a-tool-that-detects-seo-poisoning-across-multiple-search-engines-15n9</link>
      <guid>https://dev.to/null_saint/i-built-a-tool-that-detects-seo-poisoning-across-multiple-search-engines-15n9</guid>
      <description>&lt;p&gt;&lt;em&gt;By RUGERO Tesla (&lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;@404Saint&lt;/a&gt;).&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  It started with an article I couldn't stop thinking about
&lt;/h2&gt;

&lt;p&gt;A few months back I read about how attackers were poisoning search results to push malicious software downloads. The attack isn't sophisticated. You register a convincing-looking domain, keyword-stuff it correctly, buy or manipulate your way into the top results, and wait. Someone searches "Siemens TIA Portal V17 download", clicks the third result, and downloads a trojanised installer.&lt;/p&gt;

&lt;p&gt;What got me wasn't that it worked. It was &lt;em&gt;how&lt;/em&gt; it worked. People trust search results. Not because they've verified them. Just because they're there.&lt;/p&gt;

&lt;p&gt;And the thing is, most people only check one search engine.&lt;/p&gt;

&lt;p&gt;That thought wouldn't leave me alone. If an attacker has to poison Google AND Bing AND Brave AND DuckDuckGo simultaneously for the same query at comparable rank positions... that's a much harder problem. Cross-referencing results across engines should make poisoned results stick out.&lt;/p&gt;

&lt;p&gt;So one slow weekend I started building something. I called it &lt;strong&gt;Arkoi&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The question I wanted to answer
&lt;/h2&gt;

&lt;p&gt;Every URL scanner I know of asks: &lt;em&gt;is this URL dangerous?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I wanted to ask something different: &lt;em&gt;given that I searched for X, does this result actually belong here?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That sounds subtle but it changes a lot. A two-year-old domain with a clean URLhaus record can still be a poisoned result if it's ranking #2 on Google for a specific enterprise software query while being completely absent everywhere else. The domain isn't inherently dangerous. It's contextually wrong. That's the signal.&lt;/p&gt;




&lt;h2&gt;
  
  
  How it actually works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Parsing the query first
&lt;/h3&gt;

&lt;p&gt;Before fetching anything, Arkoi tries to understand what you're actually looking for. It pulls out the vendor, the software name, and the version from raw text.&lt;/p&gt;

&lt;p&gt;So "Siemens TIA Portal V17 download" becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;vendor  &lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;siemens&lt;/span&gt;
&lt;span class="na"&gt;version &lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;V17&lt;/span&gt;
&lt;span class="na"&gt;tokens  &lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;siemens'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tia'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;portal'&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;v17'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also handles product aliases. Search for "autocad" and it maps to Autodesk's vendor profile. "matlab" maps to MathWorks. "pycharm" maps to JetBrains. You don't need to know who makes what.&lt;/p&gt;
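&lt;p&gt;A minimal sketch of that parsing step, with a toy alias map and vendor set standing in for Arkoi's real vendor profiles:&lt;/p&gt;

```python
import re

# Illustrative placeholders; the real tool ships much larger profile data
ALIASES = {"autocad": "autodesk", "matlab": "mathworks", "pycharm": "jetbrains"}
VENDORS = {"siemens", "autodesk", "mathworks", "jetbrains"}

def parse_query(query):
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    vendor = next((t for t in tokens if t in VENDORS), None)
    if vendor is None:
        # fall back to product aliases when the vendor isn't named directly
        vendor = next((ALIASES[t] for t in tokens if t in ALIASES), None)
    # version strings like "v17" are recognised by shape, not by a list
    version = next((t.upper() for t in tokens if re.fullmatch(r"v\d+", t)), None)
    return {"vendor": vendor, "version": version, "tokens": tokens}
```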

&lt;h3&gt;
  
  
  Fetching six engines at once
&lt;/h3&gt;

&lt;p&gt;All six engines (Google, Bing, Brave, DuckDuckGo, Yahoo, Yandex) get queried in parallel through a self-hosted SearXNG instance. Results come back merged and deduplicated by domain, with each result carrying a record of which engines returned it and at what rank.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SearchResult&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;_fetch_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;eng&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ENGINES&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;results_per_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;responded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results_per_engine&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;engine_results&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results_per_engine&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;engine_results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_merge_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The number of engines that actually responded matters because it's the denominator for consensus scoring. If only three engines respond, a result appearing on two of them covers two-thirds of the responders, which clears the high-consensus bar instead of reading as a fringe result.&lt;/p&gt;
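&lt;p&gt;The scoring itself reduces to a small function, using the 60% / 33% thresholds described below:&lt;/p&gt;

```python
def consensus_level(engines_returning, engines_responded):
    """Score how widely a domain appears across the engines that
    actually answered; the denominator is responders, not all six."""
    share = engines_returning / max(engines_responded, 1)
    if share >= 0.60:
        return "high"
    if share >= 0.33:
        return "medium"
    return "low"
```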

&lt;h3&gt;
  
  
  Six signal checks per result, all concurrent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Vendor domain verification.&lt;/strong&gt; Does this domain actually belong to the vendor you searched for? There are four possible outcomes: &lt;code&gt;VENDOR_MATCH&lt;/code&gt; (it's them), &lt;code&gt;TRUSTED_PARTNER&lt;/code&gt; (it's a safe subdomain or official partner), &lt;code&gt;VENDOR_IMPOSTER&lt;/code&gt; (the domain contains the vendor name but isn't theirs, like &lt;code&gt;siemens-downloads.net&lt;/code&gt;), and &lt;code&gt;UNRELATED&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The imposter case is the most dangerous one and the easiest to catch.&lt;/p&gt;
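&lt;p&gt;The check reduces to a small classifier. The official and partner lists below are illustrative placeholders, not Arkoi's real allowlists:&lt;/p&gt;

```python
OFFICIAL = {"siemens": {"siemens.com"}}
PARTNERS = {"siemens": {"siemens-healthineers.com"}}

def classify_domain(domain, vendor):
    official = OFFICIAL.get(vendor, set())
    # exact match or subdomain of an official domain counts as the vendor
    if domain in official or any(domain.endswith("." + d) for d in official):
        return "VENDOR_MATCH"
    if domain in PARTNERS.get(vendor, set()):
        return "TRUSTED_PARTNER"
    if vendor in domain:
        # vendor name present, but the domain isn't theirs: the imposter case
        return "VENDOR_IMPOSTER"
    return "UNRELATED"
```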

&lt;p&gt;&lt;strong&gt;Cross-engine consensus.&lt;/strong&gt; What share of responding engines returned this domain? 60% or above is high consensus. Below 33% is low. A result that only shows up on one engine for a well-known software query is already worth questioning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rank anomaly.&lt;/strong&gt; Is an unrelated domain sitting in the top 3? Is the official vendor domain buried past position 5 while other domains outrank it? Either pattern is a flag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query-result relevance.&lt;/strong&gt; Token overlap, keyword stuffing detection, and URL path analysis. If the path contains things like &lt;code&gt;/full-version/&lt;/code&gt;, &lt;code&gt;/googledrive/&lt;/code&gt;, or &lt;code&gt;/crack/&lt;/code&gt;, that's a direct signal. Known platforms like YouTube and Reddit are excluded from the stuffing check because their titles naturally repeat search terms.&lt;/p&gt;
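&lt;p&gt;The path check is a straightforward set intersection; the token list here is shortened to the examples above:&lt;/p&gt;

```python
from urllib.parse import urlparse

# Path tokens from the examples above; the real list is longer
SUSPICIOUS_PATH_TOKENS = {"full-version", "googledrive", "crack"}

def path_flags(url):
    """Return the known-bad segments that appear in the URL path."""
    segments = {s.lower() for s in urlparse(url).path.split("/") if s}
    return sorted(segments.intersection(SUSPICIOUS_PATH_TOKENS))
```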

&lt;p&gt;&lt;strong&gt;URLhaus lookup.&lt;/strong&gt; Async check against the abuse.ch database. If the domain is a known malware host, that surfaces immediately regardless of everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain age.&lt;/strong&gt; WHOIS with a hard 6-second timeout. The timeout matters because without it, stalled WHOIS connections hold up the entire pipeline. Only domains under 180 days old get flagged; older domains receive no age penalty.&lt;/p&gt;
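&lt;p&gt;The hard timeout is the important design detail here. A sketch of the shape, with &lt;code&gt;whois_lookup&lt;/code&gt; as a stand-in coroutine rather than Arkoi's actual function:&lt;/p&gt;

```python
import asyncio
from datetime import datetime, timezone

async def domain_age_days(domain, whois_lookup, timeout=6.0):
    # whois_lookup is any coroutine returning the domain's creation datetime;
    # the hard timeout keeps one stalled WHOIS server from blocking the pipeline
    try:
        created = await asyncio.wait_for(whois_lookup(domain), timeout=timeout)
    except (asyncio.TimeoutError, OSError):
        return None  # UNKNOWN: no age signal, but the scan keeps moving
    return (datetime.now(timezone.utc) - created).days
```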

&lt;h3&gt;
  
  
  Verdicts, not scores
&lt;/h3&gt;

&lt;p&gt;This is the part I'm most opinionated about. No percentage scores. Four categories with explicit reasons:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;✓ TRUSTED&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Official vendor or trusted partner, consistent across engines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;? UNVERIFIED&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No red flags, but no vendor relationship confirmed either&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;⚠ SUSPICIOUS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Something's off. New domain, rank anomaly, suspicious path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;✗ DECEPTIVE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Clear indicators of deceptive placement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;UNVERIFIED&lt;/code&gt; state was the most important one to get right. An earlier version showed anything without red flags as green. That's not safe, that's just uninspected. "We found nothing wrong" and "this is safe" are different things.&lt;/p&gt;
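&lt;p&gt;As a sketch, the verdict logic is a short chain of guards. The signal field names are illustrative, not Arkoi's actual internals:&lt;/p&gt;

```python
def verdict(signals):
    """Collapse per-result signals into one of four categorical verdicts."""
    if signals.get("vendor") == "VENDOR_IMPOSTER" or signals.get("urlhaus_hit"):
        return "DECEPTIVE"
    if signals.get("rank_anomaly") or signals.get("path_flags") or signals.get("new_domain"):
        return "SUSPICIOUS"
    if signals.get("vendor") in ("VENDOR_MATCH", "TRUSTED_PARTNER") and signals.get("consensus") == "high":
        return "TRUSTED"
    # finding nothing wrong is not the same as confirming it's safe
    return "UNVERIFIED"
```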




&lt;h2&gt;
  
  
  The stuff I got wrong
&lt;/h2&gt;

&lt;p&gt;The first version had numeric percentage scores, SSL certificate issuer checking, and keyword scoring. All three were mistakes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Percentage scores sounded precise but weren't.&lt;/strong&gt; Where does 67% come from? Arbitrary thresholds added together. Replacing scores with categorical verdicts plus explicit reasoning is more honest and actually more useful because you can see &lt;em&gt;why&lt;/em&gt; something got flagged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SSL issuer checking was noise.&lt;/strong&gt; In 2025, penalizing a domain for using Let's Encrypt tells you it's cost-conscious, not malicious. Millions of legitimate sites use DV certs. Dropped entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keyword scoring fired too broadly.&lt;/strong&gt; "Free download" catches CNET. "Full version" catches vendor trial pages. The signal-to-noise ratio was terrible. Replaced with vendor domain mismatch detection and URL path analysis, which are actually precise.&lt;/p&gt;

&lt;p&gt;The biggest practical problem was speed. Everything ran sequentially in the first version. Twelve results times three slow network checks each meant runs taking close to two minutes. Rewriting with asyncio and running all per-result checks concurrently got this to around 9 seconds.&lt;/p&gt;
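&lt;p&gt;The shape of that rewrite is simple: one coroutine per (result, check) pair, all awaited together so the network waits overlap. A minimal sketch, with &lt;code&gt;asyncio.sleep&lt;/code&gt; standing in for real network calls:&lt;/p&gt;

```python
import asyncio

# Illustrative sketch of the sequential-to-concurrent rewrite, not Arkoi's code.
# A short sleep stands in for a slow network check (WHOIS, rank, vendor lookup).

async def check(result: str, name: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for network I/O
    return f"{name}:{result}"

async def audit(results: list) -> list:
    # One task per (result, check) pair; all network waits overlap,
    # so total wall time is roughly one check, not results x checks.
    tasks = [check(r, n) for r in results for n in ("whois", "rank", "vendor")]
    return await asyncio.gather(*tasks)

findings = asyncio.run(audit(["a.com", "b.com"]))
```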




&lt;h2&gt;
  
  
  Why SearXNG
&lt;/h2&gt;

&lt;p&gt;Arkoi requires a self-hosted &lt;a href="https://docs.searxng.org/" rel="noopener noreferrer"&gt;SearXNG&lt;/a&gt; instance. That's a real dependency and worth explaining.&lt;/p&gt;

&lt;p&gt;Scraping search engines directly is legally grey and technically fragile. Official APIs are rate-limited, paid, and different for every engine. SearXNG handles all of this cleanly. One local endpoint, six engines, no API keys, privacy-preserving by default.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 searxng/searxng
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The downside is that not all SearXNG configs have all engines enabled out of the box. In my testing only 3 of 6 engines consistently responded. The consensus logic adapts to however many engines actually returned results so it degrades gracefully.&lt;/p&gt;
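&lt;p&gt;The graceful degradation is easy to sketch: agreement is computed over whichever engines actually responded, not over the configured six. The structure below is illustrative, not Arkoi's code:&lt;/p&gt;

```python
# Consensus that degrades gracefully: only engines that actually responded
# count toward agreement. Engine names and result shape are illustrative.

def consensus(domain: str, engine_results: dict) -> float:
    """Fraction of responding engines whose top results include the domain."""
    responded = {e: hits for e, hits in engine_results.items() if hits is not None}
    if not responded:
        return 0.0  # nothing to agree on; caller should treat as UNVERIFIED
    agreeing = sum(1 for hits in responded.values() if domain in hits)
    return agreeing / len(responded)
```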




&lt;h2&gt;
  
  
  Where it still falls short
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;WHOIS age is less useful than I hoped.&lt;/strong&gt; Privacy protection and rate limiting mean most domains come back as &lt;code&gt;UNKNOWN&lt;/code&gt; rather than an actual age. Age works as a supporting signal when it's available but you can't rely on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Yandex skews rank anomaly detection.&lt;/strong&gt; Yandex's ordering for Western software queries is genuinely different from other engines. A YouTube tutorial ranked #1 by Yandex isn't poisoning, it's just Yandex. The rank anomaly check needs engine-aware weighting to handle this properly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No vendor match means less precision.&lt;/strong&gt; If your query doesn't hit any of the 50+ vendor profiles, vendor verification gets skipped and you're left with consensus and anomaly scoring only. Still useful, but clearly a step down.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/404saint/arkoi.git
&lt;span class="nb"&gt;cd &lt;/span&gt;arkoi
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Start SearXNG&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 searxng/searxng

&lt;span class="c"&gt;# Run it&lt;/span&gt;
python arkoi.py &lt;span class="s2"&gt;"AutoCAD 2025 download"&lt;/span&gt;
python arkoi.py &lt;span class="s2"&gt;"Wireshark install"&lt;/span&gt;
python arkoi.py &lt;span class="s2"&gt;"Adobe Photoshop free download"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tagged &lt;code&gt;v0.1.0-alpha&lt;/code&gt;. Pre-release, not production ready. Known issues are in the GitHub tracker. The README and CONTRIBUTING docs cover everything you'd need to add a vendor or pick up an open issue.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⭐ GitHub
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/404saint/arkoi" rel="noopener noreferrer"&gt;github.com/404saint/arkoi&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If this was useful or interesting, a star helps other people find it. Contributions welcome, especially vendor registry additions and the missing test suite. Open a PR and the CONTRIBUTING guide will walk you through it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;strong&gt;RUGERO Tesla&lt;/strong&gt; · GitHub: &lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;@404Saint&lt;/a&gt;&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Started as a bored weekend experiment. Turned out to be a more interesting problem than I expected.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>python</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
    <item>
      <title>Securing the Air-Gap: Building a Hardware-Aware Forensic Suite for ICS/OT</title>
      <dc:creator>404Saint</dc:creator>
      <pubDate>Mon, 13 Apr 2026 18:58:04 +0000</pubDate>
      <link>https://dev.to/null_saint/securing-the-air-gap-building-a-hardware-aware-forensic-suite-for-icsot-by-rugero-tesla-404saint-127o</link>
      <guid>https://dev.to/null_saint/securing-the-air-gap-building-a-hardware-aware-forensic-suite-for-icsot-by-rugero-tesla-404saint-127o</guid>
      <description>&lt;p&gt;&lt;em&gt;By RUGERO Tesla (&lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;@404Saint&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The air-gap is a lie
&lt;/h2&gt;

&lt;p&gt;Every ICS engineer will tell you their critical systems are air-gapped. Isolated. Untouchable.&lt;/p&gt;

&lt;p&gt;Then you watch someone walk up with a USB drive.&lt;/p&gt;

&lt;p&gt;The air-gap was never a technical guarantee. It was a policy. And policies fail the moment someone needs to transfer a firmware update, a vendor installer, or last week's historian backup onto a machine that "can't" touch the internet. Removable media is the bridge that's always there, always trusted, and almost never inspected properly.&lt;/p&gt;

&lt;p&gt;Stuxnet didn't compromise Iranian centrifuges through a network intrusion. It rode in on a USB drive. That was 2010. The vector hasn't changed.&lt;/p&gt;

&lt;p&gt;Standard antivirus doesn't help much here either. It's built for IT environments. It doesn't know what Modbus looks like, or why a legitimate-looking Siemens installer with suspiciously high entropy should be treated differently than a clean one. It scans for known signatures and moves on. In ICS/OT, what you're looking for is often subtler than that.&lt;/p&gt;

&lt;p&gt;So I built something for this specific problem. I called it &lt;strong&gt;Guardian-OT&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it actually does
&lt;/h2&gt;

&lt;p&gt;Guardian-OT is a forensic audit tool that inspects removable media before it touches a critical engineering workstation. Not a full-blown enterprise platform. A focused, high-signal tool that tells you what's actually on a drive and whether it matches what's supposed to be there.&lt;/p&gt;

&lt;p&gt;It runs four checks, and each one is doing something different.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware fingerprinting
&lt;/h3&gt;

&lt;p&gt;The first thing Guardian-OT does is ignore the filesystem entirely and go straight to the hardware. It extracts the USB hardware UUID and checks it against a local SQLite vault of known, approved devices.&lt;/p&gt;

&lt;p&gt;This matters because USB spoofing is real. You can make a drive present itself as something it isn't at the filesystem level. Hardware UUID is harder to fake. If the ID is unknown, or if it doesn't match what the vault expects for that device, the audit flags it before a single file gets scanned.&lt;/p&gt;
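&lt;p&gt;A minimal sketch of that vault check, assuming a simple one-table schema (Guardian-OT's actual schema and UUID extraction aren't shown here):&lt;/p&gt;

```python
import sqlite3

# Hedged sketch of a hardware-ID vault. The schema, the UUID format, and the
# function names are assumptions for illustration, not Guardian-OT's internals.

def open_vault(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS devices (hw_uuid TEXT PRIMARY KEY, label TEXT)")
    return db

def approve(db: sqlite3.Connection, hw_uuid: str, label: str) -> None:
    db.execute("INSERT OR REPLACE INTO devices VALUES (?, ?)", (hw_uuid, label))

def is_known(db: sqlite3.Connection, hw_uuid: str) -> bool:
    # Unknown or mismatched hardware gets flagged before any file is scanned.
    row = db.execute("SELECT 1 FROM devices WHERE hw_uuid = ?", (hw_uuid,)).fetchone()
    return row is not None
```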

&lt;h3&gt;
  
  
  Recursive integrity verification
&lt;/h3&gt;

&lt;p&gt;Every file on an approved drive gets tree-hashed and stored during the first "known-good" scan. Every subsequent scan compares against that baseline.&lt;/p&gt;

&lt;p&gt;If anything has changed since the last clean scan, even one file, it triggers a full deep audit. Not a warning. A full forensic pipeline. The assumption is that in an ICS environment, unexpected changes to a trusted drive are not a routine event.&lt;/p&gt;
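&lt;p&gt;The baseline comparison can be sketched in a few lines: hash every file, then diff the current tree against the stored digests. This is a simplified stand-in for the real pipeline:&lt;/p&gt;

```python
import hashlib
import os

# Minimal sketch of baseline tree-hashing: every file hashed, any drift from
# the stored baseline reported. Guardian-OT's real scan does more than this.

def tree_hash(root: str) -> dict:
    """Map each file's relative path to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digests[os.path.relpath(path, root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests

def changed(baseline: dict, current: dict) -> set:
    """Paths added, removed, or modified since the known-good scan."""
    paths = set(baseline) | set(current)
    return {p for p in paths if baseline.get(p) != current.get(p)}
```

&lt;p&gt;Any non-empty result from the diff is what triggers the full deep audit.&lt;/p&gt;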

&lt;h3&gt;
  
  
  The forensic pipeline itself
&lt;/h3&gt;

&lt;p&gt;Three things run here in sequence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YARA scanning&lt;/strong&gt; hunts for ICS-specific strings — Modbus, S7Comm, Ethernet/IP function codes, things that have no business being in a standard office document or a routine software update. If those strings show up somewhere unexpected, that's worth knowing about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Entropy analysis&lt;/strong&gt; scores every file between 0.0 and 8.0. Anything above 7.8 gets isolated for manual review. Encrypted payloads and packed executables both score high. So does compressed data. The score alone doesn't condemn a file but it tells you where to look first when you only have time to look at ten things out of a thousand.&lt;/p&gt;
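&lt;p&gt;The score itself is plain Shannon entropy over byte frequencies, which is bounded by 8.0 for byte data. A minimal version, using the 7.8 threshold from above:&lt;/p&gt;

```python
import math
from collections import Counter

# Shannon entropy over byte frequencies, bounded by 8.0 for byte data.
# The 7.8 isolation threshold mirrors the figure described above.

def entropy(data: bytes) -> float:
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def needs_review(data: bytes, threshold: float = 7.8) -> bool:
    # High entropy means encrypted, packed, or compressed. A lead, not a verdict.
    return entropy(data) > threshold
```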

&lt;p&gt;&lt;strong&gt;Magic number validation&lt;/strong&gt; checks whether a file's actual header matches its extension. Hiding a script inside a file renamed to look like a PDF is a trivially simple technique that still works surprisingly often. This catches it.&lt;/p&gt;
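&lt;p&gt;The check reduces to comparing a file's leading bytes against what its extension promises. A sketch with a deliberately tiny signature table (a real tool carries far more magic numbers):&lt;/p&gt;

```python
# Header-vs-extension validation. The signature table is a small illustrative
# subset, not Guardian-OT's full magic-number database.

SIGNATURES = {
    ".pdf": b"%PDF",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".zip": b"PK\x03\x04",
    ".exe": b"MZ",
}

def header_matches(filename: str, head: bytes) -> bool:
    """True if the file's leading bytes match what its extension promises."""
    for ext, magic in SIGNATURES.items():
        if filename.lower().endswith(ext):
            return head.startswith(magic)
    return True  # unknown extension: nothing to contradict
```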

&lt;h3&gt;
  
  
  The researcher dashboard
&lt;/h3&gt;

&lt;p&gt;Raw JSON forensic output is useful for pipelines. It's not useful for a human who needs to triage a drive in the field.&lt;/p&gt;

&lt;p&gt;I added a Streamlit dashboard that takes that output and turns it into something you can actually act on. The goal is fast separation: out of 1,000+ assets on a typical drive, you want to get to the 10-20 things that actually need eyes-on review without wading through everything else manually.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I'm building this
&lt;/h2&gt;

&lt;p&gt;I'm on a four-to-six-year roadmap toward becoming a full-time ICS/OT security researcher. For most of that time I've been learning how to use tools other people built. Guardian-OT is the point where I started building my own.&lt;/p&gt;

&lt;p&gt;That shift matters to me. Understanding how a forensic tool works at the implementation level is different from knowing how to run it. You find the edge cases. You understand why certain signals are meaningful and others aren't. You build intuition that doesn't come from reading documentation.&lt;/p&gt;

&lt;p&gt;Guardian-OT is the first step in a forensic workflow I want to make resilient and reproducible for industrial environments. There's more coming.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/404saint/guardian-ot" rel="noopener noreferrer"&gt;github.com/404saint/guardian-ot&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you work in OT security or you're on a similar path, I'd like to hear what you think. Issues and PRs are open.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;strong&gt;RUGERO Tesla&lt;/strong&gt; · GitHub: &lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;@404Saint&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ot</category>
      <category>ics</category>
      <category>forensics</category>
    </item>
    <item>
      <title>SurfaceLens V2: Infrastructure Attack Surface and Shadow IT Intelligence Engine</title>
      <dc:creator>404Saint</dc:creator>
      <pubDate>Sat, 11 Apr 2026 14:07:49 +0000</pubDate>
      <link>https://dev.to/null_saint/i-built-a-modular-attack-surface-intelligence-engine-to-track-shadow-it-heres-what-i-learned-48a</link>
      <guid>https://dev.to/null_saint/i-built-a-modular-attack-surface-intelligence-engine-to-track-shadow-it-heres-what-i-learned-48a</guid>
      <description>&lt;p&gt;&lt;em&gt;By RUGERO Tesla (&lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;@404Saint&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing nobody wants to admit
&lt;/h2&gt;

&lt;p&gt;Most organizations don't actually know what they're exposing to the internet.&lt;/p&gt;

&lt;p&gt;I don't mean that as a criticism. I mean it literally. Assets drift. Services get spun up and forgotten. Teams build things outside the controlled network boundary because it's faster. A subdomain that pointed somewhere important three years ago still resolves, except now it points at nothing, and that nothing is claimable by anyone with the right timing.&lt;/p&gt;

&lt;p&gt;This is what Shadow IT looks like from the outside. Not malicious. Just invisible.&lt;/p&gt;

&lt;p&gt;I spent a lot of time doing recon simulations and building lab environments around infrastructure security, and the same problem kept showing up. Discovery is a solved problem. You can find assets. What's hard is understanding how they relate to each other, which ones actually belong to the organization you're looking at, and which ones represent real exposure versus expected noise.&lt;/p&gt;

&lt;p&gt;SurfaceLens V2 is my attempt to build something that treats those questions seriously.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it is
&lt;/h2&gt;

&lt;p&gt;SurfaceLens V2 is a modular attack surface management tool, but calling it a scanner misses the point. It's built as an intelligence pipeline. The difference matters.&lt;/p&gt;

&lt;p&gt;A scanner gives you a list. A pipeline takes that list and asks what it means. Who does this asset belong to? Has it appeared before? Does its TLS configuration match what you'd expect? Is this subdomain pointing at infrastructure that's been decommissioned?&lt;/p&gt;

&lt;p&gt;The goal is moving from raw discovery to something you can actually act on.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I kept running into
&lt;/h2&gt;

&lt;p&gt;Doing recon across different lab environments and simulated enterprise networks, four things came up constantly.&lt;/p&gt;

&lt;p&gt;Subdomains pointing at decommissioned infrastructure nobody had cleaned up. In some cases the underlying cloud resource was unclaimed, meaning anyone could register it and inherit whatever trust the subdomain carried. Subdomain takeover is well documented, but it's still everywhere.&lt;/p&gt;

&lt;p&gt;Services exposed outside their intended boundaries. RDP and SSH sitting on public IPs. Databases reachable without a VPN. Not because anyone decided that was fine, just because nobody noticed.&lt;/p&gt;

&lt;p&gt;Assets that clearly belonged to an organization but didn't match its DNS patterns at all. Shadow IT, basically. Someone built something, it works, it lives outside the perimeter anyone is actually monitoring.&lt;/p&gt;

&lt;p&gt;TLS configurations that ranged from outdated to outright broken, on infrastructure that looked authoritative enough that a user would trust it without thinking.&lt;/p&gt;

&lt;p&gt;None of these are surprising individually. Together they paint a picture of an attack surface nobody has a complete map of.&lt;/p&gt;




&lt;h2&gt;
  
  
  How SurfaceLens approaches it
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pull from multiple sources
&lt;/h3&gt;

&lt;p&gt;The first stage aggregates asset data from Shodan, Censys, LeakIX, CriminalIP, and local datasets. Using multiple providers matters because each one sees different things. An asset invisible to Shodan might be indexed by Censys. Combining sources gives you a more complete picture than any single feed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Track state over time
&lt;/h3&gt;

&lt;p&gt;One of the decisions I spent the most time on was persistence. Most recon tools treat each scan as a standalone event. You run it, you get results, you move on.&lt;/p&gt;

&lt;p&gt;That model throws away something valuable. The question isn't just what's exposed right now. It's what's new since the last time you looked, what disappeared, what changed.&lt;/p&gt;

&lt;p&gt;SurfaceLens stores assets in a local SQLite database with first-seen and last-seen timestamps. New exposures surface immediately. An asset that vanished and came back shows up as a change worth investigating. Recon becomes monitoring instead of a one-time snapshot.&lt;/p&gt;
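&lt;p&gt;The persistence model is small enough to sketch: an upsert that preserves &lt;code&gt;first_seen&lt;/code&gt; and advances &lt;code&gt;last_seen&lt;/code&gt;. SurfaceLens's real schema carries more columns, so treat this as illustrative:&lt;/p&gt;

```python
import sqlite3

# Sketch of the first-seen/last-seen model. Table layout and column names
# are assumptions for illustration, not SurfaceLens's actual schema.

def record(db: sqlite3.Connection, asset: str, seen_at: str) -> bool:
    """Upsert an asset; return True if it is new since the last scan."""
    row = db.execute("SELECT 1 FROM assets WHERE asset = ?", (asset,)).fetchone()
    if row is None:
        db.execute("INSERT INTO assets VALUES (?, ?, ?)", (asset, seen_at, seen_at))
        return True
    # Known asset: only the last-seen timestamp moves forward.
    db.execute("UPDATE assets SET last_seen = ? WHERE asset = ?", (seen_at, asset))
    return False

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE assets (asset TEXT PRIMARY KEY, first_seen TEXT, last_seen TEXT)")
```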

&lt;h3&gt;
  
  
  Run each asset through the pipeline
&lt;/h3&gt;

&lt;p&gt;Every asset that comes in goes through a series of modular checks.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;SSL Auditor&lt;/strong&gt; pulls certificate data and evaluates TLS configuration. Weak ciphers, expired certs, misconfigured chains. Anything that would make a security-conscious person wince.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;DNS Correlator&lt;/strong&gt; does attribution analysis. This is the part I find most interesting. It tries to determine whether an asset actually belongs to the organization you're analyzing, or whether it's drifted outside controlled boundaries. This is where Shadow IT becomes visible in the data rather than just suspected.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Fingerprinter&lt;/strong&gt; identifies technologies and service layers. What's running behind the asset? A reverse proxy? A specific web server version? This context changes how you interpret everything else.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Sensitive File Hunter&lt;/strong&gt; checks for common exposure patterns. &lt;code&gt;.env&lt;/code&gt; files, &lt;code&gt;robots.txt&lt;/code&gt; entries that reveal more than intended, backup files sitting in predictable locations. Simple checks that still catch real things regularly.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Risk Prioritizer&lt;/strong&gt; pulls all of this together into a weighted score between 0 and 10. Not a magic number that tells you what to do, but a signal that tells you where to look first when you have fifty assets and time for five.&lt;/p&gt;
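&lt;p&gt;Mechanically, a weighted score clipped to 0-10 looks like the sketch below. The signal names and weights here are assumptions for illustration, not SurfaceLens's actual tuning:&lt;/p&gt;

```python
# Illustrative weighted risk scoring, clipped to the 0-10 range.
# Both the finding names and the weights are hypothetical.

WEIGHTS = {
    "expired_cert": 2.0,
    "weak_ciphers": 1.5,
    "unattributed": 2.5,          # DNS correlator could not tie asset to the org
    "sensitive_file": 3.0,
    "admin_service_public": 3.0,  # e.g. RDP or SSH on a public IP
}

def risk_score(findings: set) -> float:
    """Sum the weights of observed findings, capped at 10."""
    raw = sum(WEIGHTS.get(f, 0.0) for f in findings)
    return min(10.0, round(raw, 1))
```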




&lt;h2&gt;
  
  
  The shift that changed how I think about this
&lt;/h2&gt;

&lt;p&gt;When I started building SurfaceLens I was thinking about discovery. Find the things, list the things, report the things.&lt;/p&gt;

&lt;p&gt;Somewhere in the middle of building the DNS Correlator I started thinking differently.&lt;/p&gt;

&lt;p&gt;Individual findings don't tell you much. An open port is an open port. A TLS misconfiguration is a TLS misconfiguration. But when you start correlating DNS attribution with service exposure with certificate data with historical visibility, you start seeing something that looks less like a list of issues and more like a map of how an attacker would move.&lt;/p&gt;

&lt;p&gt;That's where exposure stops being a checkbox and starts being an attack path.&lt;/p&gt;

&lt;p&gt;I don't think I fully understood that distinction until I had to implement it. Which is probably the best argument for building tools rather than just using them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Output
&lt;/h2&gt;

&lt;p&gt;The same underlying data comes out three ways depending on what you need.&lt;/p&gt;

&lt;p&gt;CLI output for quick assessments when you want high-signal results without overhead. Markdown reports for documentation and audit trails. A Flask web dashboard for anything that benefits from a persistent, navigable view of assets, risk scores, and historical changes.&lt;/p&gt;

&lt;p&gt;Same data model, different interfaces. Nothing gets lost between them.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it isn't
&lt;/h2&gt;

&lt;p&gt;SurfaceLens is passive-first. It relies on aggregated intelligence sources and non-intrusive active checks. It's not an aggressive scanner. It's not trying to enumerate everything as fast as possible.&lt;/p&gt;

&lt;p&gt;That's a deliberate choice. In real environments, volume creates noise. Noise buries signal. The tool is more useful if it's telling you fewer, more meaningful things than if it's generating a report that takes three days to triage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where it goes next
&lt;/h2&gt;

&lt;p&gt;SurfaceLens V2 is a foundation. The areas I'm actively thinking about are better attribution models for asset ownership, risk scoring that's more context-aware than weighted signals alone, and tighter integration with automated security workflows.&lt;/p&gt;

&lt;p&gt;The detection coverage for infrastructure misconfigurations has room to grow too. There's a long list of checks that would add value without adding noise, and working through that list is ongoing.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Use this responsibly.&lt;/strong&gt; SurfaceLens is built for defensive research and authorized assessments. Don't point it at infrastructure you don't have permission to analyze.&lt;/p&gt;




&lt;h2&gt;
  
  
  The project
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/404saint/surfacelens_v2" rel="noopener noreferrer"&gt;github.com/404saint/surfacelens_v2&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're working in infrastructure security or attack surface management, take a look. Issues and PRs are open. I'm especially interested in feedback from people who've tried to solve the attribution problem differently.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;strong&gt;RUGERO Tesla&lt;/strong&gt; · GitHub: &lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;@404Saint&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Offensive security researcher focused on infrastructure, network security, attack surface analysis, and Shadow IT discovery.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>cybersecurity</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>MEA: Modbus Exposure Analyzer — Passive ICS/OT Security Analysis</title>
      <dc:creator>404Saint</dc:creator>
      <pubDate>Sat, 28 Feb 2026 23:40:01 +0000</pubDate>
      <link>https://dev.to/null_saint/mea-modbus-exposure-analyzer-passive-icsot-security-analysis-by-rugero-tesla-404saint-3b4a</link>
      <guid>https://dev.to/null_saint/mea-modbus-exposure-analyzer-passive-icsot-security-analysis-by-rugero-tesla-404saint-3b4a</guid>
      <description>&lt;p&gt;&lt;em&gt;By RUGERO Tesla (&lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;@404Saint&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with Modbus being on the internet
&lt;/h2&gt;

&lt;p&gt;Modbus was designed in 1979 for closed, serial networks, where the assumption was that if you could physically reach the device, you were supposed to be there. There was no authentication. No encryption. No concept of an untrusted caller.&lt;/p&gt;

&lt;p&gt;That assumption held for a long time. Then came Ethernet. Then came remote monitoring. Then came cloud connectivity and the slow, steady erosion of the air-gap that industrial engineers spent decades taking for granted.&lt;/p&gt;

&lt;p&gt;Today you can find Modbus devices on Shodan. Public IP addresses, port 502, responding to anyone who asks. Some of them are real PLCs in real facilities. Some are misconfigured. Some are honeypots. And telling those three apart without disrupting whatever process they're attached to is not as straightforward as it sounds.&lt;/p&gt;

&lt;p&gt;That's the problem MEA is built to solve.&lt;/p&gt;




&lt;h2&gt;
  
  
  What MEA actually does
&lt;/h2&gt;

&lt;p&gt;MEA is a passive behavioral analysis tool for Modbus devices. Passive matters here more than it might in an IT context. In ICS/OT environments, sending unexpected traffic to a live device isn't just a network etiquette issue — it can interrupt physical processes. You don't probe a PLC controlling a pump the same way you'd run nmap against a web server.&lt;/p&gt;

&lt;p&gt;MEA works by observing. It reads register data, measures behavioral patterns over time, analyzes entropy, and monitors for changes. It doesn't write anything. It doesn't send commands. It gathers enough signal to tell you something meaningful about a device without touching its operation.&lt;/p&gt;

&lt;p&gt;The three things it's trying to answer are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this device real or simulated?&lt;/strong&gt; Honeypots and simulators behave differently from genuine industrial hardware under sustained observation. Register values on real devices drift in ways that reflect actual physical processes. Simulated registers tend to be static, randomized, or artificially varied in patterns that don't match how real sensors behave.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How exposed is it?&lt;/strong&gt; What's reachable, what's responding, and does the exposure match what you'd expect from a device in this kind of environment?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the actual risk?&lt;/strong&gt; Not a generic vulnerability score, but something grounded in what the device is doing and what access to it would mean.&lt;/p&gt;




&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Behavioral fingerprinting
&lt;/h3&gt;

&lt;p&gt;The first thing MEA does when it connects to a device is start watching register values over multiple read cycles. Real industrial devices have a characteristic kind of noise. Temperature sensors drift. Flow meters fluctuate. A PLC running an active process shows register activity that reflects something happening in the physical world.&lt;/p&gt;

&lt;p&gt;Simulators don't replicate this well. They either hold values constant, cycle through obvious patterns, or randomize in ways that don't match the statistical profile of real sensor data. MEA measures this and uses it as a signal for the real-vs-simulated classification.&lt;/p&gt;
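&lt;p&gt;A toy version of that signal: look at a register's time series and separate frozen values, discontinuous jumps, and bounded drift. The thresholds and labels are illustrative, not MEA's calibrated logic:&lt;/p&gt;

```python
# Toy real-vs-simulated heuristic on one register's time series.
# Real sensors tend to show small, continuous drift; simulators tend toward
# constants or uniform noise. Thresholds here are illustrative only.

def classify(samples: list) -> str:
    """Classify one register's values from repeated passive reads."""
    if len(set(samples)) == 1:
        return "suspect-static"      # frozen value: simulator-like
    steps = [abs(b - a) for a, b in zip(samples, samples[1:])]
    spread = max(samples) - min(samples)
    if max(steps) > 0.5 * spread:
        return "suspect-jumpy"       # large discontinuous jumps: random fill
    return "plausible-physical"      # bounded, continuous drift
```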

&lt;h3&gt;
  
  
  Entropy analysis
&lt;/h3&gt;

&lt;p&gt;Each register read gets an entropy score. The goal is finding anomalies — registers behaving in ways that don't fit the surrounding context. An unusually high-entropy register on a device where everything else is low-entropy is worth investigating. It might be normal. It might not be.&lt;/p&gt;

&lt;p&gt;This is the same reasoning that drives entropy analysis in malware detection. Encrypted or packed data scores high because it's information-dense in a way that structured data usually isn't. The same principle applies to register data that doesn't match its neighbors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Register monitoring over time
&lt;/h3&gt;

&lt;p&gt;A single snapshot of a Modbus device tells you less than you'd think. MEA watches registers across multiple cycles and tracks changes. This catches things a one-time scan misses entirely — registers that only update under specific conditions, values that change in response to external events, patterns that only become visible when you're watching over minutes rather than seconds.&lt;/p&gt;
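&lt;p&gt;The bookkeeping for this is straightforward: compare consecutive read cycles and count how often each register changed. A simplified sketch of the idea, with the data shapes assumed:&lt;/p&gt;

```python
# Simplified multi-cycle register monitoring: count, per register address,
# how many read cycles saw its value change. Data shapes are assumptions.

def track_changes(cycles: list) -> dict:
    """Map register address -> number of cycles in which its value changed."""
    changes = {}
    for prev, curr in zip(cycles, cycles[1:]):
        for addr, value in curr.items():
            if prev.get(addr) != value:
                changes[addr] = changes.get(addr, 0) + 1
    return changes
```

&lt;p&gt;A register that never appears in the result is one that only a longer observation window, or a specific external event, will ever catch moving.&lt;/p&gt;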

&lt;p&gt;It also catches something more subtle: devices that look normal at first glance but show anomalous behavior under sustained observation. That gap between the initial impression and the longer-term pattern is where a lot of the interesting findings live.&lt;/p&gt;

&lt;h3&gt;
  
  
  Risk assessment
&lt;/h3&gt;

&lt;p&gt;The risk output from MEA isn't a generic score plugged into a CVSS calculator. It's built from the combination of what the device is, how it's exposed, what its register behavior looks like, and what access to it would actually mean. A Modbus device responding on a public IP with registers that map to physical actuators is a different risk than the same device in a monitored DMZ with read-only exposure.&lt;/p&gt;

&lt;p&gt;Context matters in ICS security in ways it often doesn't in IT security, and the risk output is designed to reflect that.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who it's for
&lt;/h2&gt;

&lt;p&gt;Security researchers doing passive reconnaissance on ICS infrastructure. Pentesters working authorized assessments who need to gather intelligence without risking operational disruption. Blue teams trying to understand their own exposure before someone else does.&lt;/p&gt;

&lt;p&gt;The audit-ready report output is there for the third group especially. Finding something is half the work. Documenting it in a format that an operations team will actually read and act on is the other half.&lt;/p&gt;




&lt;h2&gt;
  
  
  A note on how to use this
&lt;/h2&gt;

&lt;p&gt;MEA is a tool for authorized security work. ICS and OT environments carry real-world consequences in a way that most IT environments don't. Using this against infrastructure you don't have permission to analyze isn't just legally problematic — it's potentially dangerous to people and processes on the other side of that connection.&lt;/p&gt;

&lt;p&gt;If you're doing research on public-facing devices via platforms like Shodan, understand what you're looking at before you connect to it. The passive-first design of MEA is deliberate, but passive still means connecting, and connecting to live industrial hardware uninvited is a line worth thinking carefully about before crossing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The project
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/404saint/mea" rel="noopener noreferrer"&gt;github.com/404saint/mea&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The full codebase, documentation, and usage examples are there. If you're working in ICS/OT security and you've approached the real-vs-simulated problem differently, I'd be interested in hearing about it.&lt;/p&gt;

&lt;p&gt;All my projects: &lt;strong&gt;&lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;github.com/404saint&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;strong&gt;RUGERO Tesla&lt;/strong&gt; · GitHub: &lt;a href="https://github.com/404saint" rel="noopener noreferrer"&gt;@404Saint&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Offensive security researcher focused on ICS/OT, infrastructure security, and attack surface analysis.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>security</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
