Determining that a host is running a Sui validator is easy.
Step 1 - scan a couple of ports:
Port 8080? Sui network endpoint.
Port 9184? Sui metrics.
Step 2 - Done. Next host.
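That two-step check is simple enough to sketch. A minimal version of the naive rule, assuming the Sui default ports above (everything else here is illustrative):

```python
# The naive rule-based approach: two default ports open = Sui validator.
SUI_PORTS = {8080, 9184}  # network endpoint, metrics


def looks_like_sui(open_ports: set[int]) -> bool:
    """Naive rule: if both default Sui ports are open, call it a Sui validator."""
    return SUI_PORTS.issubset(open_ports)


print(looks_like_sui({8080, 9184, 22}))  # True
print(looks_like_sui({8080, 22}))        # False
```

Two lines of logic, done, next host. The rest of this post is about why that falls apart.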
And this is fine, but how do we really know it's a Sui validator? (Humour me: in this case we do know, because there's a public list of them.)
But it turns out these Sui validators also frequently have HTTP (80) open, which muddies the water. I don't know why; we're still working on that.
How do we find an Ethereum node? Same idea, different ports.
And how do we tell if a host is running both Sui and Ethereum?
It gets really messy, really fast. False positives, false negatives. General confusion. The humans have to intervene.
Traditional scanning starts with understanding: you have to know what you're looking for before you look. Only once that's clear can the scanning commence.
The Problem Nobody Talks About (mostly because they're not interested, but if they are, they're not talking about it).
Validator operators don't run one chain per host like some kind of theoretical best-practice diagram.
They run multiple validators on the same infrastructure because:
- Hardware costs money
- Operations complexity scales with host count
- A 32-core server running one validator is wasteful
- Most chains don't max out resources simultaneously
So you get hosts running:
- Sui + Ethereum
- Solana + Cosmos
- Ethereum + Polygon + Arbitrum
- Some combination I've literally never seen before
And now your nice clean rule-based scanner that looks for "Sui signatures" doesn't know what to do.
I spent weeks trying to figure it out, ended up:
- Asking the user what services are running - lil bit 2002.
- Using AI to figure it out!
The AI did a great job - sometimes, and at a cost. I walked away for a bit and did something else.
The Overlapping Port Problem
It gets worse when chains use similar port ranges or standard services.
Multiple validators might expose:
- Metrics endpoints (Sui on 9184, standard Prometheus on 9090)
- JSON-RPC endpoints (Ethereum 8545, Solana 8899, Sui 8080)
- P2P networking (Ethereum 30303, Solana 8000-10000, Sui 8084)
- WebSocket connections (Solana 8900)
- Monitoring stacks (all using Grafana on 3000)
You can't just say "port 9090 = Prometheus therefore monitoring only."
Because what if that Prometheus instance is exposing metrics for three different validators?
Now you need to:
- Identify which metrics belong to which chain
- Understand which validators are actually running
- Map CVEs to the correct services
- Determine risk posture across multiple chains
Rule-based scanning doesn't scale to this.
The Configuration Variance Problem
Even if you nail down the ports, validator configurations vary wildly.
Some operators:
- Run validators in Docker (different process visibility)
- Use non-standard ports (8545 becomes 18545 because reasons)
- Proxy everything through nginx (now all you see is nginx)
- Run custom monitoring stacks (Prometheus? Grafana? Both? Neither?)
- Use systemd service names that don't match upstream defaults
So your rules for "detecting an Ethereum validator" need to account for:
- Standard Geth on 30303/8545
- Dockerized Geth on custom ports
- Proxied Geth behind nginx on 443
- Custom compiled Geth with a weird banner
- Besu instead of Geth (different client, same chain)
- Nethermind or Erigon (different again)
And that's one chain.
Multiply this across Sui, Solana, Cosmos, Polygon, Avalanche...
You see the problem.
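To make the rule explosion concrete, here's what a rule list for just the Ethereum cases above starts to look like. The client names (Geth, Besu, Nethermind, Erigon) are real; the matching logic is a deliberately crude sketch:

```python
# One rule per deployment style - and this only covers a few of them.
ETH_RULES = [
    # Standard Geth on default ports.
    lambda h: {30303, 8545} <= h["ports"],
    # Geth moved to a custom port, identified by banner.
    lambda h: any(b.startswith("Geth") for b in h["banners"].values()),
    # Proxied behind nginx on 443: all you can see is nginx.
    lambda h: 443 in h["ports"] and "nginx" in h["banners"].get(443, ""),
    # ...plus one rule per alternative client, per Docker setup,
    # per operator quirk. The list never stops growing.
]


def maybe_ethereum(host) -> bool:
    return any(rule(host) for rule in ETH_RULES)
```

Note the third rule: it fires on *any* nginx on 443, Ethereum or not. That's the false-positive trap - to see through the proxy you'd need a whole extra layer of rules, per chain, per client.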
Manual Verification Works (But Doesn't Scale)
The only reliable way to identify multi-chain hosts right now?
Human verification.
This works. It's accurate.
It's also slower than your nan.
If you're scanning hundreds of validator hosts across multiple clients, you can't manually verify every configuration.
And if the configuration changes (which it does), you have to verify again.
The Insight That Changes Everything
After manually verifying enough multi-chain hosts, I started to notice something:
They have a shape.
Not a shape you can easily encode in rules.
But a shape you can recognise.
A Sui+ETH host "feels different" from a Sui-only host.
A Solana+Cosmos host has a different "fingerprint" than either chain alone.
You can't write down the rules for why.
But you know it when you see it.
Your brain is doing something that rule-based scanners can't:
Pattern matching across multiple dimensions simultaneously.
You're not looking for specific ports.
You're looking at the whole configuration and recognizing similarity.
Port 8080 + 8545 + 30303 + 9090 + 3000?
That's a Sui + Ethereum setup with monitoring.
Port 8899 + 8900 + 8000-8020 + 9090?
That's Solana with standard monitoring.
Port 8080 + 9184 + 8545 + 8551 + 30303 + 8899 + 9090 + 3000?
That's a three-chain monster that needs close attention.
You're not consciously running through these rules.
You're just seeing the pattern.
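One crude way to mimic that whole-configuration view in code: compare a host's full port set against known profiles instead of checking ports one by one. The profiles below are the examples from the text; Jaccard similarity is a simple stand-in for the fuzzier matching a human (or an embedding) does:

```python
# Match the whole configuration, not individual ports.
KNOWN_PROFILES = {
    "sui+eth+monitoring": {8080, 8545, 30303, 9090, 3000},
    "solana+monitoring": {8899, 8900, 8000, 8010, 8020, 9090},
}


def jaccard(a: set, b: set) -> float:
    """Overlap of two port sets: 1.0 = identical, 0.0 = disjoint."""
    return len(a & b) / len(a | b)


def closest_profile(open_ports: set):
    name, profile = max(KNOWN_PROFILES.items(),
                        key=lambda kv: jaccard(open_ports, kv[1]))
    return name, round(jaccard(open_ports, profile), 2)


print(closest_profile({8080, 8545, 30303, 9090}))
# ('sui+eth+monitoring', 0.8)
```

It's still too rigid - it only sees ports, not banners or metrics content - but it's the shape of the idea: similarity to known configurations, not hand-written rules.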
Goodbye AI, or at least some of it.
Instead of writing rules, we could:
- Manually verify a multi-chain host once (or get the user to do this!)
- Store its "fingerprint" (ports, services, banners, everything)
- When we see a new host, search for similar fingerprints
- If it's similar to a verified host, inherit that classification
- If it's novel, verify manually and add to the training set
OK, we're still using OpenAI's embeddings, but that's a lot cheaper than calling their chat API.
The AI called this "scaled pattern matching with human-verified training data"!! Weheey.
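A toy version of that loop, with a stand-in embedding so the example is self-contained. In the real pipeline the fingerprint text would go through OpenAI's embedding API and the vectors would live in Postgres + pgvector; here a bag-of-tokens vector plays the same role:

```python
import math

# Tiny fixed vocabulary standing in for a real embedding model.
VOCAB = ["8080", "9184", "8545", "30303", "8899", "9090", "3000", "nginx", "grafana"]


def embed(fingerprint: str) -> list[float]:
    """Stand-in embedding: token counts over a fixed vocabulary."""
    tokens = fingerprint.split()
    return [float(tokens.count(t)) for t in VOCAB]


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


# Human-verified hosts: the "training set" the workflow builds up.
verified = {
    "sui-only": embed("8080 9184 9090"),
    "sui+eth": embed("8080 9184 8545 30303 9090 3000"),
}


def classify(fingerprint: str, threshold: float = 0.8) -> str:
    """Inherit the label of the nearest verified host, or flag as novel."""
    vec = embed(fingerprint)
    label, score = max(((lbl, cosine(vec, v)) for lbl, v in verified.items()),
                       key=lambda kv: kv[1])
    return label if score >= threshold else "novel: verify manually"
```

`classify("8080 9184 8545 30303 9090")` lands on `"sui+eth"`; a fingerprint nothing has seen before falls below the threshold and gets routed back to a human, which is exactly the feedback loop in the list above.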
What Comes Next
In Part 2, I'll show you how vector embeddings let you do exactly this: turn a server's full configuration into a numerical fingerprint, then search for "servers that look like this one."
It's actually quite boring, and if you know me, I love boring. Meanwhile, ChatGPT inserted the phrase "it's just high-dimensional similarity search", which I found fun, so I left it.
Spoiler: Postgres + pgvector is good enough! Most of us don't need the MEGA VECTOR DBs.
This is Part 1 of 3 on building better security scanning for multi-chain validator infrastructure. Part 2 covers vector embeddings as scaled pattern matching.
Building something for good over here.