Determining that a host is running a Sui validator is easy.
Step 1 - scan a couple of ports:
Port 8080? Sui network endpoint.
Port 9184? Sui metrics.
Step 2 - Done. Next host.
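That two-step check is simple enough to sketch. A minimal version of the naive rule, assuming the Sui default ports above (everything else here is illustrative):

```python
# The naive rule-based approach: two default ports open = Sui validator.
SUI_PORTS = {8080, 9184}  # network endpoint, metrics


def looks_like_sui(open_ports: set[int]) -> bool:
    """Naive rule: if both default Sui ports are open, call it a Sui validator."""
    return SUI_PORTS.issubset(open_ports)


print(looks_like_sui({8080, 9184, 22}))  # True
print(looks_like_sui({8080, 22}))        # False
```

Two lines of logic, done, next host. The rest of this post is about why that falls apart.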
And this is fine, but how do we really know it's a Sui validator? (Humour me: in this case we do know, because there's a public list of them.)
But it turns out these Sui validators also frequently have HTTP (80) open, which muddies the water. I don't know why; we're still working on that.
How do we find an Ethereum node? Same idea, different ports.
And how do we tell if a host is running both Sui and Ethereum?
It gets really messy, really fast. False positives, false negatives. General confusion. The humans have to intervene.
Traditional scanning starts with understanding: you have to know what you're looking for before you look. Only once that's clear can the scanning commence.
The Problem Nobody Talks About (mostly because they're not interested, but if they are, they're not talking about it).
Validator operators don't run one chain per host like some kind of theoretical best-practice diagram.
They run multiple validators on the same infrastructure because:
- Hardware costs money
- Operations complexity scales with host count
- A 32-core server running one validator is wasteful
- Most chains don't max out resources simultaneously
So you get hosts running:
- Sui + Ethereum
- Solana + Cosmos
- Ethereum + Polygon + Arbitrum
- Some combination I've literally never seen before
And now your nice clean rule-based scanner that looks for "Sui signatures" doesn't know what to do.
I spent weeks trying to figure it out, ended up:
- Asking the user what services are running - lil bit 2002.
- Using AI to figure it out!
The AI did a great job - sometimes, and at a cost. I walked away for a bit and did something else.
The Overlapping Port Problem
It gets worse when chains use similar port ranges or standard services.
Multiple validators might expose:
- Metrics endpoints (Sui on 9184, standard Prometheus on 9090)
- JSON-RPC endpoints (Ethereum 8545, Solana 8899, Sui 8080)
- P2P networking (Ethereum 30303, Solana 8000-10000, Sui 8084)
- WebSocket connections (Solana 8900)
- Monitoring stacks (all using Grafana on 3000)
You can't just say "port 9090 = Prometheus therefore monitoring only."
Because what if that Prometheus instance is exposing metrics for three different validators?
Now you need to:
- Identify which metrics belong to which chain
- Understand which validators are actually running
- Map CVEs to the correct services
- Determine risk posture across multiple chains
Rule-based scanning doesn't scale to this.
The Configuration Variance Problem
Even if you nail down the ports, validator configurations vary wildly.
Some operators:
- Run validators in Docker (different process visibility)
- Use non-standard ports (8545 becomes 18545 because reasons)
- Proxy everything through nginx (now all you see is nginx)
- Run custom monitoring stacks (Prometheus? Grafana? Both? Neither?)
- Use systemd service names that don't match upstream defaults
So your rules for "detecting an Ethereum validator" need to account for:
- Standard Geth on 30303/8545
- Dockerized Geth on custom ports
- Proxied Geth behind nginx on 443
- Custom compiled Geth with a weird banner
- Besu instead of Geth (different client, same chain)
- Nethermind or Erigon (different again)
And that's one chain.
Multiply this across Sui, Solana, Cosmos, Polygon, Avalanche...
You see the problem.
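To make the rule explosion concrete, here's what a rule list for just the Ethereum cases above starts to look like. The client names (Geth, Besu, Nethermind, Erigon) are real; the matching logic is a deliberately crude sketch:

```python
# One rule per deployment style - and this only covers a few of them.
ETH_RULES = [
    # Standard Geth on default ports.
    lambda h: {30303, 8545} <= h["ports"],
    # Geth moved to a custom port, identified by banner.
    lambda h: any(b.startswith("Geth") for b in h["banners"].values()),
    # Proxied behind nginx on 443: all you can see is nginx.
    lambda h: 443 in h["ports"] and "nginx" in h["banners"].get(443, ""),
    # ...plus one rule per alternative client, per Docker setup,
    # per operator quirk. The list never stops growing.
]


def maybe_ethereum(host) -> bool:
    return any(rule(host) for rule in ETH_RULES)
```

Note the third rule: it fires on *any* nginx on 443, Ethereum or not. That's the false-positive trap - to see through the proxy you'd need a whole extra layer of rules, per chain, per client.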
Manual Verification Works (But Doesn't Scale)
The only reliable way to identify multi-chain hosts right now?
Human verification.
This works. It's accurate.
It's also slower than your nan.
If you're scanning hundreds of validator hosts across multiple clients, you can't manually verify every configuration.
And if the configuration changes (which it does), you have to verify again.
The Insight That Changes Everything
After manually verifying enough multi-chain hosts, I started to notice something:
They have a shape.
Not a shape you can easily encode in rules.
But a shape you can recognise.
A Sui+ETH host "feels different" from a Sui-only host.
A Solana+Cosmos host has a different "fingerprint" than either chain alone.
You can't write down the rules for why.
But you know it when you see it.
Your brain is doing something that rule-based scanners can't:
Pattern matching across multiple dimensions simultaneously.
You're not looking for specific ports.
You're looking at the whole configuration and recognizing similarity.
Port 8080 + 8545 + 30303 + 9090 + 3000?
That's a Sui + Ethereum setup with monitoring.
Port 8899 + 8900 + 8000-8020 + 9090?
That's Solana with standard monitoring.
Port 8080 + 9184 + 8545 + 8551 + 30303 + 8899 + 9090 + 3000?
That's a three-chain monster that needs close attention.
You're not consciously running through these rules.
You're just seeing the pattern.
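One crude way to mimic that whole-configuration view in code: compare a host's full port set against known profiles instead of checking ports one by one. The profiles below are the examples from the text; Jaccard similarity is a simple stand-in for the fuzzier matching a human (or an embedding) does:

```python
# Match the whole configuration, not individual ports.
KNOWN_PROFILES = {
    "sui+eth+monitoring": {8080, 8545, 30303, 9090, 3000},
    "solana+monitoring": {8899, 8900, 8000, 8010, 8020, 9090},
}


def jaccard(a: set, b: set) -> float:
    """Overlap of two port sets: 1.0 = identical, 0.0 = disjoint."""
    return len(a & b) / len(a | b)


def closest_profile(open_ports: set):
    name, profile = max(KNOWN_PROFILES.items(),
                        key=lambda kv: jaccard(open_ports, kv[1]))
    return name, round(jaccard(open_ports, profile), 2)


print(closest_profile({8080, 8545, 30303, 9090}))
# ('sui+eth+monitoring', 0.8)
```

It's still too rigid - it only sees ports, not banners or metrics content - but it's the shape of the idea: similarity to known configurations, not hand-written rules.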
Goodbye AI, or at least some of it.
Instead of writing rules, we could:
- Manually verify a multi-chain host once (or get the user to do this!)
- Store its "fingerprint" (ports, services, banners, everything)
- When we see a new host, search for similar fingerprints
- If it's similar to a verified host, inherit that classification
- If it's novel, verify manually and add to the training set
OK, we're still using OpenAI's embeddings, but that's a lot cheaper than calling their chat API.
The AI called this "scaled pattern matching with human-verified training data"!! Weheey.
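A toy version of that loop, with a stand-in embedding so the example is self-contained. In the real pipeline the fingerprint text would go through OpenAI's embedding API and the vectors would live in Postgres + pgvector; here a bag-of-tokens vector plays the same role:

```python
import math

# Tiny fixed vocabulary standing in for a real embedding model.
VOCAB = ["8080", "9184", "8545", "30303", "8899", "9090", "3000", "nginx", "grafana"]


def embed(fingerprint: str) -> list[float]:
    """Stand-in embedding: token counts over a fixed vocabulary."""
    tokens = fingerprint.split()
    return [float(tokens.count(t)) for t in VOCAB]


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


# Human-verified hosts: the "training set" the workflow builds up.
verified = {
    "sui-only": embed("8080 9184 9090"),
    "sui+eth": embed("8080 9184 8545 30303 9090 3000"),
}


def classify(fingerprint: str, threshold: float = 0.8) -> str:
    """Inherit the label of the nearest verified host, or flag as novel."""
    vec = embed(fingerprint)
    label, score = max(((lbl, cosine(vec, v)) for lbl, v in verified.items()),
                       key=lambda kv: kv[1])
    return label if score >= threshold else "novel: verify manually"
```

`classify("8080 9184 8545 30303 9090")` lands on `"sui+eth"`; a fingerprint nothing has seen before falls below the threshold and gets routed back to a human, which is exactly the feedback loop in the list above.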
What Comes Next
In Part 2, I'll show you how vector embeddings let you do exactly this: turn a server's full configuration into a numerical fingerprint, then search for "servers that look like this one."
It's actually quite boring, and if you know me, I love boring. Meanwhile, ChatGPT inserted the phrase "it's just high-dimensional similarity search", which I found fun, so I left it.
Spoiler: Postgres + pgvector is good enough! Most of us don't need the MEGA VECTOR DBs.
This is Part 1 of 3 on building better security scanning for multi-chain validator infrastructure. Part 2 covers vector embeddings as scaled pattern matching.
Building something for good over here.