
Masa
I Found 34 Vulnerabilities in TON Blockchain's Consensus Algorithm — Claude Code Did 95% of the Work

Last week, TON Blockchain launched a consensus bug bounty challenge targeting their new Simplex BFT consensus implementation. I decided to go all-in with Claude Code (Opus 4.6) as my primary research tool.

The result: 34 vulnerabilities discovered, 6 dynamically confirmed, 3 ASAN-verified memory safety crashes — in under 3 days.

24 out of 25 commits were co-authored with Claude Code. This isn't a story about AI replacing security researchers. It's about what happens when you treat an AI coding agent as a genuine research partner.

The Setup

The target was validator/consensus/ in the TON blockchain — a C++ actor-based coroutine state machine implementing Simplex slot-based BFT consensus. Complex stuff: async coroutines, shared mutable state, 34 source files, and a custom actor framework.

My workflow was simple:

  1. Point Claude Code at the codebase
  2. Ask it to systematically analyze each component
  3. Have it write reproduction scripts and test harnesses
  4. Verify findings with ASAN/UBSAN builds

What Claude Code Actually Did

Deep Architectural Analysis

Claude Code didn't just grep for memcpy. It understood the entire consensus protocol flow — leader election, vote aggregation, certificate creation, block acceptance — and identified architectural flaws, not just surface bugs.

The biggest find: a systemic root cause connecting 7+ distinct bugs. The codebase uses .start().detach() fire-and-forget coroutines that modify shared state before co_await suspension points. When the awaited operation fails, there's no rollback. This pattern appears 17 times across the consensus code.

Attack Matrix Construction

Claude Code built a Byzantine fault injection test matrix:

| Failure mode | Blocks / 60 s | Throughput loss |
|---|---|---|
| Baseline | 60 | 0% |
| 5% DB failure | 8 | 87% |
| 5% validation failure | 58 | 3% |
| 10% DB failure | 0 | 100% |

That last row is the killer — a 10% database failure rate causes total consensus liveness failure. Zero blocks produced. The network halts.

Memory Safety Bugs

Three sanitizer-confirmed crashes, each with a 100% reproduction rate:

  • Heap-use-after-free in coroutine finalization (actor holds reference across co_await)
  • Heap-buffer-overflow in Bus shutdown sequence
  • UBSAN signed integer overflow in slot arithmetic

Each with full stack traces, root cause analysis, and reproduction commands.

The Workflow That Worked

I didn't micromanage Claude Code. The key was giving it research-level autonomy:

```
Analyze the coroutine error handling in consensus/.
For each co_await call site, determine what happens if
the awaited operation fails. Document any state that was
modified before the suspension point without rollback.
```

Claude Code would then:

  1. Read all relevant source files
  2. Trace execution paths through the actor framework
  3. Identify vulnerable patterns
  4. Write detailed analysis with code references
  5. Generate reproduction scripts

A single prompt like that replaced 2-3 hours of manual analysis, with results back in minutes.

What Surprised Me

Claude Code found bugs I wouldn't have. The coroutine rollback pattern is the kind of thing that requires holding the entire state machine in your head while tracing async execution paths. Claude Code did this systematically across 34 files without fatigue or oversight.

The economic impact analysis was unexpected. Without prompting, Claude Code traced the consensus liveness failure through to DeFi protocols on TON — calculating potential cascade losses of $4-10M from oracle staleness and liquidation failures.

It wrote better bug reports than I would. Each vulnerability got a structured analysis: root cause, trigger condition, reproduction steps, impact assessment, and suggested fix. 15 submission-ready Telegram reports.

By the Numbers

  • 34 vulnerabilities identified across 5 severity tiers
  • 6 dynamically confirmed via ASAN + attack matrix
  • 3 memory safety crashes (ASAN-confirmed)
  • 24/25 commits co-authored with Claude Code
  • 200K words of technical analysis generated
  • ~3 days total elapsed time
  • 27 bugs vs 9 from a competing researcher (18 exclusive findings)

The Takeaway

Security research is one of the highest-leverage applications of AI coding agents. The combination of:

  • Systematic pattern matching across large codebases
  • Tireless tracing of async execution paths
  • Instant context switching between protocol layers
  • Structured documentation of findings

...makes Claude Code genuinely better at certain classes of vulnerability discovery than manual analysis.

If you're doing security research and not using an AI coding agent, you're leaving bugs on the table.


Tools used: Claude Code (Opus 4.6), VOICEVOX (zundamon voice for notifications — highly recommended for long sessions), ASAN/UBSAN, test-consensus harness, Python tontester
