
Masa
I Found 34 Vulnerabilities in TON Blockchain's Consensus Algorithm — Claude Code Did 95% of the Work

Last week, TON Blockchain launched a consensus bug bounty challenge targeting their new Simplex BFT consensus implementation. I decided to go all-in with Claude Code (Opus 4.6) as my primary research tool.

The result: 34 vulnerabilities discovered, 6 dynamically confirmed, 3 ASAN-verified memory safety crashes — in under 3 days.

24 out of 25 commits were co-authored with Claude Code. This isn't a story about AI replacing security researchers. It's about what happens when you treat an AI coding agent as a genuine research partner.

The Setup

The target was validator/consensus/ in the TON blockchain — a C++ actor-based coroutine state machine implementing Simplex slot-based BFT consensus. Complex stuff: async coroutines, shared mutable state, 34 source files, and a custom actor framework.

My workflow was simple:

  1. Point Claude Code at the codebase
  2. Ask it to systematically analyze each component
  3. Have it write reproduction scripts and test harnesses
  4. Verify findings with ASAN/UBSAN builds

What Claude Code Actually Did

Deep Architectural Analysis

Claude Code didn't just grep for memcpy. It understood the entire consensus protocol flow — leader election, vote aggregation, certificate creation, block acceptance — and identified architectural flaws, not just surface bugs.

The biggest find: a systemic root cause connecting 7+ distinct bugs. The codebase uses .start().detach() fire-and-forget coroutines that modify shared state before co_await suspension points. When the awaited operation fails, there's no rollback. This pattern appears 17 times across the consensus code.

Attack Matrix Construction

Claude Code built a Byzantine fault injection test matrix:

| Failure mode | Blocks / 60 s | Throughput loss |
|---|---|---|
| Baseline | 60 | 0% |
| 5% DB failure | 8 | 87% |
| 5% validation failure | 58 | 3% |
| 10% DB failure | 0 | 100% |

That last row is the killer — a 10% database failure rate causes total consensus liveness failure. Zero blocks produced. The network halts.

Memory Safety Bugs

Three sanitizer-confirmed crashes, each with a 100% reproduction rate:

  • Heap-use-after-free in coroutine finalization (actor holds reference across co_await)
  • Heap-buffer-overflow in Bus shutdown sequence
  • UBSAN signed integer overflow in slot arithmetic

Each with full stack traces, root cause analysis, and reproduction commands.

The Workflow That Worked

I didn't micromanage Claude Code. The key was giving it research-level autonomy:

```
Analyze the coroutine error handling in consensus/.
For each co_await call site, determine what happens if
the awaited operation fails. Document any state that was
modified before the suspension point without rollback.
```

Claude Code would then:

  1. Read all relevant source files
  2. Trace execution paths through the actor framework
  3. Identify vulnerable patterns
  4. Write detailed analysis with code references
  5. Generate reproduction scripts

A single prompt like that replaced 2-3 hours of manual analysis, with results back in minutes.

What Surprised Me

Claude Code found bugs I wouldn't have. The coroutine rollback pattern is the kind of thing that requires holding the entire state machine in your head while tracing async execution paths. Claude Code did this systematically across 34 files without fatigue or oversight.

The economic impact analysis was unexpected. Without prompting, Claude Code traced the consensus liveness failure through to DeFi protocols on TON — calculating potential cascade losses of $4-10M from oracle staleness and liquidation failures.

It wrote better bug reports than I would. Each vulnerability got a structured analysis: root cause, trigger condition, reproduction steps, impact assessment, and suggested fix. 15 submission-ready Telegram reports.

By the Numbers

  • 34 vulnerabilities identified across 5 severity tiers
  • 6 dynamically confirmed via ASAN + attack matrix
  • 3 memory safety crashes (ASAN-confirmed)
  • 24/25 commits co-authored with Claude Code
  • 200K words of technical analysis generated
  • ~3 days total elapsed time
  • 27 bugs vs 9 from a competing researcher (18 exclusive findings)

The Takeaway

Security research is one of the highest-leverage applications of AI coding agents. The combination of:

  • Systematic pattern matching across large codebases
  • Tireless tracing of async execution paths
  • Instant context switching between protocol layers
  • Structured documentation of findings

...makes Claude Code genuinely better at certain classes of vulnerability discovery than manual analysis.

If you're doing security research and not using an AI coding agent, you're leaving bugs on the table.


Tools used: Claude Code (Opus 4.6), VOICEVOX (zundamon voice for notifications — highly recommended for long sessions), ASAN/UBSAN, test-consensus harness, Python tontester
