Introduction
When looking for a tool to generate vanity addresses (crypto addresses with custom prefixes like 0xdead...), I noticed a significant bottleneck in the ecosystem. While there are many open-source tools available, the vast majority run solely on the CPU.
Generating a vanity address is a brute-force operation. A CPU might check 500,000 addresses per second, which sounds fast, but finding a rare 8-character pattern could still take hours.
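As a rough check on that estimate, the expected search time can be worked out directly. This is a minimal sketch, assuming a case-insensitive hex prefix and uniformly distributed addresses; the function names are illustrative and not part of HexHunter.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// expectedAttempts returns the average number of candidates needed to hit a
// fixed hex prefix of the given length: each hex character has 16 values.
func expectedAttempts(prefixLen int) float64 {
	return math.Pow(16, float64(prefixLen))
}

// expectedDuration converts that attempt count into wall-clock time at a
// given search rate (addresses per second).
func expectedDuration(prefixLen int, ratePerSec float64) time.Duration {
	seconds := expectedAttempts(prefixLen) / ratePerSec
	return time.Duration(seconds * float64(time.Second))
}

func main() {
	fmt.Println("CPU at 500k/s:", expectedDuration(8, 500_000).Round(time.Minute))
	fmt.Println("GPU at 40M/s: ", expectedDuration(8, 40_000_000).Round(time.Second))
}
```

An 8-character prefix needs about 4.3 billion attempts on average, which is roughly 2.4 hours at 500,000 addresses per second and under two minutes at 40 million per second.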
The few existing GPU-accelerated tools had their own issues. The best-known one (Profanity) had a critical vulnerability in how it seeded its pseudorandom number generator (PRNG), which allowed attackers to recover private keys.
I wanted a solution that was fast (GPU-based) but also cryptographically secure.
The result is HexHunter: A cross-platform CLI tool capable of generating over 40 million addresses per second on consumer GPUs, written in Go for robustness and OpenCL for raw performance.
Security: Solving the "Profanity" Flaw
One of the biggest motivations for building HexHunter was to address the security flaw found in previous GPU vanity generators (like Profanity).
The Problem:
Previous tools often generated the random "seed" for the private key inside the GPU using a weak 32-bit number. This meant there were only 2^32 (~4.3 billion) possible starting points, a space small enough for attackers to brute-force and steal funds from generated wallets.
The HexHunter Solution:
I shifted the responsibility of randomness entirely to the Host (Go), avoiding the GPU's limitations on entropy.
- OS-Level Entropy: HexHunter uses Go's crypto/rand library to generate a full 256-bit cryptographically secure random number from the operating system's entropy source (the kernel CSPRNG behind /dev/urandom on Linux/macOS, the system CSPRNG API on Windows).
- Deterministic Scan: This 256-bit secure key is sent to the GPU as a "Base Point". The GPU then simply increments from this secure starting point.
- Result: The starting point is drawn from the full key space of the elliptic curve (roughly 2^256 values), making it computationally infeasible to brute-force the seed and closing the vulnerability that affected earlier tools.
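The host-side steps above can be sketched in Go as follows. This is an illustration under stated assumptions, not HexHunter's actual code: the function names are hypothetical, and wrapping candidates into the valid secp256k1 range [1, n-1] is one reasonable choice rather than a confirmed detail of the tool.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// curveOrder is the secp256k1 group order n; valid private keys lie in [1, n-1].
var curveOrder, _ = new(big.Int).SetString(
	"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141", 16)

// newBaseKey draws a uniformly random scalar in [1, n-1] from the OS CSPRNG.
func newBaseKey() (*big.Int, error) {
	span := new(big.Int).Sub(curveOrder, big.NewInt(1))
	k, err := rand.Int(rand.Reader, span) // uniform in [0, n-2]
	if err != nil {
		return nil, err
	}
	return k.Add(k, big.NewInt(1)), nil // shift to [1, n-1]
}

// candidateKey returns the key scanned at a given offset from the base key,
// wrapping around so the result always stays in [1, n-1].
func candidateKey(base *big.Int, offset uint64) *big.Int {
	span := new(big.Int).Sub(curveOrder, big.NewInt(1)) // number of valid keys
	k := new(big.Int).Sub(base, big.NewInt(1))
	k.Add(k, new(big.Int).SetUint64(offset))
	k.Mod(k, span)
	return k.Add(k, big.NewInt(1))
}

// keyInRange reports whether k is a valid secp256k1 private key.
func keyInRange(k *big.Int) bool {
	return k.Sign() > 0 && k.Cmp(curveOrder) < 0
}

func main() {
	base, err := newBaseKey()
	if err != nil {
		panic(err)
	}
	// Never print or log the key itself; the bit length is enough for a demo.
	fmt.Println("base key bit length:", base.BitLen())
	fmt.Println("offset 5 in range:", keyInRange(candidateKey(base, 5)))
}
```

Because the base key comes from the operating system rather than a 32-bit seed, knowing the increment scheme does not help an attacker: every candidate is only as guessable as the 256-bit base.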
Multi-Chain Support: The "Universal" Generator
One of the core design goals of HexHunter was versatility. Instead of building separate tools for each ecosystem, I implemented support for 6 major network families within a single codebase:
- Ethereum (EVM): Supports Ethereum, BSC, Arbitrum, Optimism, Polygon, and Base.
- Bitcoin: Supports Legacy (P2PKH), Nested SegWit (P2SH), and Taproot (P2TR).
- Solana: High-speed Ed25519 generation.
- Tron: Uses the same secp256k1 curve as Ethereum but with specific encoding.
- Aptos: Supports the newer Move-based chain address standards.
- Sui: Distinct address derivation logic for the Sui network.
Supporting these required implementing different cryptographic primitives (secp256k1 and Ed25519) and hashing algorithms (Keccak-256, SHA-256, Blake2b) directly in the OpenCL kernels.
Technical Deep Dive: Bridging Go and OpenCL
The application follows a "Host-Device" architecture:
- Host (Go): Manages user input, TUI (Terminal User Interface), file I/O, and secure random key generation.
- Device (OpenCL C): Executes the heavy cryptographic math on the GPU.
I used CGO to interface Go with the OpenCL C headers. This allows the application to be compiled into a single binary that manages GPU memory manually while leveraging Go's excellent concurrency model for the UI and control logic.
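The split between host and device can be pictured with a small pure-Go sketch. Everything here is hypothetical: the Device interface, fakeDevice, and search are stand-ins invented for illustration, and a real build would put OpenCL calls made through CGO behind the interface instead of the toy CPU loop.

```go
package main

import "fmt"

// Device is a hypothetical abstraction over a compute backend. In a real
// build it would wrap OpenCL calls made through cgo.
type Device interface {
	// Scan checks count consecutive keys starting at start and returns the
	// offset of the first match, or ok=false if nothing matched.
	Scan(start, count uint64) (offset uint64, ok bool)
}

// fakeDevice stands in for the GPU: a key "matches" when key%modulus == 0.
type fakeDevice struct{ modulus uint64 }

func (d fakeDevice) Scan(start, count uint64) (uint64, bool) {
	for i := uint64(0); i < count; i++ {
		if (start+i)%d.modulus == 0 {
			return i, true
		}
	}
	return 0, false
}

// search drives the device one batch at a time until a match is found or
// the stop channel is closed (for example when the user quits the TUI).
func search(dev Device, start, batch uint64, stop <-chan struct{}) (uint64, bool) {
	for {
		select {
		case <-stop:
			return 0, false
		default:
		}
		if off, ok := dev.Scan(start, batch); ok {
			return start + off, true
		}
		start += batch
	}
}

func main() {
	stop := make(chan struct{})
	result := make(chan uint64, 1)
	// The search runs in its own goroutine so UI code stays responsive.
	go func() {
		if key, ok := search(fakeDevice{modulus: 1_000_003}, 1, 4096, stop); ok {
			result <- key
		}
	}()
	fmt.Println("match at key", <-result)
}
```

The point of the interface is that the control loop does not care whether a batch ran on a GPU or a CPU fallback; only the Scan implementation changes.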
Optimization 1: In-Kernel Pattern Matching (Zero-Copy)
The traditional approach for GPU processing involves generating data on the GPU and copying it back to CPU RAM to check results. For vanity address generation, this is a serious bottleneck. Transferring 40 million 20-byte addresses per second means about 800 MB/s of sustained PCIe traffic, and the CPU would then have to scan every candidate even though almost all of them are discarded.
Solution: I moved the pattern-matching logic inside the GPU kernel. Each GPU thread generates an address and compares it against the user's target pattern (e.g., "starts with dead") immediately on the device.
// Inside the OpenCL Kernel (vanity_v4.cl)
bool match = true;
// Check prefix directly in GPU register memory
for (uint i = 0; i < prefix_len; i++) {
    if (address_byte[i] != target_prefix[i]) {
        match = false;
        break;
    }
}
// CRITICAL: Only write to global memory if a match is found
if (match) {
    atomic_xchg(found_flag, 1);
    // ... write result ...
}
This reduces global-memory writes by ~99.9%, since only matching threads write anything back. The kernel stays compute-bound instead of stalling on the memory controllers.
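The same early-exit comparison can be written on the host side, for instance in a CPU fallback path. The sketch below is an assumption about how a hex prefix might be checked against raw address bytes, not code from the repository; hasHexPrefix is a hypothetical helper. It compares nibble by nibble so that odd-length prefixes such as "dea" also work.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// hasHexPrefix reports whether the hex encoding of addr starts with prefix.
// The prefix may carry an optional "0x" and is matched case-insensitively.
func hasHexPrefix(addr []byte, prefix string) bool {
	prefix = strings.TrimPrefix(strings.ToLower(prefix), "0x")
	if len(prefix) > 2*len(addr) {
		return false
	}
	const digits = "0123456789abcdef"
	for i := 0; i < len(prefix); i++ {
		b := addr[i/2]
		nibble := b >> 4 // even position: high nibble of the byte
		if i%2 == 1 {
			nibble = b & 0x0f // odd position: low nibble of the byte
		}
		if digits[nibble] != prefix[i] {
			return false // early exit, like the break in the kernel loop
		}
	}
	return true
}

func main() {
	addr, err := hex.DecodeString("deadbeef00000000000000000000000000000000")
	if err != nil {
		panic(err)
	}
	fmt.Println(hasHexPrefix(addr, "0xdead"))
	fmt.Println(hasHexPrefix(addr, "dea"))
	fmt.Println(hasHexPrefix(addr, "deae"))
}
```

A prefix containing a non-hex character simply never matches, so input validation can stay in the UI layer.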
Optimization 2: Montgomery Batch Inversion
Elliptic curve point addition in affine coordinates (required for stepping from one public key to the next) involves a modular inversion, which is far more expensive than a modular multiplication.
To optimize this, HexHunter implements Montgomery Batch Inversion. Instead of inverting one number at a time, the kernel groups hundreds of threads together. It multiplies their values, inverts the single product, and then recovers each individual inverse from that result. For n values, this replaces n inversions with one inversion plus roughly 3(n-1) multiplications, which dramatically reduces the cost per address.
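Montgomery's trick is easier to follow in code. The version below is a minimal CPU sketch using math/big, written for clarity rather than speed; batchInverse and checkBatch are illustrative names, and the actual kernel works on fixed-width field elements in OpenCL rather than on big integers.

```go
package main

import (
	"fmt"
	"math/big"
)

// batchInverse returns the inverse of every element of xs modulo the prime p
// using a single ModInverse call. It returns nil if any input is 0 mod p.
func batchInverse(xs []*big.Int, p *big.Int) []*big.Int {
	n := len(xs)
	if n == 0 {
		return nil
	}
	// prefix[i] = xs[0] * xs[1] * ... * xs[i] mod p
	prefix := make([]*big.Int, n)
	acc := big.NewInt(1)
	for i, x := range xs {
		acc = new(big.Int).Mod(new(big.Int).Mul(acc, x), p)
		prefix[i] = acc
	}
	// The only inversion: the inverse of the full product.
	inv := new(big.Int).ModInverse(prefix[n-1], p)
	if inv == nil {
		return nil
	}
	out := make([]*big.Int, n)
	for i := n - 1; i > 0; i-- {
		// inv is (xs[0]*...*xs[i])^-1, so multiplying by prefix[i-1]
		// cancels everything except xs[i]^-1.
		out[i] = new(big.Int).Mod(new(big.Int).Mul(inv, prefix[i-1]), p)
		// Multiply xs[i] back in so inv becomes (xs[0]*...*xs[i-1])^-1.
		inv = new(big.Int).Mod(new(big.Int).Mul(inv, xs[i]), p)
	}
	out[0] = inv
	return out
}

// checkBatch inverts the given values mod prime and verifies x * x^-1 == 1.
func checkBatch(values []int64, prime int64) bool {
	p := big.NewInt(prime)
	xs := make([]*big.Int, len(values))
	for i, v := range values {
		xs[i] = big.NewInt(v)
	}
	invs := batchInverse(xs, p)
	if len(invs) != len(xs) {
		return false
	}
	one := big.NewInt(1)
	for i := range xs {
		prod := new(big.Int).Mul(xs[i], invs[i])
		if prod.Mod(prod, p).Cmp(one) != 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("all inverses valid:", checkBatch([]int64{3, 5, 7, 11}, 101))
}
```

The forward pass and the backward pass each cost about one or two multiplications per element, so the single expensive inversion is amortized over the whole batch.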
Performance
By combining these optimizations, HexHunter reaches the following throughput on consumer hardware:
- RTX 4060: ~45 Million addresses/sec
- CPU Mode (Fallback): ~600k addresses/sec (Optimized pure Go implementation)
Conclusion
HexHunter is an open-source attempt to bring professional-grade optimization and security to vanity address generation across the entire crypto ecosystem. It demonstrates how Go can effectively act as a high-level orchestrator for low-level OpenCL compute kernels.
The project is open source, and I welcome contributions to add more chains or improve the kernels further.
🔗 GitHub Repository: github.com/Amr-9/HexHunter