<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amr</title>
    <description>The latest articles on DEV Community by Amr (@amr-9).</description>
    <link>https://dev.to/amr-9</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3720563%2F98d4c85c-e3b5-482c-8bc9-198f32b5e1d1.png</url>
      <title>DEV Community: Amr</title>
      <link>https://dev.to/amr-9</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amr-9"/>
    <language>en</language>
    <item>
      <title>I built a "Bot Factory" in Go that routes thousands of Telegram bots through a single port</title>
      <dc:creator>Amr</dc:creator>
      <pubDate>Sat, 31 Jan 2026 19:16:44 +0000</pubDate>
      <link>https://dev.to/amr-9/i-built-a-bot-factory-in-go-that-routes-thousands-of-telegram-bots-through-a-single-port-1nnp</link>
      <guid>https://dev.to/amr-9/i-built-a-bot-factory-in-go-that-routes-thousands-of-telegram-bots-through-a-single-port-1nnp</guid>
      <description>&lt;p&gt;Hey everyone,&lt;/p&gt;

&lt;p&gt;I wanted to share BotForge, an open-source communication bot factory I built in Go. It allows you to host thousands of custom bots instantly without writing any code, making it possible to run a massive network of bots even on the weakest servers (like a low-spec VPS).&lt;/p&gt;

&lt;p&gt;The main engineering challenge was avoiding the resource overhead of running a separate process or polling loop for every single bot. To solve this, I built a unified HTTP server that handles webhooks for all bots simultaneously. It uses O(1) in-memory routing to direct updates to the correct bot instance and a custom "ManualPoller" implementation to keep the child bots passive, meaning they consume almost zero resources when idle.&lt;/p&gt;

&lt;p&gt;The stack is Telebot v3, Redis, and MySQL.&lt;br&gt;
I would appreciate any feedback on the architecture.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/Amr-9/BotForge" rel="noopener noreferrer"&gt;https://github.com/Amr-9/BotForge&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>opensource</category>
      <category>telegram</category>
    </item>
    <item>
      <title>HexHunter: A GPU-Accelerated Vanity Address Generator for 6 Blockchains (Written in Go &amp; OpenCL)</title>
      <dc:creator>Amr</dc:creator>
      <pubDate>Tue, 20 Jan 2026 02:34:42 +0000</pubDate>
      <link>https://dev.to/amr-9/hexhunter-a-gpu-accelerated-vanity-address-generator-for-6-blockchains-written-in-go-opencl-2i76</link>
      <guid>https://dev.to/amr-9/hexhunter-a-gpu-accelerated-vanity-address-generator-for-6-blockchains-written-in-go-opencl-2i76</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When looking for a tool to generate vanity addresses (crypto addresses with custom prefixes like &lt;code&gt;0xdead...&lt;/code&gt;), I noticed a significant bottleneck in the ecosystem. While there are many open-source tools available, the vast majority run solely on the &lt;strong&gt;CPU&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Generating a vanity address is a brute-force operation. A CPU might check 500,000 addresses per second, which sounds fast, but finding a rare 8-character pattern could still take hours.&lt;/p&gt;

&lt;p&gt;The few existing GPU-accelerated tools had their own issues. The most famous one (Profanity) suffered from a critical vulnerability in its randomness generation (PRNG), which allowed attackers to reverse-engineer private keys.&lt;/p&gt;

&lt;p&gt;I wanted a solution that was &lt;strong&gt;fast (GPU-based)&lt;/strong&gt; but also &lt;strong&gt;cryptographically secure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The result is &lt;strong&gt;&lt;a href="https://github.com/Amr-9/HexHunter" rel="noopener noreferrer"&gt;HexHunter&lt;/a&gt;&lt;/strong&gt;: A cross-platform CLI tool capable of generating over &lt;strong&gt;40 million addresses per second&lt;/strong&gt; on consumer GPUs, written in Go for robustness and OpenCL for raw performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security: Solving the "Profanity" Flaw
&lt;/h2&gt;

&lt;p&gt;One of the biggest motivations for building HexHunter was to address the security flaw found in previous GPU vanity generators (like Profanity).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt;&lt;br&gt;
Profanity seeded its random number generator with only a 32-bit value, so every generated key traced back to one of ~4 billion possible starting points, a space small enough for attackers to brute-force and steal funds from generated wallets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The HexHunter Solution:&lt;/strong&gt;&lt;br&gt;
I shifted the responsibility of randomness entirely to the &lt;strong&gt;Host (Go)&lt;/strong&gt;, avoiding the GPU's limitations on entropy.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OS-Level Entropy:&lt;/strong&gt; HexHunter uses Go's &lt;code&gt;crypto/rand&lt;/code&gt; package to generate a full &lt;strong&gt;256-bit&lt;/strong&gt; cryptographically secure random number from the operating system's entropy source (for example &lt;code&gt;getrandom(2)&lt;/code&gt; or &lt;code&gt;/dev/urandom&lt;/code&gt; on Linux; the exact API depends on the OS and the Go version).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic Scan:&lt;/strong&gt; This 256-bit secure key is sent to the GPU as a "Base Point". The GPU then simply increments from this secure starting point.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; The starting point is drawn from a space of roughly 2^256 values, which makes brute-forcing the seed computationally infeasible and closes the hole that affected Profanity.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Multi-Chain Support: The "Universal" Generator
&lt;/h2&gt;

&lt;p&gt;One of the core design goals of HexHunter was versatility. Instead of building separate tools for each ecosystem, I implemented support for &lt;strong&gt;6 major network families&lt;/strong&gt; within a single codebase:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ethereum (EVM):&lt;/strong&gt; Supports Ethereum, BSC, Arbitrum, Optimism, Polygon, and Base.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bitcoin:&lt;/strong&gt; Supports Legacy (P2PKH), Nested SegWit (P2SH), and Taproot (P2TR).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solana:&lt;/strong&gt; High-speed Ed25519 generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tron:&lt;/strong&gt; Uses the same secp256k1 curve as Ethereum but encodes addresses in Base58Check with a &lt;code&gt;0x41&lt;/code&gt; prefix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aptos:&lt;/strong&gt; Supports the newer Move-based chain address standards.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sui:&lt;/strong&gt; Distinct address derivation logic for the Sui network.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Supporting these required implementing different cryptographic primitives (secp256k1 and Ed25519) and hashing algorithms (Keccak-256, SHA-256, Blake2b) directly in the OpenCL kernels.&lt;/p&gt;
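
&lt;p&gt;As an example of how chains that share a curve still differ in encoding, the sketch below wraps a 20-byte account hash in Tron's address format: a &lt;code&gt;0x41&lt;/code&gt; prefix, the hash, a 4-byte double-SHA-256 checksum, and Base58 on top. It runs on the CPU with the Go standard library, takes the hash as given rather than deriving it from a public key, and uses helper names I made up. It is not taken from the HexHunter kernels.&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// alphabet is the Bitcoin-style Base58 alphabet (no 0, O, I, or l).
const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// base58 encodes raw bytes as a Base58 string.
func base58(data []byte) string {
	x := new(big.Int).SetBytes(data)
	radix := big.NewInt(58)
	rem := new(big.Int)
	var out []byte
	// Repeated division yields digits least-significant first.
	for x.Sign() != 0 {
		x.DivMod(x, radix, rem)
		out = append(out, alphabet[rem.Int64()])
	}
	// Each leading zero byte is encoded as one '1' character.
	for _, b := range data {
		if b != 0 {
			break
		}
		out = append(out, '1')
	}
	// Reverse in place to get most-significant digit first.
	for i := 0; i != len(out)/2; i++ {
		j := len(out) - 1 - i
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

// tronAddress applies Base58Check: prefix 0x41, the 20-byte hash, then the
// first 4 bytes of SHA-256(SHA-256(prefix + hash)) as a checksum.
func tronAddress(hash20 [20]byte) string {
	payload := append([]byte{0x41}, hash20[:]...)
	first := sha256.Sum256(payload)
	second := sha256.Sum256(first[:])
	return base58(append(payload, second[:4]...))
}

func main() {
	var zero [20]byte // placeholder hash, not a real account
	fmt.Println(tronAddress(zero))
}
```

&lt;p&gt;Because the prefix byte is fixed, every address produced this way is 34 characters long and starts with &lt;code&gt;T&lt;/code&gt;. A Tron vanity pattern therefore has to be matched against the Base58 text rather than the raw hash bytes.&lt;/p&gt;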
&lt;h2&gt;
  
  
  Technical Deep Dive: Bridging Go and OpenCL
&lt;/h2&gt;

&lt;p&gt;The application follows a "Host-Device" architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Host (Go):&lt;/strong&gt; Manages user input, TUI (Terminal User Interface), file I/O, and secure random key generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Device (OpenCL C):&lt;/strong&gt; Executes the heavy cryptographic math on the GPU.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I used &lt;strong&gt;cgo&lt;/strong&gt; to call the OpenCL C API from Go. This lets the application compile into a single binary that manages GPU memory manually while using Go's concurrency model for the UI and control logic.&lt;/p&gt;
&lt;h2&gt;
  
  
  Optimization 1: In-Kernel Pattern Matching (Zero-Copy)
&lt;/h2&gt;

&lt;p&gt;The traditional approach for GPU processing involves generating data on the GPU and copying it back to CPU RAM to check results. For vanity address generation, this is a serious bottleneck: transferring 40 million 20-byte addresses per second means roughly 800 MB/s of PCIe traffic and forces the CPU to scan every candidate, nearly all of which are discarded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; I moved the pattern-matching logic &lt;em&gt;inside&lt;/em&gt; the GPU kernel. Each GPU thread generates an address and compares it against the user's target pattern (e.g., "starts with dead") immediately in VRAM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inside the OpenCL Kernel (vanity_v4.cl)&lt;/span&gt;
&lt;span class="n"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// Check prefix directly in GPU register memory&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;prefix_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;address_byte&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;target_prefix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// CRITICAL: Only write to global memory if a match is found&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;atomic_xchg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;found_flag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// ... write result ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because matches are rare, this reduces global-memory writes by ~99.9%, which keeps the kernel compute-bound instead of stalled on memory traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 2: Montgomery Batch Inversion
&lt;/h2&gt;

&lt;p&gt;Elliptic curve point addition in affine coordinates (required for generating public keys from private keys) involves a modular inversion, which is far more expensive than a modular multiplication.&lt;/p&gt;

&lt;p&gt;To optimize this, HexHunter implements &lt;strong&gt;Montgomery Batch Inversion&lt;/strong&gt;. Instead of inverting one number at a time, the kernel groups hundreds of threads together. It computes running products of their values, inverts only the final product, and then walks back through the running products to recover each individual inverse. For n values this replaces n inversions with a single inversion plus about 3(n-1) multiplications, which dramatically reduces the cost per address.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;By combining these optimizations, HexHunter reaches the following throughput on consumer hardware:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RTX 4060:&lt;/strong&gt; ~45 Million addresses/sec&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU Mode (Fallback):&lt;/strong&gt; ~600k addresses/sec (Optimized pure Go implementation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85tilmqpyc0puk0zyn73.gif" alt="HexHunter Demo" width="1180" height="987"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;HexHunter is an open-source attempt to bring professional-grade optimization and &lt;strong&gt;security&lt;/strong&gt; to vanity address generation across the entire crypto ecosystem. It demonstrates how Go can effectively act as a high-level orchestrator for low-level OpenCL compute kernels.&lt;/p&gt;

&lt;p&gt;The project is open source, and I welcome contributions to add more chains or improve the kernels further.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/Amr-9/HexHunter" rel="noopener noreferrer"&gt;github.com/Amr-9/HexHunter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>web3</category>
      <category>blockchain</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
