<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AnimeForLife191</title>
    <description>The latest articles on DEV Community by AnimeForLife191 (@animeforlife191).</description>
    <link>https://dev.to/animeforlife191</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843716%2F3858257d-3642-46e3-b3c3-32634ae07240.jpeg</url>
      <title>DEV Community: AnimeForLife191</title>
      <link>https://dev.to/animeforlife191</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/animeforlife191"/>
    <language>en</language>
    <item>
      <title>Building a Malware Scanner in Rust That Scans 1.4 Million Files per Minute</title>
      <dc:creator>AnimeForLife191</dc:creator>
      <pubDate>Wed, 25 Mar 2026 21:58:12 +0000</pubDate>
      <link>https://dev.to/animeforlife191/building-a-malware-scanner-in-rust-that-scans-14-million-files-per-minute-1ojb</link>
      <guid>https://dev.to/animeforlife191/building-a-malware-scanner-in-rust-that-scans-14-million-files-per-minute-1ojb</guid>
      <description>&lt;p&gt;For the past few months, I've been building my own cybersecurity tools in Rust. I wanted to learn how these tools work behind the scenes and hopefully help anyone who wants to do the same.&lt;/p&gt;

&lt;p&gt;This is my malware scanner, named &lt;a href="https://github.com/AnimeForLife191/Shuhari-CyberForge/blob/main/tools/takeri/src/TakeriREADME.md" rel="noopener noreferrer"&gt;Takeri&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I'm Building This
&lt;/h2&gt;

&lt;p&gt;I'm a 21-year-old cybersecurity student about to graduate with my associate's degree, and I've found an interest in programming, mainly in Rust. I wanted to find a way to combine both fields, not just to build something meaningful, but to hopefully turn it into a career and help others understand how these tools actually work.&lt;/p&gt;

&lt;p&gt;The problem was… I realized I didn’t fully understand how a lot of these cybersecurity tools worked behind the scenes either.&lt;/p&gt;

&lt;p&gt;So instead of just using them, I decided to start building my own.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Takeri Is
&lt;/h2&gt;

&lt;p&gt;Takeri is a malware scanner focused on detecting known threats using signature-based detection. It works by comparing file hashes (MD5 and SHA256) against a large database of known malicious signatures.&lt;/p&gt;

&lt;p&gt;By leveraging ClamAV signature databases (.hdb and .hsb), Takeri has access to over 500,000 known signatures.&lt;/p&gt;

&lt;p&gt;In addition to hash matching, it also performs magic byte analysis to verify that files are actually what they claim to be, helping detect suspicious files that may be disguised with misleading extensions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Takeri starts by downloading the main.cvd and daily.cvd databases from ClamAV. From these, it extracts the .hdb and .hsb signature files and loads them into memory.&lt;/p&gt;

&lt;p&gt;Each signature includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The hash (MD5 or SHA256)&lt;/li&gt;
&lt;li&gt;The expected file size&lt;/li&gt;
&lt;li&gt;The malware name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The signatures are stored in HashMaps keyed by the raw hash bytes, alongside HashSets of known file sizes, for fast lookups:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;SignatureInfo&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SignatureSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;SignatureDb&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;md5_signatures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;SignatureInfo&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;sha256_signatures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;SignatureInfo&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;md5_sizes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HashSet&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;u64&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;sha256_sizes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HashSet&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;u64&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;all_sizes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HashSet&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;u64&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;SignatureSize&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Specific&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HashSet&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;u64&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;Wildcard&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scanning Process
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;File Size Filtering&lt;/strong&gt;&lt;br&gt;
Before doing any expensive work, Takeri checks whether the file size matches any known signature size.&lt;/p&gt;

&lt;p&gt;If it doesn’t match, the file is skipped entirely.&lt;/p&gt;

&lt;p&gt;This avoids unnecessary hashing and is one of the biggest performance optimizations in the scanner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Magic Byte Analysis&lt;/strong&gt;&lt;br&gt;
Next, Takeri reads the file’s magic bytes and compares them to its extension.&lt;/p&gt;

&lt;p&gt;If the file claims to be one type but the actual format doesn’t match, it gets flagged as suspicious.&lt;/p&gt;

&lt;p&gt;(Side note: while writing this, I realized suspicious files currently skip signature matching entirely, which is something I plan to fix.)&lt;/p&gt;
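
&lt;p&gt;To make the mismatch check concrete, here is a minimal sketch of the idea, not Takeri's actual code. The Kind enum, the function names, and the short extension table are all hypothetical, and it only looks at the first four bytes of a handful of well-known formats (ELF, PE, PNG, ZIP):&lt;/p&gt;

```rust
/// File kinds this sketch knows about (a hypothetical, deliberately tiny set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Elf,
    Pe,
    Png,
    Zip,
    Unknown,
}

/// Classify a file by its first four bytes (its "magic bytes").
fn sniff(header: [u8; 4]) -> Kind {
    match header {
        [0x7F, b'E', b'L', b'F'] => Kind::Elf,
        [b'M', b'Z', ..] => Kind::Pe,
        [0x89, b'P', b'N', b'G'] => Kind::Png,
        [b'P', b'K', 0x03, 0x04] => Kind::Zip,
        _ => Kind::Unknown,
    }
}

/// Map a lowercase extension to the kind it claims to be.
fn claimed_kind(ext: String) -> Kind {
    match ext.as_str() {
        "exe" | "dll" => Kind::Pe,
        "png" => Kind::Png,
        "zip" => Kind::Zip,
        _ => Kind::Unknown,
    }
}

/// A file is suspicious when both the extension and the header are
/// recognized and they disagree with each other.
fn is_suspicious(ext: String, header: [u8; 4]) -> bool {
    match (claimed_kind(ext), sniff(header)) {
        (Kind::Unknown, _) | (_, Kind::Unknown) => false,
        (claimed, actual) => claimed != actual,
    }
}

fn main() {
    // A Windows executable renamed to "photo.png" still starts with "MZ".
    let header = [b'M', b'Z', 0x90, 0x00];
    println!("{}", is_suspicious("png".to_string(), header)); // prints "true"
}
```

&lt;p&gt;Treating unrecognized extensions and unrecognized headers as "not suspicious" is a choice made for this sketch to keep false positives down; a real scanner may want to handle those cases differently.&lt;/p&gt;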

&lt;p&gt;&lt;strong&gt;Hashing and Signature Matching&lt;/strong&gt;&lt;br&gt;
Finally, if the file passes the earlier checks, Takeri hashes it (MD5 and/or SHA256 depending on the case) and compares it against the loaded signature database.&lt;/p&gt;

&lt;p&gt;If a match is found, the file is flagged as infected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallel Processing&lt;/strong&gt;&lt;br&gt;
To improve performance, Takeri uses Rayon to scan files in parallel across multiple threads.&lt;/p&gt;

&lt;p&gt;This allows the scanner to fully utilize available CPU cores and significantly increases throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;Now for the interesting part... PERFORMANCE.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linux&lt;/strong&gt;&lt;br&gt;
On a machine running Arch Linux with an Intel i9-13900KF and an NVMe SSD, Takeri scanned the root directory and processed 1.4 million files in about 45 seconds on a cold run.&lt;/p&gt;

&lt;p&gt;When files are already cached by the system, the scan time drops even further, often by half or more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows&lt;/strong&gt;&lt;br&gt;
On a machine running Windows 11 with a two-core CPU at 1.1 GHz and an unknown storage device holding only 60 GB, it can scan C:/ at 50,000 files... every 10 minutes. Lightning speed.&lt;/p&gt;
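
&lt;p&gt;For a like-for-like comparison, both results can be converted to files per minute. This is a small arithmetic sketch using only the figures quoted above (the 45 seconds is approximate, and the function name is just for illustration):&lt;/p&gt;

```rust
/// Files per minute for a scan that processed `files` files in `seconds` seconds.
fn files_per_minute(files: u64, seconds: f64) -> f64 {
    files as f64 / seconds * 60.0
}

fn main() {
    // Linux: 1.4 million files in about 45 seconds (cold run).
    let linux = files_per_minute(1_400_000, 45.0);
    // Windows: 50,000 files every 10 minutes (600 seconds).
    let windows = files_per_minute(50_000, 600.0);

    println!("Linux:   {:.0} files/min", linux);
    println!("Windows: {:.0} files/min", windows);
    println!("Ratio:   {:.0}x", linux / windows);
}
```

&lt;p&gt;That works out to roughly 1.87 million files per minute on the Linux machine versus about 5,000 on the Windows machine, a gap of around 370x.&lt;/p&gt;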

&lt;p&gt;&lt;strong&gt;Why the Difference?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the moment, I don’t have access to higher-end Windows hardware to fully compare results, but there are a few likely factors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hardware limitations (most significant)&lt;/li&gt;
&lt;li&gt;Disk speed differences&lt;/li&gt;
&lt;li&gt;OS-level file system performance&lt;/li&gt;
&lt;li&gt;Thread scheduling differences between Linux and Windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Testing in a VM produced similar or worse results, which further points to hardware and I/O constraints as the primary bottleneck.&lt;/p&gt;

&lt;p&gt;Takeri performs extremely well on modern hardware, but like most scanners, its performance is heavily dependent on disk speed and system resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Takeri is still very early, and there’s a lot I want to improve and build on.&lt;/p&gt;

&lt;p&gt;Right now, the focus is on making the scanner smarter and more efficient. While signature-based detection works well for known threats, it has limitations, so I want to start expanding beyond that.&lt;/p&gt;

&lt;p&gt;Some of the next steps include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Smarter file selection&lt;br&gt;
Improving how files are chosen for scanning to reduce unnecessary work and improve performance on lower-end systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Heuristic scanning&lt;br&gt;
Adding basic behavior and pattern-based detection to catch suspicious files that don’t match known signatures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;YARA rule support&lt;br&gt;
Integrating custom rule-based detection to allow more flexible and advanced scanning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Archive scanning&lt;br&gt;
Being able to scan inside compressed files like .zip and .tar, which is where malware often hides.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better scan modes&lt;br&gt;
Introducing options like quick scans and more configurable behavior.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved output and reporting&lt;br&gt;
Making results easier to read, export, and actually useful for users.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Beyond Takeri
&lt;/h3&gt;

&lt;p&gt;Takeri is just one part of a larger project I’m working on called &lt;a href="https://github.com/AnimeForLife191/Shuhari-CyberForge/blob/main/README.md" rel="noopener noreferrer"&gt;Shuhari CyberForge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The goal is to build a small suite of cybersecurity tools that are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open source&lt;/li&gt;
&lt;li&gt;Transparent&lt;/li&gt;
&lt;li&gt;Educational&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right now, that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shugo - a Windows security auditor&lt;/li&gt;
&lt;li&gt;Takeri - the malware scanner&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And eventually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A network security tool&lt;/li&gt;
&lt;li&gt;A password manager&lt;/li&gt;
&lt;li&gt;And more (still figuring that part out)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m still learning a lot as I build this, and that’s honestly the main goal.&lt;/p&gt;

&lt;p&gt;Even if this never becomes a full antivirus or widely used tool, it’s already been a huge learning experience, and if other people can learn something from it too, that’s a win. If you’re interested in this at all, please go star the &lt;a href="https://github.com/AnimeForLife191/Shuhari-CyberForge" rel="noopener noreferrer"&gt;repo&lt;/a&gt;; it would mean a lot.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>cybersecurity</category>
      <category>programming</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
