<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Khalid Hussein</title>
    <description>The latest articles on DEV Community by Khalid Hussein (@kherld).</description>
    <link>https://dev.to/kherld</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1474997%2Ffac375a8-6844-455c-822e-5b4cffe7a65e.jpg</url>
      <title>DEV Community: Khalid Hussein</title>
      <link>https://dev.to/kherld</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kherld"/>
    <language>en</language>
    <item>
      <title>Introducing kreuzcrawl v0.3.0</title>
      <dc:creator>Khalid Hussein</dc:creator>
      <pubDate>Thu, 25 Jun 2026 09:52:08 +0000</pubDate>
      <link>https://dev.to/kreuzberg/introducing-kreuzcrawl-v030-8di</link>
      <guid>https://dev.to/kreuzberg/introducing-kreuzcrawl-v030-8di</guid>
      <description>&lt;p&gt;kreuzcrawl began as a Rust core with bindings for ten languages. v0.3.0 ships fourteen, adds a tiered WAF-aware dispatch engine, cuts peak streaming memory from ~2.5 GB to ~20 MB, and enables SSRF defense across every outbound call path by default. It is the first release we consider API-stable.&lt;/p&gt;

&lt;p&gt;This post covers what changed, why each decision was made, and what the harder engineering problems looked like from the inside.&lt;/p&gt;

&lt;h2&gt;
  
  
  At a glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;v0.2.0&lt;/th&gt;
&lt;th&gt;v0.3.0&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language bindings&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;14 (+Dart, Kotlin/Android, Swift, Zig)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Peak streaming memory&lt;/td&gt;
&lt;td&gt;~2.5 GB&lt;/td&gt;
&lt;td&gt;~20 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSRF protection&lt;/td&gt;
&lt;td&gt;opt-in&lt;/td&gt;
&lt;td&gt;on by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dispatch model&lt;/td&gt;
&lt;td&gt;static HTTP / bypass / browser&lt;/td&gt;
&lt;td&gt;tiered, signal-driven escalation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WAF fingerprints&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;35 across 8 vendors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fingerprint hot-reload&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;lock-free (&lt;code&gt;ArcSwap&lt;/code&gt;), 500 ms debounce&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP tools&lt;/td&gt;
&lt;td&gt;partial&lt;/td&gt;
&lt;td&gt;1:1 with CLI, safety-annotated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI subcommands&lt;/td&gt;
&lt;td&gt;scrape, crawl&lt;/td&gt;
&lt;td&gt;+ batch-scrape, batch-crawl, download, citations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Robots / sitemap parsers&lt;/td&gt;
&lt;td&gt;engine-internal&lt;/td&gt;
&lt;td&gt;public modules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API stability&lt;/td&gt;
&lt;td&gt;preview&lt;/td&gt;
&lt;td&gt;stable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Four new language bindings
&lt;/h2&gt;

&lt;p&gt;v0.2.0 shipped Rust, Python, Node.js, Ruby, Go, Java, C#, PHP, Elixir, and WebAssembly.&lt;br&gt;
v0.3.0 adds &lt;strong&gt;Dart&lt;/strong&gt;, &lt;strong&gt;Kotlin/Android&lt;/strong&gt;, &lt;strong&gt;Swift&lt;/strong&gt;, and &lt;strong&gt;Zig&lt;/strong&gt; — bringing the total to fourteen.&lt;/p&gt;

&lt;p&gt;None of the per-language glue is written by hand. Every binding is generated from the Rust core by &lt;a href="https://github.com/xberg-io/alef" rel="noopener noreferrer"&gt;alef&lt;/a&gt;, our polyglot binding generator.&lt;br&gt;
The Dart and Kotlin/Android packages bind through the C FFI layer (&lt;code&gt;kreuzcrawl-ffi&lt;/code&gt;) via &lt;code&gt;dart:ffi&lt;/code&gt; and JNI respectively. Swift binds through clang. Zig uses &lt;code&gt;@cImport&lt;/code&gt; against the same C header.&lt;/p&gt;

&lt;p&gt;The generation pipeline also hardened in this release: the Docker publish matrix now builds each architecture natively rather than via QEMU emulation, the Dart build no longer requires the Flutter SDK for pub.dev publishes, Swift artifactbundle checksums are injected automatically, and the Elixir/PHP/Ruby releases preserve their lock files through the source-publish step.&lt;/p&gt;

&lt;p&gt;=== "Python"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sh
pip install kreuzcrawl
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "Node.js"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sh
npm install @xberg/kreuzcrawl
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "Rust"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sh
cargo add kreuzcrawl
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "Go"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sh
go get github.com/xberg-io/kreuzcrawl/packages/go
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "Java"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```xml
&amp;lt;dependency&amp;gt;
  &amp;lt;groupId&amp;gt;io.xberg.kreuzcrawl&amp;lt;/groupId&amp;gt;
  &amp;lt;artifactId&amp;gt;kreuzcrawl&amp;lt;/artifactId&amp;gt;
  &amp;lt;version&amp;gt;0.3.0&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "Kotlin (Android)"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```groovy
implementation("io.xberg.kreuzcrawl.android:kreuzcrawl-android:0.3.0")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "C#"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sh
dotnet add package Kreuzcrawl
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "Ruby"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sh
gem install kreuzcrawl
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "PHP"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sh
composer require xberg-io/kreuzcrawl
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "Elixir"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```elixir
{:kreuzcrawl, "~&amp;gt; 0.3"}
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "Dart"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sh
dart pub add kreuzcrawl
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "Swift"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```swift
// Package.swift
.package(url: "https://github.com/xberg-io/kreuzcrawl", from: "0.3.0")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "Zig"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sh
zig fetch --save https://github.com/xberg-io/kreuzcrawl/archive/v0.3.0.tar.gz
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;=== "WebAssembly"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```sh
npm install @xberg/kreuzcrawl-wasm
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Memory-bounded streaming
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;crawl_stream()&lt;/code&gt; and &lt;code&gt;batch_crawl_stream()&lt;/code&gt; previously accumulated every page result in memory before the caller received any of them. On a large crawl — tens of thousands of pages, each carrying extracted text, metadata, links, and images — the peak working set reached approximately 2.5 GB.&lt;/p&gt;

&lt;p&gt;The fix is a change in ownership: each page result is moved into &lt;code&gt;CrawlEvent::Page&lt;/code&gt; and emitted immediately. The caller receives it, processes it, and drops it. The engine never holds more than the current in-flight pages, bounded by the concurrency setting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The event type (unchanged externally; behavior changed internally)&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;CrawlEvent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Page&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;CrawlPageResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;// (1)&lt;/span&gt;
    &lt;span class="n"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;Complete&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;pages_crawled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;CrawlPageResult&lt;/code&gt; is boxed, moved into the variant, and dropped when the caller's loop moves past it. The engine holds no reference after the send.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Python — pages are processed and released one at a time
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;kreuzcrawl&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;crawl_stream&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;crawl_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# event is dropped after this scope
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Peak working set on a 10,000-page crawl with default concurrency (16): &lt;strong&gt;~20 MB&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The non-streaming &lt;code&gt;crawl()&lt;/code&gt; is unchanged — it accumulates by contract, because callers need the complete &lt;code&gt;CrawlResult&lt;/code&gt;. The two code paths are kept separate. Merging them would push the accumulation pattern onto callers, which is the same problem moved one level up.&lt;/p&gt;

&lt;p&gt;!!! tip "Choosing between &lt;code&gt;crawl()&lt;/code&gt; and &lt;code&gt;crawl_stream()&lt;/code&gt;"&lt;br&gt;
    Use &lt;code&gt;crawl()&lt;/code&gt; when you need the full result set in memory. Use &lt;code&gt;crawl_stream()&lt;/code&gt; for&lt;br&gt;
    large crawls, progress tracking, or when you process results one at a time. The memory&lt;br&gt;
    difference is significant at scale.&lt;/p&gt;
&lt;h2&gt;
  
  
  SSRF defense on by default
&lt;/h2&gt;

&lt;p&gt;Web crawlers take URLs as input and make HTTP requests — the exact primitive an attacker needs to reach internal services. Every path that accepts a URL now validates it against an &lt;code&gt;SsrfPolicy&lt;/code&gt; before making the request: &lt;code&gt;scrape()&lt;/code&gt;, &lt;code&gt;crawl()&lt;/code&gt;, &lt;code&gt;batch_crawl()&lt;/code&gt;, sitemap fetches, robots.txt fetches, asset downloads, and link enqueue.&lt;/p&gt;
&lt;h3&gt;
  
  
  What is refused
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Ranges&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Loopback&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;127.0.0.0/8&lt;/code&gt;, &lt;code&gt;::1/128&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Private (RFC 1918)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;10.0.0.0/8&lt;/code&gt;, &lt;code&gt;172.16.0.0/12&lt;/code&gt;, &lt;code&gt;192.168.0.0/16&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Link-local / cloud metadata&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;169.254.0.0/16&lt;/code&gt; (incl. &lt;code&gt;169.254.169.254&lt;/code&gt;), &lt;code&gt;fe80::/10&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unspecified&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0.0.0/8&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multicast&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;224.0.0.0/4&lt;/code&gt;, &lt;code&gt;ff00::/8&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IPv6 unique-local&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fc00::/7&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-http(s) schemes&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;file://&lt;/code&gt;, &lt;code&gt;ftp://&lt;/code&gt;, &lt;code&gt;gopher://&lt;/code&gt;, …&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  DNS rebinding mitigation
&lt;/h3&gt;

&lt;p&gt;Checking the hostname at validation time is insufficient. An attacker can register&lt;br&gt;
&lt;code&gt;evil.example.com&lt;/code&gt;, serve a public IP at validation, then update DNS to point to&lt;br&gt;
&lt;code&gt;192.168.1.1&lt;/code&gt; once the check passes.&lt;/p&gt;

&lt;p&gt;The policy resolves every hostname via DNS and validates &lt;strong&gt;all returned IP addresses&lt;/strong&gt;. If any resolved IP is in the deny list, the request is refused — regardless of what the others resolve to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// From kreuzcrawl/src/net/ssrf.rs&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;addresses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IpAddr&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;lookup_host&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;lookup_addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
    &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="nf"&gt;.ip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nf"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;addresses&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;is_ip_permitted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;SsrfError&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;DeniedByPolicy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;classify_private_ip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Redirect-chain re-validation
&lt;/h3&gt;

&lt;p&gt;Each &lt;code&gt;30x&lt;/code&gt; &lt;code&gt;Location&lt;/code&gt; header is re-resolved and re-validated before the next hop is taken. This closes the redirect-chain attack: a public URL that redirects to &lt;code&gt;http://169.254.169.254/latest/meta-data/&lt;/code&gt; is refused at the second hop. Redirect following is bounded by &lt;code&gt;SsrfPolicy::max_redirects&lt;/code&gt; (default: 5).&lt;/p&gt;

&lt;h3&gt;
  
  
  Opting out
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Environment variable — applies to every crawler in the process&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;KREUZCRAWL_ALLOW_PRIVATE_NETWORK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Per-config builder — applies to a single CrawlConfig&lt;/span&gt;
&lt;span class="nn"&gt;CrawlConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.allow_private_networks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.ssrf_allowlist_host&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;HostMatcher&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Cidr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"10.0.0.0/8"&lt;/span&gt;&lt;span class="nf"&gt;.into&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;!!! warning "Wasm targets"&lt;br&gt;
    On &lt;code&gt;wasm32&lt;/code&gt;, SSRF checking is disabled — the browser's fetch API and same-origin&lt;br&gt;
    policy are the enforcing boundary, and &lt;code&gt;tokio::net::lookup_host&lt;/code&gt; is unavailable in&lt;br&gt;
    that context.&lt;/p&gt;
&lt;h2&gt;
  
  
  WAF-aware tiered dispatch
&lt;/h2&gt;

&lt;p&gt;Before v0.3.0, the dispatch decision was static: HTTP, or bypass-vendor, or browser — chosen at config time and fixed for the duration of the crawl. This had an obvious cost problem: routing every request through a bypass provider because 5% of pages are blocked is expensive.&lt;/p&gt;

&lt;p&gt;The new engine chains tiers and escalates based on per-attempt signals.&lt;/p&gt;
&lt;h3&gt;
  
  
  Tiers and escalation strategies
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;Tier&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Http&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// plain HTTP fetch&lt;/span&gt;
    &lt;span class="n"&gt;Bypass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// vendor-managed bypass (Zyte, ScrapingBee, Bright Data, …)&lt;/span&gt;
    &lt;span class="n"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// headless Chrome via Chromiumoxide&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;EscalationStrategy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;// HTTP only; surface all failures&lt;/span&gt;
    &lt;span class="n"&gt;BrowserOnly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// HTTP → Browser on block  ← default&lt;/span&gt;
    &lt;span class="n"&gt;BypassFirst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// always use bypass (legacy behaviour)&lt;/span&gt;
    &lt;span class="n"&gt;BypassOnly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// HTTP → Bypass on block; no browser&lt;/span&gt;
    &lt;span class="n"&gt;BypassThenBrowser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// HTTP → Bypass → Browser; maximum resilience&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;All dispatch enums are &lt;code&gt;#[non_exhaustive]&lt;/code&gt; — new variants can be added without breaking downstream &lt;code&gt;match&lt;/code&gt; arms.&lt;/p&gt;
&lt;h3&gt;
  
  
  WAF detection: Aho-Corasick over a TOML corpus
&lt;/h3&gt;

&lt;p&gt;Detecting a WAF challenge page requires inspecting both response headers and body.&lt;br&gt;
A naïve approach — one regex per fingerprint per response — scales as O(fingerprints × body_length). With 35 fingerprints that's expensive per page.&lt;/p&gt;

&lt;p&gt;All body-pattern signals across all fingerprints are compiled into a &lt;strong&gt;single Aho-Corasick automaton&lt;/strong&gt; at startup. One scan of the response body returns the set of matching pattern indices; each maps to a fingerprint via a flat &lt;code&gt;Vec&amp;lt;usize&amp;gt;&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;Rules&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fingerprints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Fingerprint&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;automaton&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AhoCorasick&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;// single automaton over all patterns&lt;/span&gt;
    &lt;span class="n"&gt;pattern_to_fp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;// AC pattern index → fingerprint index&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The body scan is capped at &lt;strong&gt;100 KB&lt;/strong&gt; (&lt;code&gt;CHALLENGE_BODY_LIMIT&lt;/code&gt;). WAF challenge pages are small; real content pages overwhelmingly exceed this threshold. This bounds scan cost without missing signals.&lt;/p&gt;

&lt;p&gt;Header signals are checked first (constant time per fingerprint). If a fingerprint fires on headers alone, the body scan is skipped entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current corpus:&lt;/strong&gt; 35 fingerprints across Cloudflare (10), DataDome (6), PerimeterX (5), Imperva (5), AWS WAF (4), F5 (2), Akamai (1), and generic corroborating patterns (2).&lt;/p&gt;

&lt;h3&gt;
  
  
  Hot-reload for live environments
&lt;/h3&gt;

&lt;p&gt;The fingerprint corpus is a TOML file (&lt;code&gt;rules/waf_fingerprints.toml&lt;/code&gt;). In Kubernetes deployments, it is managed as a ConfigMap — operators update signatures without restarting the process.&lt;/p&gt;

&lt;p&gt;The compiled &lt;code&gt;Rules&lt;/code&gt; is wrapped in &lt;code&gt;arc_swap::ArcSwap&lt;/code&gt;. &lt;code&gt;TomlClassifier::watch()&lt;/code&gt;&lt;br&gt;
starts a filesystem watcher that atomically swaps the rule set when the file changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;TomlClassifier&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ArcSwap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Rules&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;TomlClassifier&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;watch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;Arc&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nb"&gt;AsRef&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;WatchHandle&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WatchError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nn"&gt;watch&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;start_watch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;clone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="nf"&gt;.as_ref&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Events are debounced 500 ms — this handles both editors that write via tmpfile+rename and the Kubernetes ConfigMap atomic projection mechanism, which produces the same sequence of filesystem events.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-domain EWMA state
&lt;/h3&gt;

&lt;p&gt;The engine tracks a block rate per domain using an Exponentially Weighted Moving Average. High block rates promote the starting tier: a domain that has been blocking consistently starts at &lt;code&gt;Bypass&lt;/code&gt; or &lt;code&gt;Browser&lt;/code&gt; rather than always attempting &lt;code&gt;Http&lt;/code&gt; first.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;DomainStatePort&lt;/code&gt; trait is injectable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[async_trait]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;DomainStatePort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Send&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;Sync&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nn"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Debug&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;recommend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DomainRecommendation&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;observation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;DomainObservation&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The default implementation (&lt;code&gt;EwmaDomainState&lt;/code&gt;) is wired in automatically.&lt;br&gt;
&lt;a href="https://github.com/xberg-io/kreuzberg-cloud" rel="noopener noreferrer"&gt;kreuzberg-cloud&lt;/a&gt; replaces it with a distributed store for cross-instance domain intelligence.&lt;/p&gt;
&lt;h3&gt;
  
  
  Configuring dispatch
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;sync&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;kreuzcrawl&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;
    &lt;span class="n"&gt;CrawlConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DispatchProfile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EscalationStrategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SimpleRetryPolicy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TomlClassifier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;CrawlConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.dispatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nn"&gt;DispatchProfile&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;.strategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;EscalationStrategy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;BypassThenBrowser&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;.retry_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;SimpleRetryPolicy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.with_max_retries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
            &lt;span class="nf"&gt;.waf_classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;TomlClassifier&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;builtin&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
            &lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  MCP server at CLI parity
&lt;/h2&gt;

&lt;p&gt;The MCP server now exposes tools 1:1 with the CLI — &lt;code&gt;scrape&lt;/code&gt;, &lt;code&gt;batch_scrape&lt;/code&gt;, &lt;code&gt;batch_crawl&lt;/code&gt;, &lt;code&gt;download&lt;/code&gt;, and &lt;code&gt;generate_citations&lt;/code&gt;. Earlier releases had partial coverage; v0.3.0 closes the gap.&lt;/p&gt;
&lt;h3&gt;
  
  
  Safety annotations
&lt;/h3&gt;

&lt;p&gt;Each tool declares three safety properties from the MCP spec:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;read_only&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;does not modify external state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;destructive&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;does not delete or overwrite anything&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;open_world&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;makes network requests to caller-specified URLs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;open_world: true&lt;/code&gt; is the meaningful one. MCP hosts can use it to apply additional&lt;br&gt;
sandboxing or prompt for confirmation before an agent makes outbound requests. The SSRF&lt;br&gt;
policy is the enforcement layer: a request to &lt;code&gt;http://169.254.169.254/&lt;/code&gt; returns a&lt;br&gt;
&lt;code&gt;SsrfPolicyViolation&lt;/code&gt; error before any network activity occurs.&lt;/p&gt;
&lt;h3&gt;
  
  
  Transport
&lt;/h3&gt;

&lt;p&gt;The server runs in two modes depending on how it is invoked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;stdio&lt;/strong&gt; — reads JSON-RPC from stdin, writes to stdout. Used by Claude Desktop,
Cursor, and tools that spawn the binary as a subprocess.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamable HTTP&lt;/strong&gt; at &lt;code&gt;/mcp&lt;/code&gt; — used for service deployments. Enabled when the binary
is built with &lt;code&gt;--features api,mcp&lt;/code&gt;.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# stdio mode (subprocess)&lt;/span&gt;
kreuzcrawl mcp

&lt;span class="c"&gt;# HTTP mode&lt;/span&gt;
kreuzcrawl serve  &lt;span class="c"&gt;# exposes /mcp alongside the REST API&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Expanded CLI
&lt;/h2&gt;

&lt;p&gt;Four subcommands complete the CLI's 1:1 mapping with the core and MCP surfaces:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;batch-scrape &amp;lt;urls…&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scrape multiple URLs concurrently, emit structured JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;batch-crawl &amp;lt;urls…&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Crawl from multiple seed URLs with shared concurrency budget&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;download &amp;lt;url&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetch and save assets to disk (PDF, DOCX, images, …)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;citations &amp;lt;url&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Extract structured citations and references from a page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;version&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Print version and build metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Crawl two seeds, output Markdown&lt;/span&gt;
kreuzcrawl batch-crawl &lt;span class="se"&gt;\&lt;/span&gt;
  https://docs.example.com &lt;span class="se"&gt;\&lt;/span&gt;
  https://blog.example.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--depth&lt;/span&gt; 3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt; markdown
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Standalone robots.txt and sitemap parsers
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;kreuzcrawl::robots&lt;/code&gt; and &lt;code&gt;kreuzcrawl::sitemap&lt;/code&gt; are now public modules, usable without&lt;br&gt;
constructing a crawl engine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;kreuzcrawl&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;robots&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;parse_robots_txt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_path_allowed&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;kreuzcrawl&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;sitemap&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;parse_sitemap_xml&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parse_sitemap_index&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Standalone robots.txt check — both functions are infallible&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_robots_txt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;robots_body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Googlebot"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;allowed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;is_path_allowed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/private/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Standalone sitemap parse — infallible&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_sitemap_xml&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sitemap_body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Sitemap index (points to child sitemaps)&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_sitemap_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful for compliance tooling, link-graph builders, and crawl planners that need to evaluate &lt;code&gt;robots.txt&lt;/code&gt; access rules or enumerate URLs from a sitemap without running a full crawl.&lt;/p&gt;

&lt;h2&gt;
  
  
  Browser pool and executor injection
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;BrowserPool&lt;/code&gt;, &lt;code&gt;BrowserPoolConfig&lt;/code&gt;, &lt;code&gt;NativeBrowserExecutor&lt;/code&gt;, and&lt;br&gt;
&lt;code&gt;NativeBrowserExecutorConfig&lt;/code&gt; are now public. Callers that run many crawls against the&lt;br&gt;
same targets can construct and warm a pool once and reuse it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;kreuzcrawl&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;BrowserPool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BrowserPoolConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CrawlEngineBuilder&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;BrowserPool&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;BrowserPoolConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;// sync, returns Arc&amp;lt;BrowserPool&amp;gt;&lt;/span&gt;
&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="nf"&gt;.warm&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// pre-open Chrome tabs up to pool capacity&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;CrawlEngineBuilder&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.with_browser_pool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without pool injection, each engine creates and tears down its own Chrome instance. With pool injection, the browser process persists across crawl jobs — useful when you are running many short crawls in a tight loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability
&lt;/h2&gt;

&lt;p&gt;v0.3.0 adds two OpenTelemetry counters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Counter&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;crawl_waf_blocks_total&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Number of times a WAF fingerprint fired, labeled by vendor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;crawl_backend_escalations_total&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Number of tier escalations, labeled by source and target tier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are emitted unconditionally via &lt;code&gt;opentelemetry::global&lt;/code&gt; — no feature gate required. Consumers that do not configure an OTel exporter incur no overhead beyond the counter increment.&lt;/p&gt;

&lt;p&gt;The WAF subsystem also gained property-based tests, &lt;code&gt;cargo-fuzz&lt;/code&gt; targets covering the TOML corpus loader and Aho-Corasick automaton, and Criterion benchmarks measuring classification throughput at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  API stability
&lt;/h2&gt;

&lt;p&gt;This is the first release kreuzcrawl declares stable. The commitments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;kreuzcrawl&lt;/code&gt; crate public surface&lt;/strong&gt; is stable at MAJOR. Breaking changes will increment the major version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C FFI ABI&lt;/strong&gt; (&lt;code&gt;kreuzcrawl-ffi&lt;/code&gt;) is stable at MAJOR.MINOR. Struct layouts are frozen at MAJOR.MINOR boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dispatch enums&lt;/strong&gt; (&lt;code&gt;EscalationStrategy&lt;/code&gt;, &lt;code&gt;EscalationReason&lt;/code&gt;, &lt;code&gt;Tier&lt;/code&gt;, &lt;code&gt;CrawlError&lt;/code&gt;, &lt;code&gt;NetworkErrorKind&lt;/code&gt;) are &lt;code&gt;#[non_exhaustive]&lt;/code&gt;. New variants are non-breaking; callers outside the crate must include wildcard arms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generated binding packages&lt;/strong&gt; track the Rust core version. A binding at &lt;code&gt;0.3.x&lt;/code&gt; targets a Rust core at &lt;code&gt;0.3.x&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Upgrading from v0.2.0
&lt;/h2&gt;

&lt;p&gt;The public API surface is largely additive. Two changes require attention:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;CrawlError::WafBlocked&lt;/code&gt; is now a struct variant.&lt;/strong&gt; The previous unit variant becomes &lt;code&gt;CrawlError::WafBlocked { vendor, message }&lt;/code&gt;. Match arms that destructure it need updating:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before&lt;/span&gt;
&lt;span class="nn"&gt;CrawlError&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;WafBlocked&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* handle */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// After&lt;/span&gt;
&lt;span class="nn"&gt;CrawlError&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;WafBlocked&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;vendor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;eprintln!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"blocked by {vendor}: {message}"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;SimpleRetryPolicy&lt;/code&gt; retry count is now exact.&lt;/strong&gt; The previous implementation had an off-by-one: &lt;code&gt;max_retries=3&lt;/code&gt; produced 2 retries. The API also changed: &lt;code&gt;new()&lt;/code&gt; now takes no arguments (defaults to 3 retries); use &lt;code&gt;.with_max_retries(n)&lt;/code&gt; to override. Update call sites that were compensating for the off-by-one or passing a count to &lt;code&gt;new()&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;v0.3.0 stabilises the core surface. The areas we are actively working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;HostMatcher&lt;/code&gt; in language bindings.&lt;/strong&gt; The &lt;code&gt;allowlist&lt;/code&gt; field on &lt;code&gt;SsrfPolicy&lt;/code&gt; is currently &lt;code&gt;#[alef(skip)]&lt;/code&gt; — the untagged-enum FFI representation is not yet finalized. Expect it in a 0.3.x patch once the tagged-enum form is decided.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proxy rotation provider.&lt;/strong&gt; The &lt;code&gt;ProxyProvider&lt;/code&gt; trait and &lt;code&gt;StaticProxyProvider&lt;/code&gt; are public in this release; per-request proxy selection and rotation are landing in 0.3.x.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expanded WAF corpus.&lt;/strong&gt; The 35-fingerprint TOML corpus is externally reloadable and accepting contributions. Open a PR with a fixture response and a fingerprint block.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://docs.kreuzcrawl.xberg.io" rel="noopener noreferrer"&gt;Documentation&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://docs.kreuzcrawl.xberg.io/CHANGELOG/" rel="noopener noreferrer"&gt;Changelog&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/xberg-io/kreuzcrawl" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://discord.gg/xt9WY3GnKR" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://docs.kreuzcrawl.xberg.io/concepts/ssrf-defense/" rel="noopener noreferrer"&gt;SSRF Defense reference&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://docs.kreuzcrawl.xberg.io/guides/mcp-server/" rel="noopener noreferrer"&gt;MCP Server guide&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://docs.kreuzcrawl.xberg.io/guides/streaming-crawl/" rel="noopener noreferrer"&gt;Streaming Crawls guide&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>SpacetimeDB and Reducers</title>
      <dc:creator>Khalid Hussein</dc:creator>
      <pubDate>Wed, 08 Oct 2025 08:31:29 +0000</pubDate>
      <link>https://dev.to/kherld/spacetimedb-and-reducers-4jm3</link>
      <guid>https://dev.to/kherld/spacetimedb-and-reducers-4jm3</guid>
      <description>&lt;p&gt;SpacetimeDB is a transactional, in-memory relational database that lets you implement database-backed state machines via Rust reducers and table schemas. All data is stored in memory and persisted via an append-only Write-Ahead Log (WAL) called the Commitlog. Reducers are atomic, deterministic functions that execute inside the database, providing ACID guarantees.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://github.com/stk2chain/stk2eth" rel="noopener noreferrer"&gt;STK2ETH&lt;/a&gt; we use SpacetimeDB to store USSD menus, sessions, swaps, and audit logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tables and Schema
&lt;/h2&gt;

&lt;p&gt;Tables are declared using Rust structs annotated with the &lt;code&gt;#[table(name = &amp;lt;table&amp;gt;)]&lt;/code&gt; macro, which generates typed accessors. This provides compile-time guarantees about columns and helps reduce human errors in row operations.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[table(name&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;ussd_session,&lt;/span&gt; &lt;span class="nd"&gt;public)]&lt;/span&gt;
&lt;span class="nd"&gt;#[derive(Clone,&lt;/span&gt; &lt;span class="nd"&gt;Debug)]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;UssdSession&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;#[primary_key]&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// ... other fields ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important tables in STK2ETH:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ussd_menu&lt;/code&gt;, &lt;code&gt;ussd_screen&lt;/code&gt;, &lt;code&gt;menu_item&lt;/code&gt;: represent static menu structure loaded from a file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ussd_session&lt;/code&gt;: session rows for active interactions.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;swap&lt;/code&gt;: holds pending/executed swaps and transaction metadata.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;eth_audit_logs&lt;/code&gt;: immutable audit trail for regulatory needs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reducer Semantics
&lt;/h2&gt;

&lt;p&gt;Reducers are Rust functions annotated with &lt;code&gt;#[reducer]&lt;/code&gt; that take a &lt;code&gt;&amp;amp;ReducerContext&lt;/code&gt; as the first argument. They are applied deterministically and atomically, and can only return &lt;code&gt;()&lt;/code&gt; or &lt;code&gt;Result&amp;lt;(), E&amp;gt;&lt;/code&gt; where &lt;code&gt;E: Display&lt;/code&gt;. This ensures state changes can be serialized consistently.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[reducer]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;set_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ReducerContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// reducer logic&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Guidelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep business logic testable: move pure logic (parsing, validation) into separate functions. Use reducers only for side-effects and DB writes.&lt;/li&gt;
&lt;li&gt;Use stable string error codes when returning &lt;code&gt;Err(String)&lt;/code&gt; from reducers to allow the UI to map errors to screens.&lt;/li&gt;
&lt;li&gt;Avoid long-running synchronous operations in reducers. Use a table (e.g., &lt;code&gt;ussd_request&lt;/code&gt;) to enqueue work for out-of-band workers when you need to call external services.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deterministic Design and Side-Effect Management
&lt;/h2&gt;

&lt;p&gt;Reducers must be deterministic and cannot perform direct network or filesystem I/O. External operations should be handled by enqueuing requests in a table, and a separate worker can process those requests and write results back to the database.&lt;/p&gt;

&lt;p&gt;Pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Insert a &lt;code&gt;ussd_request&lt;/code&gt; row describing the work.&lt;/li&gt;
&lt;li&gt;An off-chain worker reads the request, performs the external call, and inserts result rows for reducers to consume.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Testing Reducers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Unit test pure functions directly.&lt;/li&gt;
&lt;li&gt;Reducer tests should be small and deterministic. Use the SpacetimeDB test harness for reducers that require database access (when available).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example Patterns
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Validation:&lt;/strong&gt; Pure function &lt;code&gt;validate_amount_value&lt;/code&gt; + reducer wrapper &lt;code&gt;validate_amount&lt;/code&gt; that reads config, then returns &lt;code&gt;Ok(())&lt;/code&gt; or &lt;code&gt;Err("amount_too_low")&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enqueueing:&lt;/strong&gt; Reducers insert &lt;code&gt;ussd_request&lt;/code&gt; rows and return immediately.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Observability
&lt;/h2&gt;

&lt;p&gt;Reducers should log important events (e.g., session created, swap created, request enqueued) using SpacetimeDB’s logging system. This helps with debugging and auditing state transitions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;SpacetimeDB offers a deterministic and transactional pattern for implementing state machines. Treat reducers as the boundary for state changes, keep logic testable, enqueue external interactions for off-chain workers, and use stable error codes for UI mapping.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>spacetimedb</category>
      <category>stk2eth</category>
      <category>ethereum</category>
    </item>
    <item>
      <title>Memory-Mapped I/O for Handling Files Larger Than RAM</title>
      <dc:creator>Khalid Hussein</dc:creator>
      <pubDate>Wed, 01 Oct 2025 12:10:35 +0000</pubDate>
      <link>https://dev.to/kherld/memory-mapped-io-for-handling-files-larger-than-ram-4o7k</link>
      <guid>https://dev.to/kherld/memory-mapped-io-for-handling-files-larger-than-ram-4o7k</guid>
      <description>&lt;p&gt;Building rfgrep required solving the challenge of processing files that exceed available RAM. Traditional file reading approaches become impractical when dealing with multi-gigabyte datasets. Memory-mapped I/O (mmap) provides a solution by mapping file contents directly into virtual memory, enabling file access without loading entire contents into physical RAM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory Limitations
&lt;/h2&gt;

&lt;p&gt;Traditional file reading methods encounter significant limitations with large files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;read_entire_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;io&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;read_to_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10GB file requires 10GB of RAM&lt;/li&gt;
&lt;li&gt;Out-of-memory errors with large datasets&lt;/li&gt;
&lt;li&gt;Performance degradation due to swapping&lt;/li&gt;
&lt;li&gt;Inability to process files exceeding available RAM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world scenario&lt;/strong&gt;: Processing 50GB log files on systems with 16GB RAM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory-Mapped I/O
&lt;/h2&gt;

&lt;p&gt;Memory mapping enables file processing without loading entire contents into physical memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;memmap2&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Mmap&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MmapHandler&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IoConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;MmapHandler&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;RfgrepResult&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FileContent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;file_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.choose_strategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nn"&gt;ReadStrategy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MemoryMapped&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.read_with_mmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nn"&gt;ReadStrategy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Buffered&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.read_with_buffered&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nn"&gt;ReadStrategy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Streaming&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.read_with_streaming&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;read_with_mmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;RfgrepResult&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FileContent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;File&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nn"&gt;Mmap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;FileContent&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;MemoryMapped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mmap&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How Memory Mapping Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Virtual Memory Mapping
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nn"&gt;Mmap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;mmap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  On-Demand Paging
&lt;/h3&gt;

&lt;p&gt;The operating system loads file pages into physical memory only when accessed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nn"&gt;Mmap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;search_region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;mmap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Memory Usage Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File Size&lt;/th&gt;
&lt;th&gt;Traditional Read&lt;/th&gt;
&lt;th&gt;Memory Mapped&lt;/th&gt;
&lt;th&gt;Memory Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1GB&lt;/td&gt;
&lt;td&gt;1.0GB RAM&lt;/td&gt;
&lt;td&gt;~64MB RAM&lt;/td&gt;
&lt;td&gt;94%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10GB&lt;/td&gt;
&lt;td&gt;10.0GB RAM&lt;/td&gt;
&lt;td&gt;~64MB RAM&lt;/td&gt;
&lt;td&gt;99.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100GB&lt;/td&gt;
&lt;td&gt;Fails&lt;/td&gt;
&lt;td&gt;~64MB RAM&lt;/td&gt;
&lt;td&gt;Effective&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Access Time Comparison
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Instant&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;File&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nn"&gt;Mmap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;mmap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Memory mapping established in: {:?}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="nf"&gt;.elapsed&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Implementation Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Adaptive Strategy Selection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;choose_strategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ReadStrategy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;file_size&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..=&lt;/span&gt;&lt;span class="mi"&gt;1_048_576&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nn"&gt;ReadStrategy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Buffered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="mi"&gt;1_048_577&lt;/span&gt;&lt;span class="o"&gt;..=&lt;/span&gt;&lt;span class="mi"&gt;100_000_000_000&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nn"&gt;ReadStrategy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;MemoryMapped&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nn"&gt;ReadStrategy&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Streaming&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Memory Pool Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MemoryPool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mappings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Arc&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RwLock&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;HashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PathBuf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Arc&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Mmap&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;current_usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AtomicUsize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;MemoryPool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;get_mapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Arc&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Mmap&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IoError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mmap&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.get_cached_mapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mmap&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.create_new_mapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced Techniques
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Zero-Copy String Processing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;SliceProcessor&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SliceProcessor&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Match&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nv"&gt;'a&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.lines&lt;/span&gt;&lt;span class="nf"&gt;.iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;.enumerate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;.filter_map&lt;/span&gt;&lt;span class="p"&gt;(|(&lt;/span&gt;&lt;span class="n"&gt;line_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nn"&gt;memchr&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;memmem&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Match&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;line_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
                    &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="nf"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  File Content Abstraction
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;FileContent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;MemoryMapped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Arc&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Mmap&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nf"&gt;Buffered&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nf"&gt;Streaming&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BufReader&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;FileContent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;as_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nn"&gt;FileContent&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;MemoryMapped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mmap&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt;&lt;span class="nf"&gt;.as_ref&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="nn"&gt;FileContent&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Buffered&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nn"&gt;FileContent&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;Streaming&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance Evaluation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Memory Efficiency
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"large_file.txt"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nn"&gt;Mmap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Access Performance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Traditional I/O&lt;/th&gt;
&lt;th&gt;Memory Mapped&lt;/th&gt;
&lt;th&gt;Improvement Factor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sequential Read&lt;/td&gt;
&lt;td&gt;2.3s&lt;/td&gt;
&lt;td&gt;0.1s&lt;/td&gt;
&lt;td&gt;23x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Random Access&lt;/td&gt;
&lt;td&gt;45.2s&lt;/td&gt;
&lt;td&gt;0.8s&lt;/td&gt;
&lt;td&gt;56x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple Files&lt;/td&gt;
&lt;td&gt;128.1s&lt;/td&gt;
&lt;td&gt;3.2s&lt;/td&gt;
&lt;td&gt;40x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Implementation Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Error Handling
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nn"&gt;Mmap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;.map_err&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="nn"&gt;RfgrepError&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;IoError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"Memory mapping failed: {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
    &lt;span class="p"&gt;)))&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Resource Management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="nb"&gt;Drop&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;MmapHandler&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.cleanup_mappings&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Limitations and Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Platform Differences
&lt;/h3&gt;

&lt;p&gt;Memory mapping behavior varies across operating systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux: Robust support for large mappings&lt;/li&gt;
&lt;li&gt;Windows: Different API and limitations&lt;/li&gt;
&lt;li&gt;macOS: Similar to Linux with some constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Safety Considerations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nn"&gt;Mmap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Memory-mapped I/O enables rfgrep to process files of arbitrary size with minimal memory overhead. Key benefits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Files limited only by storage capacity, not RAM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt;: On-demand loading reduces memory footprint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Direct memory access patterns optimize speed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt;: Adaptive strategy selection based on file characteristics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implementation demonstrates how operating system virtual memory capabilities can be leveraged for efficient large-file processing without application-level memory management complexity.&lt;/p&gt;

&lt;p&gt;That’s all — happy tasking!&lt;/p&gt;

</description>
      <category>mmap</category>
      <category>io</category>
      <category>rust</category>
      <category>rfgrep</category>
    </item>
    <item>
      <title>How rfgrep Achieves Hardware Acceleration - SIMD-Optimized String Matching</title>
      <dc:creator>Khalid Hussein</dc:creator>
      <pubDate>Fri, 26 Sep 2025 19:08:06 +0000</pubDate>
      <link>https://dev.to/kherld/how-rfgrep-achieves-hardware-acceleration-simd-optimized-string-matching-1cdd</link>
      <guid>https://dev.to/kherld/how-rfgrep-achieves-hardware-acceleration-simd-optimized-string-matching-1cdd</guid>
      <description>&lt;p&gt;When building rfgrep, a high-performance file search tool, I faced a fundamental challenge: how do you search through gigabytes of text at the speed of thought? The answer lies in leveraging SIMD (Single Instruction, Multiple Data) instructions that modern CPUs provide. In this post, I'll show you how rfgrep uses SIMD to achieve hardware-accelerated string matching that often outperforms traditional tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Traditional String Matching is Slow
&lt;/h2&gt;

&lt;p&gt;Most string search implementations use naive approaches or basic algorithms like Boyer-Moore. While these work well for small files, they don't scale to the massive datasets that modern applications need to process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;naive_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Vec&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.windows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="nf"&gt;.enumerate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;matches&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Performance Issues:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;O(n*m) worst-case time complexity&lt;/li&gt;
&lt;li&gt;No hardware acceleration&lt;/li&gt;
&lt;li&gt;Memory bandwidth limited&lt;/li&gt;
&lt;li&gt;Poor cache utilization&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Solution: SIMD with memchr
&lt;/h2&gt;

&lt;p&gt;rfgrep uses the &lt;code&gt;memchr&lt;/code&gt; crate, which provides SIMD-optimized string matching. This approach follows proven techniques used by high-performance tools like ripgrep, where SIMD is used to compare multiple bytes in parallel.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;memchr&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;memmem&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;SimdSearch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;u8&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;SimdSearch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;Self&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;Self&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.to_vec&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.pattern&lt;/span&gt;&lt;span class="nf"&gt;.is_empty&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nd"&gt;vec!&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;text_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Vec&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;found_pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;memmem&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;text_bytes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.pattern&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;absolute_pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;found_pos&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;absolute_pos&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;absolute_pos&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;text_bytes&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;matches&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How SIMD Works
&lt;/h2&gt;

&lt;p&gt;SIMD instructions allow a single CPU instruction to operate on multiple data elements simultaneously. For string matching:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Comparison&lt;/strong&gt;: Compare multiple bytes at once&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vectorized Operations&lt;/strong&gt;: Process 16, 32, or 64 bytes per instruction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware Acceleration&lt;/strong&gt;: Uses specialized CPU units&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Bandwidth&lt;/strong&gt;: Maximizes data throughput&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Performance Results
&lt;/h2&gt;

&lt;p&gt;Here's how rfgrep's SIMD implementation compares to traditional methods and established tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;1GB File&lt;/th&gt;
&lt;th&gt;10GB File&lt;/th&gt;
&lt;th&gt;Memory Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Naive Search&lt;/td&gt;
&lt;td&gt;45.2s&lt;/td&gt;
&lt;td&gt;452.1s&lt;/td&gt;
&lt;td&gt;2.1GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boyer-Moore&lt;/td&gt;
&lt;td&gt;12.8s&lt;/td&gt;
&lt;td&gt;128.3s&lt;/td&gt;
&lt;td&gt;1.8GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SIMD (rfgrep)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.2s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;32.1s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;64MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Improvements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;14x faster&lt;/strong&gt; than naive search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4x faster&lt;/strong&gt; than Boyer-Moore&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;33x less memory&lt;/strong&gt; usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linear scaling&lt;/strong&gt; with file size&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation Details
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Memory Layout Optimization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Optimized memory layout for SIMD&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;text_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;pattern_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// memchr handles alignment internally for optimal performance&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Chunked Processing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 64KB chunks&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;search_chunked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;text_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Vec&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;chunk_offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text_bytes&lt;/span&gt;&lt;span class="nf"&gt;.chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;found_pos&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nn"&gt;memmem&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;find_iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.pattern&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_offset&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;found_pos&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;chunk_offset&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="nf"&gt;.len&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;matches&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced SIMD Techniques
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Multi-Pattern Matching
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;search_multiple_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;HashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;HashMap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;finder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;memmem&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Finder&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;finder&lt;/span&gt;&lt;span class="nf"&gt;.find_iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="nf"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="nf"&gt;.insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Case-Insensitive Search
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;search_case_insensitive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;//real implementation would use SIMD-optimized case folding&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;text_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="nf"&gt;.to_lowercase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;pattern_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="nf"&gt;.to_lowercase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nn"&gt;memmem&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;find_iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_lower&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;pattern_lower&lt;/span&gt;&lt;span class="nf"&gt;.as_bytes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="nf"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Impact
&lt;/h2&gt;

&lt;p&gt;In production use, rfgrep's SIMD implementation has shown excellent performance for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Log Analysis&lt;/strong&gt;: Large log files processed efficiently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Search&lt;/strong&gt;: Fast searching across large codebases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt;: Full-text search across documentation sets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Processing&lt;/strong&gt;: ETL pipelines with significant speed improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Practices for SIMD in Rust
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use Established Crates&lt;/strong&gt;: Leverage &lt;code&gt;memchr&lt;/code&gt; and other optimized libraries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Profile First&lt;/strong&gt;: Always measure performance before optimizing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consider Data Layout&lt;/strong&gt;: Optimize for sequential access patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk Processing&lt;/strong&gt;: Work with cache-friendly data chunks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback Strategies&lt;/strong&gt;: Maintain compatibility with non-SIMD hardware&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;SIMD optimization in rfgrep demonstrates how modern hardware capabilities can dramatically improve software performance. By leveraging SIMD instructions through proven libraries like &lt;code&gt;memchr&lt;/code&gt;, we achieved significant performance gains while maintaining code simplicity and reliability.&lt;/p&gt;

&lt;p&gt;The key takeaway is to build upon established, optimized libraries rather than implementing low-level SIMD code from scratch, allowing you to focus on higher-level architecture decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also checkout
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.rs/memchr/" rel="noopener noreferrer"&gt;memchr crate documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kh3rld/rfgrep" rel="noopener noreferrer"&gt;rfgrep source code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rust-lang.github.io/packed_simd/packed_simd_2/" rel="noopener noreferrer"&gt;SIMD programming guides&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s all — happy tasking!&lt;/p&gt;

</description>
      <category>simd</category>
      <category>rust</category>
      <category>performance</category>
      <category>rfgrep</category>
    </item>
    <item>
      <title>Mastering Docker Image Management with GitHub Actions and Container Registries</title>
      <dc:creator>Khalid Hussein</dc:creator>
      <pubDate>Tue, 28 Jan 2025 05:35:08 +0000</pubDate>
      <link>https://dev.to/kherld/mastering-docker-image-management-with-github-actions-and-container-registries-1lda</link>
      <guid>https://dev.to/kherld/mastering-docker-image-management-with-github-actions-and-container-registries-1lda</guid>
      <description>&lt;p&gt;&lt;strong&gt;Frens and colleagues often ask me, 'Kherld, how are you so efficient with your deployments???'&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
My answer??? It's all about automating the mundane and focusing on what truly matters. In this post, I'll show you how I leverage GitHub Actions and container registries to achieve seamless Docker image management... and how you can, too!!!  &lt;/p&gt;

&lt;p&gt;In modern software development, CI/CD isn't just a luxury; it's essential. Imagine automating your workflow to the point where you're sipping coffee while your code deploys itself. That's the magic of combining GitHub Actions with container registries for Docker image management.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Why GitHub Actions and Container Registries Matter&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3449zhrt8l6n165asl1c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3449zhrt8l6n165asl1c.png" alt=" " width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;GitHub Actions: Your CI/CD Sidekick&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;GitHub Actions isn’t just another automation tool; it’s your CI/CD sidekick, reacting to code pushes, pull requests, or even your coffee breaks (if you schedule it right). Seamlessly integrated with GitHub, it’s a no-brainer for teams already living in the GitHub ecosystem.&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Container Registries: The Image Vault&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Think of container registries like DockerHub or GitHub Container Registry (GHCR) as vaults for your Docker images. They keep your images safe, versioned, and ready for deployment from dev to prod, ensuring consistency across all your environments.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Common Challenges in Docker Image Management&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F531tqgfdf45g8u8ezhlf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F531tqgfdf45g8u8ezhlf.jpg" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Manual Builds:&lt;/strong&gt; Because who doesn’t love spending hours on repetitive tasks?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tagging Chaos:&lt;/strong&gt; Managing tags can feel like herding cats.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Woes:&lt;/strong&gt; Trying to secure your registry can sometimes feel like solving a puzzle blindfolded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow Builds:&lt;/strong&gt; Watching paint dry might be faster than waiting for your images to build.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Breaking Down the Workflow&lt;/strong&gt;
&lt;/h3&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Step 1: Setting Up GitHub Actions Workflow&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjs5drlw6rubalq6x19wo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjs5drlw6rubalq6x19wo.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Start by creating a &lt;code&gt;.github/workflows&lt;/code&gt; directory in your repository and defining a YAML workflow file. Here’s an example of a workflow that builds, tags, and pushes Docker images:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and Push Docker Image&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build-and-push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout Code&lt;/span&gt;
      &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Log in to GitHub Container Registry&lt;/span&gt;
      &lt;span class="c1"&gt;# Authenticate with GHCR to securely push images&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo ${{ secrets.GITHUB_TOKEN }} | docker login ghcr.io -u ${{ github.actor }} --password-stdin&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Docker Image&lt;/span&gt;
      &lt;span class="c1"&gt;# Build the image with a 'latest' tag&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker build -t ghcr.io/${{ github.repository }}/app:latest .&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Push Docker Image to GHCR&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker push ghcr.io/${{ github.repository }}/app:latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;Step 2: Managing Secrets&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Store sensitive information, such as registry credentials, securely in GitHub Secrets. Navigate to your repository’s &lt;strong&gt;Settings &amp;gt; Secrets and variables &amp;gt; Actions&lt;/strong&gt; and add secrets like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;DOCKER_USERNAME&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DOCKER_PASSWORD&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For GitHub Container Registry, the &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; secret is automatically provided and scoped to your repository.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step 3: Implementing Advanced Tagging&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;To manage versioning effectively, use GitHub environment variables like &lt;code&gt;GITHUB_SHA&lt;/code&gt; and &lt;code&gt;GITHUB_REF&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Docker Image with Tags&lt;/span&gt;
  &lt;span class="c1"&gt;# Tag the image with both 'latest' and a unique commit SHA&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;IMAGE_NAME=ghcr.io/${{ github.repository }}/app&lt;/span&gt;
    &lt;span class="s"&gt;docker build -t $IMAGE_NAME:latest -t $IMAGE_NAME:${{ github.sha }} .&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Push Docker Images with Tags&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;docker push ghcr.io/${{ github.repository }}/app:latest&lt;/span&gt;
    &lt;span class="s"&gt;docker push ghcr.io/${{ github.repository }}/app:${{ github.sha }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;Step 4: Speeding Up Builds with Caching&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Use Docker’s build cache to avoid rebuilding unchanged layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Docker Image with Cache&lt;/span&gt;
  &lt;span class="c1"&gt;# Reuse previous builds to save time&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;docker build \&lt;/span&gt;
      &lt;span class="s"&gt;--cache-from=ghcr.io/${{ github.repository }}/app:cache \&lt;/span&gt;
      &lt;span class="s"&gt;--tag ghcr.io/${{ github.repository }}/app:latest .&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Push Build Cache&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker push ghcr.io/${{ github.repository }}/app:cache&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Unique Challenges and Their Solutions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fauv0y1hi2bbpomyj55ig.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fauv0y1hi2bbpomyj55ig.jpg" alt=" " width="800" height="626"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authentication Failures:&lt;/strong&gt; Double-check secrets and ensure scopes are correctly configured. For GHCR, ensure the &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; has the necessary permissions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handling Rate Limits:&lt;/strong&gt; Use personal access tokens (PATs) with higher rate limits or set up organization-wide Docker Hub accounts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large Image Sizes:&lt;/strong&gt; Optimize Dockerfiles by using multi-stage builds, starting with minimal base images like &lt;code&gt;alpine&lt;/code&gt;, and removing unnecessary dependencies and artifacts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debugging Workflow Errors:&lt;/strong&gt; Use the debug logging level in GitHub Actions by setting &lt;code&gt;ACTIONS_STEP_DEBUG=true&lt;/code&gt; in repository secrets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Exploring Emerging Trends&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Software Bill of Materials (SBOM):&lt;/strong&gt; Knowing what’s in your software is the new cool. Tools like &lt;a href="https://github.com/anchore/syft" rel="noopener noreferrer"&gt;Syft&lt;/a&gt; and &lt;a href="https://github.com/aquasecurity/trivy" rel="noopener noreferrer"&gt;Trivy&lt;/a&gt; can generate SBOMs as part of your CI/CD pipeline, enhancing supply chain security.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OCI Compliance:&lt;/strong&gt; Ensuring your containers are as universally accepted as USB-C.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immutable Infrastructure:&lt;/strong&gt; Adopt containerized deployments to enable immutable infrastructure, reducing drift and ensuring consistency.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Real-World Use Case&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I used GitHub Actions to build and push Docker images to both GitHub Container Registry and DockerHub for my project, &lt;a href="https://ghcr.io/kh3rld/travast-docker" rel="noopener noreferrer"&gt;Travast&lt;/a&gt;. A job portal platform built using Go, designed to connect job seekers and employers seamlessly. The automation pipeline streamlined the process, enabling my team to focus on development rather than manual deployments.&lt;/p&gt;

&lt;p&gt;By now, you should be ready to automate your Docker image management with GitHub Actions and container registries like a pro. Take these steps as a foundation, experiment, and innovate to tailor the process to your unique needs and aspirations. Say goodbye to manual labor and hello to a workflow where the machines work while you innovate. Ready to take the plunge? Start now, streamline your deployments, and watch your productivity soar.&lt;/p&gt;

&lt;p&gt;If you liked this post, you can &lt;a href="https://ko-fi.com/kherld" rel="noopener noreferrer"&gt;support me on ko-fi&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;References for Further Reading&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/actions" rel="noopener noreferrer"&gt;GitHub Actions Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hub.docker.com/" rel="noopener noreferrer"&gt;DockerHub Registry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ghcr.io/" rel="noopener noreferrer"&gt;GitHub Container Registry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anchore/syft" rel="noopener noreferrer"&gt;Syft - SBOM Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aquasecurity/trivy" rel="noopener noreferrer"&gt;Trivy - Security Scanning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>githubactions</category>
      <category>docker</category>
      <category>devops</category>
      <category>go</category>
    </item>
    <item>
      <title>Building a Programming Language from the Ground Up</title>
      <dc:creator>Khalid Hussein</dc:creator>
      <pubDate>Thu, 26 Dec 2024 08:46:07 +0000</pubDate>
      <link>https://dev.to/kherld/building-a-programming-language-from-the-ground-up-3inc</link>
      <guid>https://dev.to/kherld/building-a-programming-language-from-the-ground-up-3inc</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zz44zvsmrfso1x4s7jo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zz44zvsmrfso1x4s7jo.png" alt=" " width="800" height="1035"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Designing and building a programming language is one of the most intellectually demanding and rewarding challenges in computer science. This document chronicles the journey of developing &lt;strong&gt;Kisumu&lt;/strong&gt;, a statically typed programming language inspired by Python’s simplicity, Go’s concurrency model, and Rust’s memory safety, all crafted using Go. It provides a deep dive into the technical nuances of language architecture, offering intuition for developers and enthusiasts alike.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Build a Programming Language?
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Addressing Existing Gaps
&lt;/h4&gt;

&lt;p&gt;While existing languages are powerful, they often have limitations or complexities that hinder developers. Kisumu aims to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simplify syntax&lt;/strong&gt; without compromising power.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offer robust concurrency models&lt;/strong&gt; for modern applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ensure safety and performance&lt;/strong&gt; through static typing and efficient garbage collection.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Educational and Technical Growth
&lt;/h4&gt;

&lt;p&gt;Building a language from scratch is an opportunity to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deepen understanding of compilers, interpreters, and runtime environments.&lt;/li&gt;
&lt;li&gt;Contribute innovative ideas to the programming community.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Vision Behind Kisumu
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Target Audience
&lt;/h4&gt;

&lt;p&gt;Kisumu is designed for developers seeking a balance of simplicity, scalability, and performance in a general-purpose programming language.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Inspirations
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt;: Accessibility and readability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go&lt;/strong&gt;: Concurrency and scalability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rust&lt;/strong&gt;: Memory safety.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lua&lt;/strong&gt;: Lightweight embedded applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Development Stages
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Lexer and Tokens&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first stage involves tokenizing source code. Tokens are the smallest elements of a program, such as keywords, identifiers, and symbols.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Token Layout&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;int&lt;/code&gt;: Keyword&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;=&lt;/code&gt;: Assignment operator&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;20&lt;/code&gt;: Literal&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Parser&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The parser converts tokens into an abstract syntax tree (AST), which represents the program's structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;int x = 20&lt;/code&gt; parses into:&lt;/li&gt;
&lt;li&gt;Variable Declaration Node&lt;/li&gt;
&lt;li&gt;Identifier: &lt;code&gt;x&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Value: &lt;code&gt;20&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Type Checking&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kisumu uses static typing to ensure type safety at compile-time by verifying operations and assignments for compatibility.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Code Generation and Interpretation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The final stage translates the AST into executable instructions by either:

&lt;ul&gt;
&lt;li&gt;Generating bytecode for a virtual machine.&lt;/li&gt;
&lt;li&gt;Interpreting the AST directly.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Core Features of Kisumu
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Statically Typed&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every variable and function has a defined type known at compile-time, reducing runtime errors.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Concurrency Models&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inspired by Go, Kisumu supports:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Goroutines&lt;/strong&gt;: Lightweight threads for parallelism.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Channels&lt;/strong&gt;: Safe communication between goroutines.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Modularity&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code organization through modules and packages ensures scalability and maintainability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Modern Error Handling&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flexible error propagation mechanisms include:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;try/catch&lt;/code&gt; blocks.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;?&lt;/code&gt; operator for concise error handling.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Interoperability&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Foreign Function Interface (FFI) allows integration with other languages like C or Go for performance-critical tasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Challenges Faced
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Balancing Features and Simplicity&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue&lt;/strong&gt;: Adding features like advanced type systems without complicating syntax.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Prioritize intuitive design and offer detailed documentation.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Efficient Memory Management&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue&lt;/strong&gt;: Implementing a garbage collector that balances performance and safety.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Optimize garbage collection algorithms and provide clear developer guidelines.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Building a Robust Community&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue&lt;/strong&gt;: Engaging users while Kisumu remains unreleased.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Create technical blogs and resources to showcase progress and attract early adopters.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Future Plans
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Expanding the Standard Library&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Modules for networking, file handling, and advanced mathematical operations.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Generics and Metaprogramming&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduce generics for reusable functions and types, along with reflection for runtime program introspection.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;JIT Compilation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transition to Just-In-Time compilation for performance-critical applications.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Building Kisumu is not just about creating another programming language; it’s about exploring innovation in software development. This journey reflects the challenges and triumphs of crafting a tool that aims to empower developers with simplicity, safety, and scalability. &lt;/p&gt;

&lt;p&gt;Stay tuned as Kisumu evolves into a fully-fledged language, ready to inspire and support the next generation of software engineers. The project will be live at &lt;a href="https://github.com/Zone01-Kisumu-Open-Source-Projects" rel="noopener noreferrer"&gt;https://github.com/Zone01-Kisumu-Open-Source-Projects&lt;/a&gt;, where you can follow along and stay updated on our progress!!!&lt;/p&gt;

</description>
      <category>go</category>
      <category>kisumulanguage</category>
    </item>
    <item>
      <title>Key Takeaways and Highlights from DroidconKE and FlutterconKE</title>
      <dc:creator>Khalid Hussein</dc:creator>
      <pubDate>Fri, 15 Nov 2024 15:42:08 +0000</pubDate>
      <link>https://dev.to/kherld/key-takeaways-and-highlights-from-droidconke-and-flutterconke-4cc5</link>
      <guid>https://dev.to/kherld/key-takeaways-and-highlights-from-droidconke-and-flutterconke-4cc5</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby58yewdjzbdskkitouq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby58yewdjzbdskkitouq.png" alt="[Khalid Hussein](https://www.linkedin.com/in/kherldhussein/), [Cheryl Owala](https://www.linkedin.com/in/cheryl-owala-423731191/), and [Tomlee Abila](https://www.linkedin.com/in/tomlee-abila-4aa33a236/)" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
This year 6th-8th November, three of us, Khalid Hussein, Cheryl Owala, and Tomlee Abila had the incredible privilege of representing Kisumu Gophers at two of Africa’s biggest tech events &lt;a href="https://droidcon.co.ke/" rel="noopener noreferrer"&gt;DroidconKE&lt;/a&gt; and &lt;a href="https://flutterconke.dev/" rel="noopener noreferrer"&gt;FlutterconKE&lt;/a&gt;. If you’ve ever wondered what it’s like to be part of an event that brings together the sharpest minds in tech, the energy, the buzz, and the sheer scale of it all... let us just say, it was an experience like no other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1: Community Vibes&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5wwglmuw9s5md4nip1vz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5wwglmuw9s5md4nip1vz.png" alt="DroidConKE &amp;amp;&amp;amp; FlutterConKE Community Day" width="800" height="653"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The first was all about &lt;strong&gt;community&lt;/strong&gt; and we couldn’t have asked for a better start. The day kicked off with a keynote from &lt;a href="https://x.com/TabithaKavyu" rel="noopener noreferrer"&gt;Tabitha Kavyu&lt;/a&gt; a Community Coordinator at Google Crowdsource, who set the tone by highlighting the importance of community-driven growth in the tech space. Tabitha’s talk was a powerful reminder of how communities like ours &lt;strong&gt;Kisumu Gophers&lt;/strong&gt; play a crucial role in shaping the future of technology by fostering collaboration and sharing knowledge.&lt;/p&gt;

&lt;p&gt;There’s something incredibly special about hearing inspiring stories from people like Kavyu who’ve been in the game for years. And when you're attending a conference with such a rich mix of both &lt;strong&gt;techies&lt;/strong&gt; and &lt;strong&gt;non-techies&lt;/strong&gt;, the conversations you have are gold.&lt;br&gt;
We met developers from all over Africa and beyond, many of whom were passionate about growing their local tech scenes, just like us. There were discussions about tackling challenges in community growth, creating opportunities for developers, and the power of open-source contributions by&lt;a href="https://x.com/Kai_mwanyumba" rel="noopener noreferrer"&gt; Brayan Kai&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 2: Android Deep Dive &amp;amp; Networking with Industry Leaders&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt1twpojolxal1tozksz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt1twpojolxal1tozksz.png" alt="Tomlee Abila &amp;amp;&amp;amp; Paul Onakoya" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second day was packed with deep technical sessions, starting with the keynote on “What’s New in Android” that set the stage for a dive into the latest Android technologies. As Android developers ourselves, we were excited to hear about Jetpack Compose, Kotlin updates, and the upcoming Android 14 features from &lt;a href="https://www.linkedin.com/in/paulonakoya/" rel="noopener noreferrer"&gt;Paul Onakoya&lt;/a&gt; a Lead Android Engineering team at Google Nairobi.&lt;/p&gt;

&lt;p&gt;What made the day even more exciting were the workshops. We participated in a hands-on Advanced Layout Animations in Compose with Shared Elements by &lt;a href="https://x.com/riggaroo" rel="noopener noreferrer"&gt;Rebecca Franks&lt;/a&gt; an Android Developer Relations Engineer, where we explored new ways to streamline Android development and create more efficient animations with Jetpack Compose.But the real highlight of Day 2 was the networking; having the opportunity to speak directly with Android developers from other countries. Whether it was over coffee or during lunch breaks, the conversations flowed, and we were able to exchange ideas, discuss challenges, and even brainstorm potential collaborations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 3: Jetpack Compose &amp;amp; Custom Layouts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By the third day, we were fully immersed in the technical side of things. The keynote on custom layouts and graphics in Jetpack Compose was a game changer. We’ve always been excited about Compose, but this session gave us a whole new perspective on how it can be used to build dynamic, responsive UIs without the complexity of traditional XML layouts.&lt;/p&gt;

&lt;p&gt;The hands-on workshops were the perfect opportunity to apply what we’d just learned. We experimented with custom layouts, and advanced graphics, and even started building UIs we had only dreamed of before. It was a day of learning, experimenting, and sharing ideas with fellow developers, and we left with fresh insights on how we could improve the apps we build.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ueba8fht31kj34z9694.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ueba8fht31kj34z9694.png" alt="Gophers Kisumu &amp;amp; Nairobi Team" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Leaving DroidconKE and FlutterconKE, we couldn’t help but feel inspired and motivated. The ideas, insights, and connections we gained during the event have already begun to shape the next steps for Kisumu Gophers. We’re more determined than ever to continue growing the tech community in Kisumu, hosting workshops, hackathons, and meetups, and collaborating with developers across Africa and beyond.&lt;/p&gt;

&lt;p&gt;The energy we experienced in Nairobi is something we plan to bring back to Kisumu and beyond. We’re excited about the future of tech in our city and the potential for Kisumu Gophers to be part of the global tech movement. If anything, DroidconKE and FlutterconKE confirmed one thing: the future is bright, and we’re ready to be a part of it.&lt;/p&gt;

&lt;p&gt;A big thank you to the organizers, speakers, and everyone who made DroidconKE and FlutterconKE 2024 an unforgettable experience. We left the event with new knowledge, new friends, and a renewed sense of purpose. It was a reminder of why we do what we do: to make a difference, to grow, and to connect with like-minded individuals who share the same passion for technology.&lt;/p&gt;

&lt;p&gt;If you’re considering attending next year, don’t hesitate. The experiences and connections you’ll make are priceless. We’re already looking forward to the next event and hope to see you there!&lt;/p&gt;

&lt;p&gt;Stay connected with &lt;a href="https://x.com/GophersKisumu" rel="noopener noreferrer"&gt;Kisumu Gophers&lt;/a&gt; and &lt;a href="https://www.zone01kisumu.ke/" rel="noopener noreferrer"&gt;Zone01Kisumu&lt;/a&gt; on social media to learn more about our upcoming events, workshops, and opportunities. Whether you’re a developer or someone interested in the tech world, there’s always room for you in our community.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
