<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Xberg.io</title>
    <description>The latest articles on DEV Community by Xberg.io (xberg-io).</description>
    <link>https://dev.to/xberg-io</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F13825%2F170e1b02-6519-4a49-a230-f659aae24e77.png</url>
      <title>DEV Community: Xberg.io</title>
      <link>https://dev.to/xberg-io</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xberg-io"/>
    <language>en</language>
    <item>
      <title>Introducing Crawlberg v1.0.0</title>
      <dc:creator>Khalid Hussein</dc:creator>
      <pubDate>Mon, 29 Jun 2026 03:53:27 +0000</pubDate>
      <link>https://dev.to/xberg-io/introducing-crawlberg-v100-k7n</link>
      <guid>https://dev.to/xberg-io/introducing-crawlberg-v100-k7n</guid>
      <description>&lt;p&gt;Crawlberg v1.0.0 is now available. It succeeds kreuzcrawl and freezes the public API under the new project name. All technical features below shipped in v0.3.0 (2026-06-23); v1.0.0 is a stability declaration and rename, not a new feature release.&lt;/p&gt;

&lt;p&gt;The four production-facing changes most likely to require operational action:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Package and env var rename&lt;/strong&gt; - every artifact identifier has changed; see the migration table.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSRF defense is now on by default&lt;/strong&gt; - internal crawl targets (localhost, RFC 1918, cloud metadata) will fail without &lt;code&gt;CRAWLBERG_ALLOW_PRIVATE_NETWORK=1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;CrawlError::WafBlocked&lt;/code&gt; is now a struct variant&lt;/strong&gt; - exhaustive match arms will not compile until updated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;max_retries&lt;/code&gt; semantics changed&lt;/strong&gt; - off-by-one fixed; &lt;code&gt;max_retries=3&lt;/code&gt; now produces exactly 3 retries.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Precompiled binaries cover Linux (x86_64/aarch64), macOS (ARM64 and x86_64), and Windows x64. Homebrew bottles and Docker images on GHCR are also available.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Crawlberg?
&lt;/h2&gt;

&lt;p&gt;Crawlberg is a web crawling engine written primarily in Rust that exposes a single consistent API across 14 language runtimes. It handles HTTP transport, JavaScript rendering, robots.txt compliance, per-domain rate limiting, SSRF safety, and structured extraction. Extension points (&lt;code&gt;Frontier&lt;/code&gt;, &lt;code&gt;RateLimiter&lt;/code&gt;, &lt;code&gt;CrawlStore&lt;/code&gt;, &lt;code&gt;EventEmitter&lt;/code&gt;, &lt;code&gt;ContentFilter&lt;/code&gt;, &lt;code&gt;WafClassifier&lt;/code&gt;, &lt;code&gt;ProxyProvider&lt;/code&gt;) are injectable traits; wire in your own frontier, storage backend, or proxy pool without forking the engine.&lt;/p&gt;

&lt;p&gt;A single &lt;code&gt;scrape()&lt;/code&gt; call returns text, metadata, links, images, assets, JSON-LD, Open Graph tags, hreflang, favicons, headings, response headers, and clean HTML→Markdown. When a site requires JavaScript, the optional headless browser tier handles it transparently.&lt;/p&gt;

&lt;p&gt;v1.0.0 promotes v1.0.0-rc.2 and freezes the public API under the new project name. The features described in the sections below represent the platform that 1.0.0 declares stable; they shipped in v0.3.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  What v1.0.0 Declares Stable
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;These capabilities shipped in v0.3.0 (2026-06-23). v1.0.0 freezes their API and declares them production-stable under the new &lt;code&gt;crawlberg&lt;/code&gt; package name. Engineers running 0.3.0 already have the runtime features; upgrading to 1.0.0 means: rename packages, update env vars, get the stable API contract.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Project rename: &lt;code&gt;kreuzcrawl&lt;/code&gt; → &lt;code&gt;crawlberg&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The most operationally significant change is the rename. Every artifact identifier has changed:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Artifact&lt;/th&gt;
&lt;th&gt;Old&lt;/th&gt;
&lt;th&gt;New&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Crate (crates.io)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kreuzcrawl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;crawlberg&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kreuzcrawl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;crawlberg&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;npm&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@kreuzberg/crawl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@xberg-io/crawlberg&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composer&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kreuzberg/kreuzcrawl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;xberg-io/crawlberg&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maven groupId&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dev.kreuzberg&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;io.xberg.crawlberg&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NuGet&lt;/td&gt;
&lt;td&gt;&lt;code&gt;KreuzbergDev.KreuzCrawl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;XbergIo.Crawlberg&lt;/code&gt; &lt;em&gt;(see note below)&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go module&lt;/td&gt;
&lt;td&gt;&lt;code&gt;github.com/kreuzberg-dev/kreuzcrawl/...&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;github.com/xberg-io/crawlberg/packages/go&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C FFI symbol prefix&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kcrawl_*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cberg_*&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Env vars&lt;/td&gt;
&lt;td&gt;&lt;code&gt;KREUZBERG_*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CRAWLBERG_*&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docs&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docs.kreuzcrawl.kreuzberg.dev&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docs.crawlberg.xberg.io&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Behavior and API shape are identical. This is a rename, not a rewrite.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tiered dispatch engine
&lt;/h3&gt;

&lt;p&gt;The crawl engine chains HTTP → bypass → headless browser, driven by per-attempt signals rather than a static configuration flag. When a response indicates a WAF challenge, the engine escalates; when it succeeds, it records the outcome in per-domain state and adjusts the starting tier for subsequent visits.&lt;/p&gt;

&lt;p&gt;Public types: &lt;code&gt;Tier&lt;/code&gt;, &lt;code&gt;EscalationStrategy&lt;/code&gt;, &lt;code&gt;EscalationReason&lt;/code&gt;, &lt;code&gt;AttemptOutcome&lt;/code&gt;, &lt;code&gt;RetryDirective&lt;/code&gt;, &lt;code&gt;RetryPolicy&lt;/code&gt;, &lt;code&gt;WafSignal&lt;/code&gt;, &lt;code&gt;DispatchProfile&lt;/code&gt;. All dispatch enums are &lt;code&gt;#[non_exhaustive]&lt;/code&gt; — future tiers are non-breaking additions.&lt;/p&gt;

&lt;h3&gt;
  
  
  WAF detection with hot-reload fingerprints
&lt;/h3&gt;

&lt;p&gt;A TOML fingerprint corpus (&lt;code&gt;rules/waf_fingerprints.toml&lt;/code&gt;, 34 fingerprints) feeds an Aho-Corasick automaton. &lt;code&gt;TomlClassifier::watch()&lt;/code&gt; watches the file with a debounced watcher and swaps the compiled automaton atomically via &lt;code&gt;ArcSwap&lt;/code&gt; — no process restart needed. This is safe for Kubernetes ConfigMap updates: mount the TOML as a ConfigMap volume, edit it, and the running engine picks up the new corpus within seconds.&lt;/p&gt;

&lt;p&gt;Per-domain block rates are tracked with &lt;code&gt;EwmaDomainState&lt;/code&gt;, an exponentially weighted moving average that automatically promotes or demotes the starting tier based on recent history.&lt;/p&gt;
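&lt;p&gt;As a rough illustration of the idea, and not Crawlberg's implementation, an exponentially weighted block rate fits in a few lines of Python. The class name, the smoothing factor &lt;code&gt;alpha&lt;/code&gt;, the threshold, and the two tier labels are assumptions chosen for the example; the method names only echo the &lt;code&gt;recommend&lt;/code&gt;/&lt;code&gt;observe&lt;/code&gt; wording used elsewhere in this post.&lt;/p&gt;

```python
class EwmaBlockRate:
    """Hypothetical per-domain tracker: EWMA of blocked (1.0) vs clean (0.0) outcomes."""

    def __init__(self, alpha: float = 0.3, promote_above: float = 0.5):
        self.alpha = alpha                  # weight given to the newest sample (assumed value)
        self.promote_above = promote_above  # block rate that triggers a higher tier (assumed value)
        self.rate = 0.0                     # start optimistic: no blocks seen yet

    def observe(self, blocked: bool) -> None:
        """Fold one attempt outcome into the moving average."""
        sample = 1.0 if blocked else 0.0
        self.rate = self.alpha * sample + (1.0 - self.alpha) * self.rate

    def recommend(self) -> str:
        """Pick a starting tier label from recent history."""
        return "browser" if self.rate > self.promote_above else "http"
```

&lt;p&gt;With &lt;code&gt;alpha = 0.3&lt;/code&gt;, two consecutive blocks raise the rate to 0.51 and cross the example threshold; a single clean response then pulls it back to 0.357, so recent history dominates without one outlier flipping the decision permanently.&lt;/p&gt;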

&lt;h3&gt;
  
  
  SSRF defense, on by default
&lt;/h3&gt;

&lt;p&gt;Every fetch path runs URL validation before the network call and after each redirect hop. Blocked address ranges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;127.0.0.0/8&lt;/code&gt; (loopback)&lt;/li&gt;
&lt;li&gt;RFC 1918 private ranges (&lt;code&gt;10.0.0.0/8&lt;/code&gt;, &lt;code&gt;172.16.0.0/12&lt;/code&gt;, &lt;code&gt;192.168.0.0/16&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;169.254.0.0/16&lt;/code&gt; (link-local, including cloud metadata endpoints such as &lt;code&gt;169.254.169.254&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;0.0.0.0/8&lt;/code&gt; (this-network/reserved per RFC 1122 §3.2.1.3)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;224.0.0.0/4&lt;/code&gt; (multicast)&lt;/li&gt;
&lt;li&gt;IPv6 ULA &lt;code&gt;fc00::/7&lt;/code&gt;, link-local &lt;code&gt;fe80::/10&lt;/code&gt;, multicast &lt;code&gt;ff00::/8&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Any non-&lt;code&gt;http(s)&lt;/code&gt; scheme&lt;/li&gt;
&lt;/ul&gt;
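&lt;p&gt;The ranges above translate directly into a containment check. The sketch below uses Python's standard &lt;code&gt;ipaddress&lt;/code&gt; and &lt;code&gt;urllib.parse&lt;/code&gt; modules; the helper names are hypothetical, and it covers only the address and scheme rules listed here, not Crawlberg's DNS resolution or redirect handling.&lt;/p&gt;

```python
import ipaddress
from urllib.parse import urlsplit

# Ranges copied from the list above.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",     # loopback
        "10.0.0.0/8",      # RFC 1918
        "172.16.0.0/12",   # RFC 1918
        "192.168.0.0/16",  # RFC 1918
        "169.254.0.0/16",  # link-local, cloud metadata
        "0.0.0.0/8",       # this-network
        "224.0.0.0/4",     # multicast
        "fc00::/7",        # IPv6 ULA
        "fe80::/10",       # IPv6 link-local
        "ff00::/8",        # IPv6 multicast
    )
]


def is_blocked_address(raw_ip: str) -> bool:
    """Return True if raw_ip falls inside any blocked range."""
    ip = ipaddress.ip_address(raw_ip)
    return any(ip.version == net.version and ip in net for net in BLOCKED_NETWORKS)


def is_allowed_scheme(url: str) -> bool:
    """Only http and https URLs pass; urlsplit lowercases the scheme."""
    return urlsplit(url).scheme in ("http", "https")
```

&lt;p&gt;For example, &lt;code&gt;is_blocked_address("169.254.169.254")&lt;/code&gt; is true while a public address such as &lt;code&gt;93.184.216.34&lt;/code&gt; passes, and &lt;code&gt;is_allowed_scheme("file:///etc/passwd")&lt;/code&gt; is false.&lt;/p&gt;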

&lt;p&gt;Three protection layers work together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DNS-rebinding mitigation&lt;/strong&gt;: every resolved IP must pass the policy, not just the hostname at call time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redirect-chain re-validation&lt;/strong&gt;: each hop re-resolves and re-validates, bounded by &lt;code&gt;ssrf.max_redirects&lt;/code&gt; (default 5).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Link-enqueue validation&lt;/strong&gt;: URLs are validated against the SSRF policy before being added to the crawl frontier, not only at fetch time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Allowlisting is available via &lt;code&gt;HostMatcher&lt;/code&gt; (&lt;code&gt;Exact&lt;/code&gt;/&lt;code&gt;Suffix&lt;/code&gt;/&lt;code&gt;Cidr&lt;/code&gt; variants). Opt out entirely with &lt;code&gt;CRAWLBERG_ALLOW_PRIVATE_NETWORK=1&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory-bounded streaming crawl
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;crawl_stream()&lt;/code&gt; and &lt;code&gt;batch_crawl_stream()&lt;/code&gt; previously accumulated every &lt;code&gt;CrawlEvent::Page&lt;/code&gt; in memory. They now yield each page and drop it immediately. Based on internal measurements documented in the changelog, peak working-set drops from approximately 2.5 GB to approximately 20 MB on large crawls. The batch &lt;code&gt;crawl()&lt;/code&gt; API, which returns all pages at once, is unchanged.&lt;/p&gt;
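&lt;p&gt;The difference between the two shapes is easiest to see with a generator. This Python sketch uses a stand-in &lt;code&gt;fetch_pages&lt;/code&gt; function (hypothetical, no network access) in place of the real crawl loop; it shows the pattern only and says nothing about Crawlberg's internals or its measured memory figures.&lt;/p&gt;

```python
from typing import Dict, Iterable, Iterator, List


def fetch_pages(count: int) -> Iterator[Dict[str, str]]:
    """Stand-in for the crawl loop: produces one fake page at a time."""
    for index in range(count):
        yield {"url": f"https://example.com/page/{index}", "body": "x" * 1024}


def crawl_accumulating(count: int) -> List[Dict[str, str]]:
    """Accumulating shape: every page stays in memory until the crawl ends."""
    return list(fetch_pages(count))


def crawl_streaming(count: int) -> Iterator[Dict[str, str]]:
    """Streaming shape: each page is handed to the caller and then released."""
    yield from fetch_pages(count)


def total_bytes(pages: Iterable[Dict[str, str]]) -> int:
    """Consume pages one by one, keeping only a running total."""
    return sum(len(page["body"]) for page in pages)
```

&lt;p&gt;Both &lt;code&gt;total_bytes(crawl_streaming(n))&lt;/code&gt; and &lt;code&gt;total_bytes(crawl_accumulating(n))&lt;/code&gt; return the same number, but only the accumulating version holds all &lt;code&gt;n&lt;/code&gt; pages at once.&lt;/p&gt;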

&lt;h3&gt;
  
  
  MCP server and full CLI parity
&lt;/h3&gt;

&lt;p&gt;The CLI exposes &lt;code&gt;batch-scrape&lt;/code&gt;, &lt;code&gt;batch-crawl&lt;/code&gt;, &lt;code&gt;download&lt;/code&gt;, &lt;code&gt;citations&lt;/code&gt;, and &lt;code&gt;version&lt;/code&gt;, matching the core and MCP surfaces one-to-one. The MCP server serves tools over both stdio and rmcp Streamable HTTP at &lt;code&gt;/mcp&lt;/code&gt;. HTTP transport requires the binary to be compiled with the &lt;code&gt;api&lt;/code&gt; and &lt;code&gt;mcp&lt;/code&gt; Cargo features; the release CLI binary includes both. Each tool carries &lt;code&gt;read_only&lt;/code&gt;/&lt;code&gt;destructive&lt;/code&gt;/&lt;code&gt;open_world&lt;/code&gt; safety annotations for agent orchestration frameworks that need to reason about side effects before calling tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Public substrate parsers
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;crawlberg::robots&lt;/code&gt; and &lt;code&gt;crawlberg::sitemap&lt;/code&gt; are now public modules, usable without spinning up the full crawl engine. &lt;code&gt;parse_robots_txt&lt;/code&gt;, &lt;code&gt;is_path_allowed&lt;/code&gt;, &lt;code&gt;RobotsRules&lt;/code&gt;, &lt;code&gt;parse_sitemap_xml&lt;/code&gt;, &lt;code&gt;parse_sitemap_index&lt;/code&gt;, and &lt;code&gt;is_sitemap_index&lt;/code&gt; are all available standalone, which is useful for robots/sitemap preprocessing in pipelines that manage their own fetch layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Technical Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Escalation budget injection
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;EscalationBudget&lt;/code&gt; is a user-injectable trait. You can implement per-domain, per-hour browser budget caps, or tie escalation policy to real-time proxy cost signals. The built-in &lt;code&gt;EwmaDomainState&lt;/code&gt; is designed for zero-configuration deployment; the trait interface is designed for when you have stronger opinions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lock-free corpus hot-reload
&lt;/h3&gt;

&lt;p&gt;The WAF automaton lives under an &lt;code&gt;ArcSwap&lt;/code&gt;. Readers take a guard for the duration of a single classification call (on the order of microseconds) and never contend with the writer. The writer side compiles a new automaton (tens of milliseconds for a large corpus) and swaps it in a single atomic store. In-flight requests complete against the old automaton; new requests use the new one immediately. Readers never block on corpus updates.&lt;/p&gt;
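&lt;p&gt;The read-then-swap pattern can be approximated in Python. This is an analogy rather than a port: plain substring matching stands in for the Aho-Corasick automaton, a single reference assignment stands in for &lt;code&gt;ArcSwap&lt;/code&gt;, and the class name and fingerprint strings are made up for the example.&lt;/p&gt;

```python
import threading
from typing import Iterable


class SwappableCorpus:
    """Hypothetical holder for an immutable, precompiled fingerprint set."""

    def __init__(self, fingerprints: Iterable[str]):
        self._current = frozenset(fingerprints)
        self._writer_lock = threading.Lock()  # serializes writers only; readers never take it

    def classify(self, body: str) -> bool:
        """Return True if any fingerprint occurs in the response body."""
        snapshot = self._current  # one reference read; stays valid even if a swap happens now
        return any(marker in body for marker in snapshot)

    def reload(self, fingerprints: Iterable[str]) -> None:
        """Build the replacement off to the side, then publish it in one assignment."""
        compiled = frozenset(fingerprints)
        with self._writer_lock:
            self._current = compiled
```

&lt;p&gt;A call to &lt;code&gt;classify&lt;/code&gt; that started before &lt;code&gt;reload&lt;/code&gt; keeps using its snapshot, while calls that start afterwards see the new set, which mirrors the in-flight versus new-request behavior described above.&lt;/p&gt;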

&lt;h3&gt;
  
  
  SSRF at every redirect hop
&lt;/h3&gt;

&lt;p&gt;URL validation at the call site alone is insufficient: a hostname can pass the initial check, then resolve to a private address once a short DNS TTL expires (DNS rebinding). Crawlberg re-resolves and re-validates at every redirect, bounded by &lt;code&gt;ssrf.max_redirects&lt;/code&gt; (default 5). The &lt;code&gt;SsrfPolicy::from_env&lt;/code&gt; serde default means &lt;code&gt;CrawlConfig&lt;/code&gt; deserialized from JSON automatically honors the environment variable, which matters for container deployments where env vars are the primary configuration channel.&lt;/p&gt;
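&lt;p&gt;A minimal Python sketch of per-hop validation follows. To keep it runnable without a network, DNS resolution and redirect lookup are injected as callables; those parameters, the exception names, and the simplified policy built from &lt;code&gt;ipaddress&lt;/code&gt; flags are all assumptions for illustration. It does not pin the connection to the validated address, which a real implementation would also need to consider.&lt;/p&gt;

```python
import ipaddress
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit


class SsrfViolation(Exception):
    """Raised when a hop resolves to a disallowed address."""


class TooManyRedirects(Exception):
    """Raised when the redirect chain exceeds the configured bound."""


def is_disallowed(raw_ip: str) -> bool:
    # Simplified policy from ipaddress flags, not the exact range list.
    ip = ipaddress.ip_address(raw_ip)
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_multicast


def follow_redirects(
    url: str,
    resolve: Callable[[str], Iterable[str]],
    next_location: Callable[[str], Optional[str]],
    max_redirects: int = 5,
) -> str:
    """Return the final URL, validating every hop before moving on."""
    for _hop in range(max_redirects + 1):  # the initial URL plus up to max_redirects hops
        host = urlsplit(url).hostname or ""
        # Re-resolve on every hop; every returned address must pass.
        for raw_ip in resolve(host):
            if is_disallowed(raw_ip):
                raise SsrfViolation(f"{host} resolved to {raw_ip}")
        location = next_location(url)  # None means there is no further redirect
        if location is None:
            return url
        url = location
    raise TooManyRedirects(f"more than {max_redirects} redirects")
```

&lt;p&gt;With a fake resolver that maps a second hostname to &lt;code&gt;169.254.169.254&lt;/code&gt;, a redirect from a public host to that hostname raises &lt;code&gt;SsrfViolation&lt;/code&gt; at the second hop, even though the first hop passed.&lt;/p&gt;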

&lt;h3&gt;
  
  
  Browser pool lifecycle
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;BrowserPool&lt;/code&gt; is public. Construct and &lt;code&gt;warm(n)&lt;/code&gt; the pool at startup; pass it via &lt;code&gt;CrawlEngineBuilder::with_browser_pool()&lt;/code&gt;. Browser instances are reused across crawl jobs rather than spawned per escalation event. &lt;code&gt;CrawlEngineHandle::from_engine()&lt;/code&gt; produces a cloneable handle, so multiple async tasks can share a single engine and pool without additional coordination.&lt;/p&gt;

&lt;h3&gt;
  
  
  Asset downloads through the SSRF filter
&lt;/h3&gt;

&lt;p&gt;Before this release, &lt;code&gt;download_documents&lt;/code&gt; was honored only by single-page &lt;code&gt;scrape()&lt;/code&gt;; the crawl loop fetched, flagged, and discarded the bytes. Downloads now route through &lt;code&gt;http_fetch&lt;/code&gt;, the same transport as page fetches, so every file download is subject to the SSRF policy and per-domain rate limiting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Implications
&lt;/h2&gt;

&lt;p&gt;The most directly measurable change is streaming memory: ~2.5 GB → ~20 MB peak working-set on large crawls (figures from the changelog; no external benchmark suite has been published for this release). The practical implication is that crawl corpus size is no longer bounded by available RAM when using &lt;code&gt;crawl_stream&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Throughput: the tiered dispatch model adds a small latency overhead for requests that escalate: one additional HTTP probe before browser spin-up. Domains that respond normally to plain HTTP never pay this cost. EWMA per-domain state promotes well-behaved domains to start at the HTTP tier, avoiding unnecessary bypass or browser escalation for clean domains.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ArcSwap&lt;/code&gt;-backed corpus reload is lock-free from the reader's perspective, so fingerprint corpus updates do not introduce latency spikes in production.&lt;/p&gt;

&lt;p&gt;No benchmark numbers for throughput, requests-per-second, or latency percentiles are published in this release. Teams evaluating Crawlberg for high-throughput workloads should run their own benchmarks against the stable 1.0.0 surface; the Criterion benchmarks in the repository cover the WAF subsystem and are a starting point for extending coverage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Language Bindings Spotlight
&lt;/h2&gt;

&lt;p&gt;All 14 bindings are generated from the same Rust core by &lt;a href="https://github.com/xberg-io/alef" rel="noopener noreferrer"&gt;alef&lt;/a&gt; and contain no per-language extraction logic. The code snippets below are illustrative; check the &lt;a href="https://github.com/xberg-io/crawlberg/tree/main/packages" rel="noopener noreferrer"&gt;per-language READMEs&lt;/a&gt; for exact API signatures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crawlberg&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CrawlConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scrape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;crawl_stream&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CrawlConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_pages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# await and async for are only valid inside a coroutine&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Single-page extraction&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;scrape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Memory-bounded streaming crawl&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;crawl_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;crawlberg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  JavaScript / Node.js
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;scrape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;crawlStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;CrawlConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@xberg-io/crawlberg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CrawlConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;maxDepth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxPages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;scrape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nf"&gt;crawlStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://example.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;}))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @xberg-io/crawlberg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  PHP
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;XbergIo\Crawlberg\CrawlConfig&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;XbergIo\Crawlberg\Crawlberg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CrawlConfig&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withMaxDepth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withMaxPages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withConcurrency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crawlberg&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;scrape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'https://example.com'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$config&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require xberg-io/crawlberg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PHP 8.2, 8.3, and 8.4 are supported; precompiled NTS extensions ship for Linux (glibc, aarch64/x86_64), macOS (ARM64/x86_64), and Windows (VS16/VS17 x86_64).&lt;/p&gt;

&lt;h2&gt;
  
  
  Breaking Changes / Compatibility Notes
&lt;/h2&gt;

&lt;p&gt;v1.0.0 is a breaking release for users of pre-release &lt;code&gt;kreuzcrawl&lt;/code&gt; / &lt;code&gt;kreuzberg&lt;/code&gt;-namespaced packages. For users already on the &lt;code&gt;crawlberg&lt;/code&gt; name from 0.x pre-releases, the behavioral breaking changes are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Old behavior&lt;/th&gt;
&lt;th&gt;New behavior&lt;/th&gt;
&lt;th&gt;Action required&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Package identifiers&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;kreuzcrawl&lt;/code&gt; / &lt;code&gt;@kreuzberg/crawl&lt;/code&gt; etc.&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;crawlberg&lt;/code&gt; / &lt;code&gt;@xberg-io/crawlberg&lt;/code&gt; etc.&lt;/td&gt;
&lt;td&gt;Update dependency declarations in all manifests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Env vars&lt;/td&gt;
&lt;td&gt;&lt;code&gt;KREUZBERG_*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CRAWLBERG_*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Update shell configs, CI env blocks, K8s Secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C FFI symbols&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kcrawl_*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cberg_*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Recompile; update header includes and linker references&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go module path&lt;/td&gt;
&lt;td&gt;&lt;code&gt;github.com/kreuzberg-dev/kreuzcrawl/...&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;github.com/xberg-io/crawlberg/packages/go&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;go get&lt;/code&gt; new path; update all import statements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CrawlError::WafBlocked&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Unit variant&lt;/td&gt;
&lt;td&gt;Struct variant &lt;code&gt;{ vendor, message }&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Update match arms to destructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NetworkErrorKind&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exhaustive enum&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;#[non_exhaustive]&lt;/code&gt; applied&lt;/td&gt;
&lt;td&gt;Add wildcard &lt;code&gt;_&lt;/code&gt; arms to exhaustive matches; recompile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;CrawlError&lt;/code&gt; / dispatch enums&lt;/td&gt;
&lt;td&gt;Exhaustive enums&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;#[non_exhaustive]&lt;/code&gt; applied&lt;/td&gt;
&lt;td&gt;Add wildcard &lt;code&gt;_&lt;/code&gt; arms to exhaustive matches; recompile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SimpleRetryPolicy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;max_retries=3&lt;/code&gt; → 2 actual retries (off-by-one)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;max_retries=3&lt;/code&gt; → 3 actual retries (fixed)&lt;/td&gt;
&lt;td&gt;Audit retry budgets if behavior depended on the old count&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DomainStatePort&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Mutation model&lt;/td&gt;
&lt;td&gt;Observation model (&lt;code&gt;recommend&lt;/code&gt;/&lt;code&gt;observe&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Update trait implementations if you implemented this trait&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSRF policy&lt;/td&gt;
&lt;td&gt;Disabled by default&lt;/td&gt;
&lt;td&gt;Enabled by default&lt;/td&gt;
&lt;td&gt;Add &lt;code&gt;CRAWLBERG_ALLOW_PRIVATE_NETWORK=1&lt;/code&gt; for internal crawls&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
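&lt;p&gt;To make the &lt;code&gt;max_retries&lt;/code&gt; row concrete, here is a small Python sketch of the fixed semantics. It is not &lt;code&gt;SimpleRetryPolicy&lt;/code&gt; itself: the function name is hypothetical, and it assumes a "retry" means an attempt after the first one, so &lt;code&gt;max_retries=3&lt;/code&gt; allows four attempts in total.&lt;/p&gt;

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_retries(operation: Callable[[], T], max_retries: int) -> Tuple[T, int]:
    """Run operation once, then retry up to max_retries more times.

    Returns the result together with the number of attempts made.
    """
    last_error = None
    for attempt in range(1, max_retries + 2):  # 1 initial attempt + max_retries retries
        try:
            return operation(), attempt
        except Exception as error:  # a real policy would inspect the error kind
            last_error = error
    raise RuntimeError(f"failed after {max_retries} retries") from last_error
```

&lt;p&gt;Under the old off-by-one behavior described in the table, the same setting would have produced one fewer retry, so budgets tuned against the old count may now issue one extra request per failing URL.&lt;/p&gt;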

&lt;h2&gt;
  
  
  Upgrade Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step-by-step for existing &lt;code&gt;kreuzcrawl&lt;/code&gt; users
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Update package declarations&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Python&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;crawlberg        &lt;span class="c"&gt;# replaces kreuzcrawl&lt;/span&gt;

&lt;span class="c"&gt;# Node.js&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; @xberg-io/crawlberg   &lt;span class="c"&gt;# replaces @kreuzberg/crawl&lt;/span&gt;

&lt;span class="c"&gt;# PHP&lt;/span&gt;
composer require xberg-io/crawlberg   &lt;span class="c"&gt;# replaces kreuzberg/kreuzcrawl&lt;/span&gt;

&lt;span class="c"&gt;# Go&lt;/span&gt;
go get github.com/xberg-io/crawlberg/packages/go@v1.0.0

&lt;span class="c"&gt;# Rust&lt;/span&gt;
cargo add crawlberg@1.0.0    &lt;span class="c"&gt;# replaces kreuzcrawl&lt;/span&gt;

&lt;span class="c"&gt;# C# - verify the exact package ID in the C# README before running&lt;/span&gt;
dotnet add package XbergIo.Crawlberg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Update environment variables and configuration&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Old&lt;/span&gt;
&lt;span class="nv"&gt;KREUZBERG_ALLOW_PRIVATE_NETWORK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="c"&gt;# New&lt;/span&gt;
&lt;span class="nv"&gt;CRAWLBERG_ALLOW_PRIVATE_NETWORK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
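&lt;p&gt;Leftover variables are easy to miss across shells, CI, and container specs. The following Python sketch lists old-prefixed names and the new name each would map to. The helper is hypothetical, and it assumes every &lt;code&gt;KREUZBERG_&lt;/code&gt;-prefixed variable in the given environment belongs to the crawler, so review the output before renaming anything.&lt;/p&gt;

```python
from typing import Dict, Mapping

OLD_PREFIX = "KREUZBERG_"
NEW_PREFIX = "CRAWLBERG_"


def find_stale_env_vars(environ: Mapping[str, str]) -> Dict[str, str]:
    """Map each old-prefixed variable name to its suggested new name."""
    return {
        name: NEW_PREFIX + name[len(OLD_PREFIX):]
        for name in environ
        if name.startswith(OLD_PREFIX)
    }
```

&lt;p&gt;Passing &lt;code&gt;os.environ&lt;/code&gt; checks the current process; passing a dictionary parsed from a CI or Kubernetes manifest checks that configuration instead.&lt;/p&gt;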



&lt;p&gt;&lt;strong&gt;3. Audit SSRF settings before first run&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you crawl internal networks (CI test targets, internal APIs, localhost), set &lt;code&gt;CRAWLBERG_ALLOW_PRIVATE_NETWORK=1&lt;/code&gt; before upgrading. Without it, requests to loopback, RFC 1918, and cloud metadata addresses fail with &lt;code&gt;CrawlError::SsrfPolicyViolation&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Update C FFI call sites&lt;/strong&gt; &lt;em&gt;(if applicable)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Replace all &lt;code&gt;kcrawl_&lt;/code&gt; symbol references with &lt;code&gt;cberg_&lt;/code&gt;. Regenerate cbindgen headers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Fix &lt;code&gt;CrawlError::WafBlocked&lt;/code&gt; match arms&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Old - unit variant&lt;/span&gt;
&lt;span class="nn"&gt;CrawlError&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;WafBlocked&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// New - struct variant&lt;/span&gt;
&lt;span class="nn"&gt;CrawlError&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;WafBlocked&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;vendor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;eprintln!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"WAF block: {vendor}: {message}"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;6. Add wildcard arms for &lt;code&gt;#[non_exhaustive]&lt;/code&gt; enums&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CrawlError&lt;/code&gt;, &lt;code&gt;NetworkErrorKind&lt;/code&gt;, and all dispatch enums (&lt;code&gt;EscalationReason&lt;/code&gt;, &lt;code&gt;EscalationStrategy&lt;/code&gt;, etc.) are now &lt;code&gt;#[non_exhaustive]&lt;/code&gt;. Any exhaustive &lt;code&gt;match&lt;/code&gt; on these types will fail to compile until a wildcard &lt;code&gt;_ =&amp;gt; { ... }&lt;/code&gt; arm is added.&lt;/p&gt;
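&lt;p&gt;Steps 5 and 6 can be sketched together. The block below is a minimal, self-contained illustration, not the crate's actual definition: it declares its own stand-in &lt;code&gt;CrawlError&lt;/code&gt;, and the &lt;code&gt;Timeout&lt;/code&gt; variant, the unit shape of &lt;code&gt;SsrfPolicyViolation&lt;/code&gt;, and the &lt;code&gt;describe&lt;/code&gt; helper are assumptions made for the example. Note that &lt;code&gt;#[non_exhaustive]&lt;/code&gt; only forces the wildcard arm on code outside the defining crate; in this single-file sketch the wildcard is needed simply because not every variant is listed.&lt;/p&gt;

```rust
// Hypothetical stand-in for the crate's error type. Only the
// WafBlocked { vendor, message } shape comes from the upgrade guide;
// the other variants are illustrative assumptions.
#[non_exhaustive]
#[derive(Debug)]
enum CrawlError {
    WafBlocked { vendor: String, message: String },
    SsrfPolicyViolation,
    Timeout,
}

// Hypothetical helper that turns an error into a log line.
fn describe(err: CrawlError) -> String {
    match err {
        // Struct variant: destructure the fields instead of matching a unit variant.
        CrawlError::WafBlocked { vendor, message } => {
            format!("WAF block: {vendor}: {message}")
        }
        CrawlError::SsrfPolicyViolation => String::from("blocked by SSRF policy"),
        // Wildcard arm: variants not listed above, including ones added
        // in later releases, land here.
        _ => String::from("unhandled crawl error"),
    }
}

fn main() {
    let blocked = CrawlError::WafBlocked {
        vendor: String::from("example-waf"),
        message: String::from("challenge page"),
    };
    println!("{}", describe(blocked));
    println!("{}", describe(CrawlError::SsrfPolicyViolation));
    println!("{}", describe(CrawlError::Timeout));
}
```

&lt;p&gt;In real code the enum comes from the &lt;code&gt;crawlberg&lt;/code&gt; crate, so only the &lt;code&gt;match&lt;/code&gt; needs to change; check the crate docs for the actual variant list and field types.&lt;/p&gt;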

&lt;p&gt;&lt;strong&gt;Verification checklist&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;code&gt;crawlberg --version&lt;/code&gt; prints &lt;code&gt;1.0.0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;scrape()&lt;/code&gt; returns a result for a known public URL&lt;/li&gt;
&lt;li&gt;[ ] A streaming crawl over a multi-page site completes without OOM&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;CrawlError::WafBlocked&lt;/code&gt; match arms compile (struct variant)&lt;/li&gt;
&lt;li&gt;[ ] All &lt;code&gt;CRAWLBERG_*&lt;/code&gt; env vars are present in CI/CD&lt;/li&gt;
&lt;li&gt;[ ] No residual &lt;code&gt;KREUZBERG_*&lt;/code&gt; vars shadow the new names in your process environment&lt;/li&gt;
&lt;li&gt;[ ] C FFI header compilation succeeds with &lt;code&gt;cberg_&lt;/code&gt; symbol names&lt;/li&gt;
&lt;/ul&gt;
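&lt;p&gt;The environment-variable items in the checklist can be checked mechanically. The following is a small sketch, assuming all legacy variables share the &lt;code&gt;KREUZBERG_&lt;/code&gt; prefix shown in step 2; the &lt;code&gt;is_legacy_name&lt;/code&gt; helper is hypothetical and not part of Crawlberg.&lt;/p&gt;

```rust
use std::env;

// Hypothetical helper: true for names that still use the pre-rename prefix.
fn is_legacy_name(name: String) -> bool {
    name.starts_with("KREUZBERG_")
}

fn main() {
    let mut stale = 0;
    // Walk the process environment and report every legacy variable.
    for (name, _value) in env::vars() {
        if is_legacy_name(name.clone()) {
            eprintln!("legacy variable still set: {name}");
            stale += 1;
        }
    }
    if stale == 0 {
        println!("no KREUZBERG_* variables found");
    }
}
```

&lt;p&gt;Running a check like this in CI before the first crawl makes a half-migrated environment visible early.&lt;/p&gt;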

&lt;p&gt;&lt;strong&gt;Rollback&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Previous &lt;code&gt;kreuzcrawl&lt;/code&gt; packages remain published at their last version. Pin your dependency to the last &lt;code&gt;kreuzcrawl&lt;/code&gt; version and revert env vars if you need to roll back while investigating issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational Guidance for Production
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tuning recommendations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency&lt;/strong&gt;: Start at 10–20 concurrent requests per domain. The global concurrency cap and per-domain rate limiter are enforced independently: the global cap prevents resource exhaustion, and the per-domain limiter prevents hammering individual targets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Depth and page limits&lt;/strong&gt;: Set &lt;code&gt;max_depth&lt;/code&gt; and &lt;code&gt;max_pages&lt;/code&gt; conservatively. Even with &lt;code&gt;crawl_stream&lt;/code&gt;'s bounded memory, an uncapped frontier can grow large on deeply-linked sites.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retry budget&lt;/strong&gt;: &lt;code&gt;max_retries&lt;/code&gt; is now exact (off-by-one fixed). If your configuration relied on the old count, validate your retry/backoff math before deploying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser pool pre-warming&lt;/strong&gt;: Call &lt;code&gt;BrowserPool::warm(n)&lt;/code&gt; at startup. Lazy browser spin-up adds significant latency to the first escalated request per domain; pre-warming eliminates that spike.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proxy rotation&lt;/strong&gt;: Implement &lt;code&gt;ProxyProvider&lt;/code&gt; for production anti-blocking workloads. The trait is async; you can call an external proxy API per request without blocking the crawl loop.&lt;/li&gt;
&lt;/ul&gt;
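&lt;p&gt;For the retry budget item, one rough way to re-check the math is to compute the worst-case wait per URL. The sketch below rests on two assumptions the post does not confirm: that &lt;code&gt;max_retries&lt;/code&gt; counts retries after the initial attempt, and that backoff doubles from a base delay up to a cap. The function names and the &lt;code&gt;base_ms&lt;/code&gt; and &lt;code&gt;cap_ms&lt;/code&gt; parameters are hypothetical.&lt;/p&gt;

```rust
// Hypothetical helper, not part of Crawlberg: sums the waits between attempts
// for an assumed schedule of base_ms, 2 * base_ms, 4 * base_ms, ...,
// with each wait capped at cap_ms.
fn worst_case_backoff_ms(max_retries: u32, base_ms: u64, cap_ms: u64) -> u64 {
    (0..max_retries)
        .map(|i| base_ms.saturating_mul(2u64.saturating_pow(i)).min(cap_ms))
        .sum()
}

// Assumes max_retries counts retries only, so the initial attempt is added.
fn total_attempts(max_retries: u32) -> u32 {
    max_retries.saturating_add(1)
}

fn main() {
    let max_retries = 3;
    println!("attempts per URL: {}", total_attempts(max_retries));
    println!(
        "worst-case backoff: {} ms",
        worst_case_backoff_ms(max_retries, 500, 4_000)
    );
}
```

&lt;p&gt;If the old off-by-one behavior gave you one attempt more or fewer than configured, comparing these numbers for &lt;code&gt;max_retries&lt;/code&gt; and &lt;code&gt;max_retries + 1&lt;/code&gt; shows how much your per-URL time budget shifts.&lt;/p&gt;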

&lt;h3&gt;
  
  
  Observability checklist
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Wire &lt;code&gt;crawlberg_waf_fingerprint_matches_total&lt;/code&gt; and &lt;code&gt;crawlberg_escalations_total&lt;/code&gt; into your metrics system. These counters are available via the OpenTelemetry integration; enabling OTLP export requires a Cargo feature — check the crate's &lt;code&gt;Cargo.toml&lt;/code&gt; for the exact feature name before adding it to your dependency declaration.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;RUST_LOG=crawlberg=info&lt;/code&gt; for structured tracing output in production; switch to &lt;code&gt;RUST_LOG=crawlberg=debug&lt;/code&gt; for request-level detail.&lt;/li&gt;
&lt;li&gt;Treat &lt;code&gt;CrawlError::SsrfPolicyViolation&lt;/code&gt; as a security event; log the violating URL and source.&lt;/li&gt;
&lt;li&gt;Track the ratio of HTTP-tier successes to browser-tier escalations per domain. A sustained high escalation ratio for a domain you expect to be cooperative signals a misconfigured fingerprint corpus or an actual WAF rollout.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Failure modes and mitigations
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Mitigation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Rate-limited by target&lt;/td&gt;
&lt;td&gt;HTTP 429, escalation signal indicating rate limiting (check &lt;code&gt;EscalationReason&lt;/code&gt; type docs for the exact variant)&lt;/td&gt;
&lt;td&gt;Increase &lt;code&gt;rate_limit_delay_ms&lt;/code&gt;; reduce concurrency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WAF block loop&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;crawlberg_escalations_total&lt;/code&gt; elevated for one domain&lt;/td&gt;
&lt;td&gt;Inspect fingerprint corpus; tune &lt;code&gt;EscalationBudget&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OOM on large crawl&lt;/td&gt;
&lt;td&gt;RSS growing unboundedly&lt;/td&gt;
&lt;td&gt;Confirm you are using &lt;code&gt;crawl_stream&lt;/code&gt;, not &lt;code&gt;crawl&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSRF violations in CI&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SsrfPolicyViolation&lt;/code&gt; errors on test targets&lt;/td&gt;
&lt;td&gt;Add &lt;code&gt;CRAWLBERG_ALLOW_PRIVATE_NETWORK=1&lt;/code&gt; to CI environment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser pool exhausted&lt;/td&gt;
&lt;td&gt;Slow escalation, request queue buildup&lt;/td&gt;
&lt;td&gt;Increase pool size or reduce browser-tier concurrency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP HTTP endpoint not responding&lt;/td&gt;
&lt;td&gt;No tools returned from &lt;code&gt;/mcp&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Verify binary was built with the &lt;code&gt;api&lt;/code&gt; and &lt;code&gt;mcp&lt;/code&gt; Cargo features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swift package not resolving&lt;/td&gt;
&lt;td&gt;SwiftPM &lt;code&gt;branch not found&lt;/code&gt; error&lt;/td&gt;
&lt;td&gt;Upgrade to 1.0.0; the &lt;code&gt;release/swift/1.0.0&lt;/code&gt; branch is now correctly created&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Security and Responsible Crawling
&lt;/h2&gt;

&lt;p&gt;Crawlberg's default-on SSRF defense closes the most common server-side request forgery vector for multi-tenant deployments, but responsible crawling requires attention beyond internal network safety:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;robots.txt&lt;/strong&gt;: The engine fetches and respects &lt;code&gt;robots.txt&lt;/code&gt; automatically. If a site specifies a &lt;code&gt;Crawl-delay&lt;/code&gt;, set &lt;code&gt;rate_limit_delay_ms&lt;/code&gt; to at least that value. &lt;code&gt;Crawl-delay&lt;/code&gt; is conventionally given in seconds, so convert it to milliseconds first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting&lt;/strong&gt;: As a general starting point (not a framework default), 1,000–2,000 ms between requests per domain avoids hammering most public sites. Aggressive crawling harms target infrastructure regardless of technical capability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User-Agent&lt;/strong&gt;: Set a descriptive &lt;code&gt;User-Agent&lt;/code&gt; that identifies your crawler and includes a contact URL or email address. This gives site operators a channel to reach you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terms of service&lt;/strong&gt;: SSRF defense and robots.txt compliance are technical safeguards, not legal authorization. Review the ToS of any site you crawl at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data retention&lt;/strong&gt;: Crawled content may contain personal data. Apply your jurisdiction's retention and deletion requirements.&lt;/li&gt;
&lt;/ul&gt;
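&lt;p&gt;The &lt;code&gt;robots.txt&lt;/code&gt; and rate-limiting items come down to one rule: never wait less than the site asks for. A minimal sketch of that rule follows, assuming &lt;code&gt;Crawl-delay&lt;/code&gt; is expressed in seconds and has already been parsed; &lt;code&gt;effective_delay_ms&lt;/code&gt; is a hypothetical helper, and how the result is passed into Crawlberg's configuration is not shown here.&lt;/p&gt;

```rust
// Hypothetical helper: picks the per-domain delay in milliseconds.
// crawl_delay_secs is the value read from robots.txt (0.0 when absent).
fn effective_delay_ms(configured_ms: u64, crawl_delay_secs: f64) -> u64 {
    // Round up so a fractional Crawl-delay is never undercut.
    let robots_ms = (crawl_delay_secs.max(0.0) * 1000.0).ceil() as u64;
    // Use whichever is larger: your own setting or the site's request.
    configured_ms.max(robots_ms)
}

fn main() {
    // Site asks for 5 s, configured delay is 1 s.
    println!("{}", effective_delay_ms(1_000, 5.0));
    // Site asks for 0.5 s, configured delay is 2 s.
    println!("{}", effective_delay_ms(2_000, 0.5));
}
```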

&lt;h2&gt;
  
  
  Contributor and Ecosystem Notes
&lt;/h2&gt;

&lt;p&gt;Crawlberg's 14 language bindings are generated by &lt;a href="https://github.com/xberg-io/alef" rel="noopener noreferrer"&gt;alef&lt;/a&gt; (pinned at 0.26.6 in this release). Contributing to a binding means editing alef templates, not the generated crates directly.&lt;/p&gt;

&lt;p&gt;Areas where the community can contribute:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WAF fingerprint corpus&lt;/strong&gt;: &lt;code&gt;rules/waf_fingerprints.toml&lt;/code&gt; benefits from real-world signal data. PRs adding fingerprints with test cases and site-class annotations are a concrete, low-barrier contribution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-language ergonomics&lt;/strong&gt;: The generated APIs are consistent but conservative. Per-language maintainers are welcome to propose binding-layer ergonomic improvements that stay within the generated API contract.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarks&lt;/strong&gt;: Criterion benchmarks for the WAF subsystem ship in the repo. Throughput and latency benchmarks for the HTTP and streaming layers are an open gap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP integrations&lt;/strong&gt;: The MCP server opens Crawlberg to any agent framework that speaks Model Context Protocol. Reference integration guides for popular frameworks (LangChain, CrewAI, Pydantic AI) are high-value additions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs&lt;/strong&gt;: &lt;code&gt;docs.crawlberg.xberg.io&lt;/code&gt; is built from &lt;code&gt;docs/&lt;/code&gt;. Use-case guides (data pipelines, AI agent integrations, e-commerce monitoring, academic web archiving) are welcome.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Is this a drop-in replacement for &lt;code&gt;kreuzcrawl&lt;/code&gt;?&lt;/strong&gt;&lt;br&gt;
Functionally yes: behavior and API shape carry over, but it is not a name-for-name swap. The package names, env vars, C FFI symbols, and Go module path have changed, so follow the upgrade guide; the migration is mechanical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does the SSRF defense break internal test environments?&lt;/strong&gt;&lt;br&gt;
It will if your tests crawl &lt;code&gt;localhost&lt;/code&gt; or RFC 1918 addresses. Set &lt;code&gt;CRAWLBERG_ALLOW_PRIVATE_NETWORK=1&lt;/code&gt; in your test environment, or call &lt;code&gt;CrawlConfig::allow_private_networks(true)&lt;/code&gt; in test setup code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: When should I use &lt;code&gt;crawl_stream&lt;/code&gt; instead of &lt;code&gt;crawl&lt;/code&gt;?&lt;/strong&gt;&lt;br&gt;
Use &lt;code&gt;crawl_stream&lt;/code&gt; for any crawl larger than a few hundred pages. It bounds peak memory at ~20 MB regardless of corpus size. Use the batch &lt;code&gt;crawl()&lt;/code&gt; only when you need the full result set in memory at once — which is uncommon in production pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is the 1.0.0 API stable across all 14 bindings?&lt;/strong&gt;&lt;br&gt;
The Rust crate, CLI, and MCP tool definitions are stable. For Elixir specifically: the repository's Quick Start currently shows &lt;code&gt;{:crawlberg, "~&amp;gt; 0.3"}&lt;/code&gt; — verify the current published version at &lt;a href="https://hex.pm/packages/crawlberg" rel="noopener noreferrer"&gt;hex.pm/packages/crawlberg&lt;/a&gt; before pinning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does tiered dispatch slow down crawls that don't need browser rendering?&lt;/strong&gt;&lt;br&gt;
No. Domains that respond normally at the HTTP tier never escalate. EWMA per-domain state promotes well-behaved domains to start at the HTTP tier, so they avoid bypass and browser probes entirely after the initial warm-up within a session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Learn more
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Release&lt;/strong&gt;: &lt;a href="https://github.com/xberg-io/crawlberg/releases/tag/v1.0.0" rel="noopener noreferrer"&gt;github.com/xberg-io/crawlberg/releases/tag/v1.0.0&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repository&lt;/strong&gt;: &lt;a href="https://github.com/xberg-io/crawlberg" rel="noopener noreferrer"&gt;github.com/xberg-io/crawlberg&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt;: &lt;a href="https://docs.crawlberg.xberg.io" rel="noopener noreferrer"&gt;docs.crawlberg.xberg.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Getting started&lt;/strong&gt;: &lt;code&gt;pip install crawlberg&lt;/code&gt; · &lt;code&gt;npm install @xberg-io/crawlberg&lt;/code&gt; · &lt;code&gt;cargo add crawlberg&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord&lt;/strong&gt;: &lt;a href="https://discord.gg/xt9WY3GnKR" rel="noopener noreferrer"&gt;discord.gg/xt9WY3GnKR&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>rust</category>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
