<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: EgorBabsuhkin</title>
    <description>The latest articles on DEV Community by EgorBabsuhkin (@babasha).</description>
    <link>https://dev.to/babasha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3650962%2F58e3f9ff-8911-469e-a7d4-0ced213937a0.jpeg</url>
      <title>DEV Community: EgorBabsuhkin</title>
      <link>https://dev.to/babasha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/babasha"/>
    <language>en</language>
    <item>
      <title>SSR on Rust: From Experiments to 95,000 RPS</title>
      <dc:creator>EgorBabsuhkin</dc:creator>
      <pubDate>Mon, 08 Dec 2025 02:29:18 +0000</pubDate>
      <link>https://dev.to/babasha/ssr-on-rust-from-experiments-to-95000-rps-53pi</link>
      <guid>https://dev.to/babasha/ssr-on-rust-from-experiments-to-95000-rps-53pi</guid>
      <description>&lt;p&gt;Introduction&lt;br&gt;
It all started one evening... I was tinkering with rewriting the front-end of a marketplace from React to Preact, using Brotli compression and native CSS, just to test out some extreme optimizations. In my quest for maximum performance and speed, I decided to experiment with porting the back-end to Rust, including compressing the database into Redis—but that's a story for another time. Anyway, these experiments led me to the idea of building an SSR engine on Rust, and benchmarks showed me hitting 95,000+ RPS on an M4 chip. That's pretty decent in itself; I'll dive into the details below.&lt;br&gt;
Architecture of Rusty-SSR&lt;br&gt;
Rust gives you more flexibility in managing threads and memory. At the core of Rusty-SSR is a pool of V8 isolates, thread pinning to cores, and a multi-tier caching system.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;V8 Isolates Pool for Multithreading
Instead of separate OS processes, we use lightweight V8 isolates within a single Rust process, one per thread.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let pool = V8Pool::new(V8PoolConfig {
    num_threads: num_cpus::get(), // Use all cores
    queue_capacity: 512,          // Queue for backpressure
    ..Default::default()
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This avoids blocking: if one isolate is busy, others keep handling requests.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Thread Pinning to Cores
Context switches can kill performance. To minimize them, each thread is pinned to a specific CPU core.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if let Some(core_id) = cores.get(idx) {
    if core_affinity::set_for_current(*core_id) {
        tracing::debug!("Worker {} pinned to core {:?}", id, core_id.id);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This keeps the processor cache (L1/L2) hot. In the cloud, results may vary, so profiling is recommended.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Multi-Tier Caching
Caching reduces rendering needs. Instead of a simple HashMap with locks, it's a two-level setup:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Hot Cache (L1): Thread-local for instant access without synchronization.&lt;br&gt;
Cold Cache (L2): DashMap for shared access across threads.&lt;/p&gt;
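
&lt;p&gt;As a rough sketch of that lookup path (standard library only, with a Mutex-wrapped HashMap standing in for DashMap; the get/put names here are illustrative, not Rusty-SSR's actual API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// L1: thread-local, reachable without any synchronization.
thread_local! {
    static HOT: RefCell&amp;lt;HashMap&amp;lt;String, Arc&amp;lt;str&amp;gt;&amp;gt;&amp;gt; = RefCell::new(HashMap::new());
}

// L2: shared across threads (a Mutex here; DashMap in Rusty-SSR).
fn cold() -&amp;gt; &amp;amp;'static Mutex&amp;lt;HashMap&amp;lt;String, Arc&amp;lt;str&amp;gt;&amp;gt;&amp;gt; {
    static COLD: OnceLock&amp;lt;Mutex&amp;lt;HashMap&amp;lt;String, Arc&amp;lt;str&amp;gt;&amp;gt;&amp;gt;&amp;gt; = OnceLock::new();
    COLD.get_or_init(|| Mutex::new(HashMap::new()))
}

fn get(url: &amp;amp;str) -&amp;gt; Option&amp;lt;Arc&amp;lt;str&amp;gt;&amp;gt; {
    // Hot path: no lock taken at all.
    if let Some(html) = HOT.with(|m| m.borrow().get(url).cloned()) {
        return Some(html);
    }
    // Cold path: consult the shared map, then promote the hit into L1.
    let html = cold().lock().unwrap().get(url).cloned()?;
    HOT.with(|m| m.borrow_mut().insert(url.to_string(), Arc::clone(&amp;amp;html)));
    Some(html)
}

fn put(url: &amp;amp;str, html: &amp;amp;str) {
    cold().lock().unwrap().insert(url.to_string(), Arc::from(html));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The hot path never touches a lock; only a miss pays for synchronization.&lt;/p&gt;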

&lt;p&gt;Cache size is configured in elements (pages), and TTL in seconds (e.g., cache_ttl_secs(300)). Metrics are available via engine.cache_metrics() (hit rate, hot/cold hits, etc.).&lt;br&gt;
Data Prefetching&lt;br&gt;
To speed things up, SSE prefetch instructions pull data into the CPU cache ahead of time, like warming up your coffee in advance so you don't wait.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn prefetch_data(data: &amp;amp;str) {
    // _mm_prefetch lives in core::arch::x86_64, so gate on x86_64 only.
    #[cfg(target_arch = "x86_64")]
    unsafe {
        use core::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
        // Hint the CPU to pull the start of the string into L1 (T0 hint).
        _mm_prefetch(data.as_ptr() as *const i8, _MM_HINT_T0);
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Internal Structure of Hot Cache&lt;br&gt;
The Hot Cache is split into an ultra-hot array (8 elements for super-fast access) and a HashMap (128 elements). Entries are promoted via LRU.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#[repr(align(64))]  // Align to cache line
pub struct HotCache {
    ultra_hot: [Option&amp;lt;...&amp;gt;; 8],
    hot_map: HashMap&amp;lt;...&amp;gt;,
    // ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Zero-Copy with Arc&lt;br&gt;
HTML is stored as Arc&amp;lt;str&amp;gt; to avoid copying between threads.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let html: Arc&amp;lt;str&amp;gt; = Arc::from(rendered_html.as_str());
cache.insert(url, Arc::clone(&amp;amp;html));  // Only the Arc is cloned
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This saves memory for large pages.&lt;br&gt;
DashMap Optimization&lt;br&gt;
The Cold Cache uses DashMap with 128 shards to reduce contention under multithreading. Testing showed a +19% throughput boost over the default 16 shards. Here's a breakdown of the results:&lt;/p&gt;

&lt;p&gt;16 shards (default): 51M elem/s (baseline)&lt;br&gt;
32 shards: 57M elem/s (+12%)&lt;br&gt;
64 shards: 59M elem/s (+16%)&lt;br&gt;
128 shards: 60.6M elem/s (+19%)&lt;br&gt;
256 shards: 60.3M elem/s (+18%)&lt;/p&gt;
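
&lt;p&gt;Setting the shard count is a one-liner at construction time (a sketch assuming the dashmap crate; with_shard_amount requires a power of two, and the new_cold_cache name is just for illustration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use dashmap::DashMap;
use std::sync::Arc;

// Build the cold cache sharded 128 ways to cut lock contention.
fn new_cold_cache() -&amp;gt; DashMap&amp;lt;String, Arc&amp;lt;str&amp;gt;&amp;gt; {
    DashMap::with_shard_amount(128)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;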

&lt;p&gt;Reliability and Production Readiness&lt;/p&gt;

&lt;p&gt;A queue with a timeout (request_timeout) prevents deadlocks.&lt;br&gt;
Errors during bundle loading are handled.&lt;br&gt;
The cache can be fully cleared, including the thread-local tier.&lt;/p&gt;
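
&lt;p&gt;The timeout idea can be sketched with standard-library channels (the 512-slot bound and the function names are illustrative, not Rusty-SSR's actual API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// One render request: a URL plus a channel to send the HTML back on.
type RenderRequest = (String, mpsc::Sender&amp;lt;String&amp;gt;);

// Spawn a worker draining a bounded queue (512 slots = backpressure).
fn spawn_worker() -&amp;gt; mpsc::SyncSender&amp;lt;RenderRequest&amp;gt; {
    let (req_tx, req_rx) = mpsc::sync_channel::&amp;lt;RenderRequest&amp;gt;(512);
    thread::spawn(move || {
        for (url, reply) in req_rx {
            let _ = reply.send(format!("rendered {url}"));
        }
    });
    req_tx
}

// Wait for the result with a deadline instead of blocking forever.
fn render_with_timeout(queue: &amp;amp;mpsc::SyncSender&amp;lt;RenderRequest&amp;gt;, url: &amp;amp;str) -&amp;gt; Option&amp;lt;String&amp;gt; {
    let (reply_tx, reply_rx) = mpsc::channel();
    queue.send((url.to_string(), reply_tx)).ok()?;
    reply_rx.recv_timeout(Duration::from_millis(500)).ok()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If a worker hangs, the caller gets an error after the deadline instead of wedging the whole request path.&lt;/p&gt;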

&lt;p&gt;Benchmarks&lt;br&gt;
Tests on Apple M4 (10 cores) using wrk --latency -t10 -c400/1000 -d30s on loopback, demo HTML from the repo, warmed cache. Key metrics:&lt;/p&gt;

&lt;p&gt;Throughput: 95,363 req/s&lt;br&gt;
Latency p50: 0.46 ms (median)&lt;br&gt;
Latency p99: 4.60 ms (tail latency under load)&lt;/p&gt;

&lt;p&gt;I'm currently using this setup for my portfolio at &lt;a href="https://portfolio-production-b677.up.railway.app/" rel="noopener noreferrer"&gt;https://portfolio-production-b677.up.railway.app/&lt;/a&gt;. It's still rough around the edges and mostly desktop-oriented, but it serves as a benchmark too—with complex content like animations and Three.js, yet loading is lightning-fast. The portfolio runs on the cheapest Redis plan.&lt;br&gt;
In real-world scenarios, performance depends on network, databases, and browsers. But even modest improvements can cut infrastructure costs, which is good for the environment at least :)&lt;br&gt;
Conclusion&lt;br&gt;
Rust provides tools for building efficient web servers. This is my experience, which might be useful to others. The code is open under MIT. If you try it out, share your thoughts in the comments—I'd love to hear feedback.&lt;br&gt;
Links&lt;/p&gt;

&lt;p&gt;GitHub Repository &lt;a href="https://github.com/babasha/Rusty-SSR" rel="noopener noreferrer"&gt;https://github.com/babasha/Rusty-SSR&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>rust</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
