<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Marco Mengelkoch</title>
    <description>The latest articles on DEV Community by Marco Mengelkoch (@marcomq).</description>
    <link>https://dev.to/marcomq</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3872755%2F9fd90f0a-9ffb-4f51-8251-e03be75a6b50.png</url>
      <title>DEV Community: Marco Mengelkoch</title>
      <link>https://dev.to/marcomq</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/marcomq"/>
    <language>en</language>
    <item>
      <title>I Hit a 400k/s Wall — So I Built a Faster UUID v7 Generator in Rust</title>
      <dc:creator>Marco Mengelkoch</dc:creator>
      <pubDate>Wed, 15 Apr 2026 06:12:31 +0000</pubDate>
      <link>https://dev.to/marcomq/i-hit-a-400ks-wall-so-i-built-a-faster-uuid-v7-generator-in-rust-8ok</link>
      <guid>https://dev.to/marcomq/i-hit-a-400ks-wall-so-i-built-a-faster-uuid-v7-generator-in-rust-8ok</guid>
      <description>&lt;p&gt;I was stress-testing a message pipeline. Thousands of messages flying through queues, each needing a unique ID. The code looked fine. The network looked fine. But throughput kept hitting a ceiling around &lt;strong&gt;400,000 messages/second&lt;/strong&gt; — and refused to go higher.&lt;/p&gt;

&lt;p&gt;After some profiling, I found the culprit: &lt;code&gt;uuid::Uuid::now_v7()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Not the queue. Not the serializer. The ID generator.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why UUID v7?
&lt;/h2&gt;

&lt;p&gt;UUID v7 is relatively recent (standardized in RFC 9562, May 2024). Unlike v4 (pure random), v7 embeds a &lt;strong&gt;millisecond-precision Unix timestamp&lt;/strong&gt; in the top 48 bits. That makes them naturally sortable — great for database primary keys, message IDs, log correlation, anywhere you want "roughly time-ordered" uniqueness without a central counter.&lt;/p&gt;

&lt;p&gt;The layout looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|← 48 bits (ms timestamp) →|← 4 bits ver →|← 12 bits rand →|← 2 bits var →|← 62 bits rand →|
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
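&lt;p&gt;To make that layout concrete, here's a small sketch that pulls the timestamp and version nibble back out of a v7 UUID stored as a &lt;code&gt;u128&lt;/code&gt;. The helper names (&lt;code&gt;v7_timestamp_ms&lt;/code&gt;, &lt;code&gt;is_v7&lt;/code&gt;) are hypothetical, not part of any crate's API:&lt;/p&gt;

```rust
/// Extract the 48-bit millisecond Unix timestamp from a UUID v7
/// packed into a u128 (the timestamp occupies the top 48 of 128 bits).
fn v7_timestamp_ms(id: u128) -> u64 {
    (id >> 80) as u64
}

/// Check that the version nibble (bits 76..80) is 7.
fn is_v7(id: u128) -> bool {
    ((id >> 76) & 0xF) == 7
}

fn main() {
    // Build a v7-shaped ID by hand: timestamp in the top 48 bits,
    // version nibble = 7, variant bits = 0b10, random bits zeroed.
    let ts_ms: u64 = 1_700_000_000_000;
    let id: u128 = ((ts_ms as u128) << 80) | (0x7 << 76) | (0b10 << 62);
    assert_eq!(v7_timestamp_ms(id), ts_ms);
    assert!(is_v7(id));
    println!("timestamp = {} ms", v7_timestamp_ms(id));
}
```

&lt;p&gt;Because the timestamp sits in the most significant bits, lexicographic and numeric ordering both follow creation time (within one millisecond, ordering falls back to the random bits).&lt;/p&gt;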



&lt;p&gt;The problem: generating them correctly requires randomness. And the &lt;code&gt;uuid&lt;/code&gt; crate, by default, uses a &lt;strong&gt;cryptographically secure&lt;/strong&gt; RNG for those 74 random bits. That's &lt;code&gt;OsRng&lt;/code&gt; — which means a syscall on every generation. Safe, correct, and very slow for high-throughput use.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottleneck
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This was killing my throughput&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Uuid&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;now_v7&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// ~1400ns per call on macOS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;~1.4 microseconds. Doesn't sound like much. But at scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,400 ns × 1,000,000 = &lt;strong&gt;1.4 seconds&lt;/strong&gt; just generating IDs for 1M messages&lt;/li&gt;
&lt;li&gt;That caps ID generation alone at roughly 700k IDs/s; combine it with serialization and queueing on the same thread, and the observed 400k/s ceiling falls right out&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: This is primarily a &lt;strong&gt;macOS problem&lt;/strong&gt;. On Linux and Windows, the &lt;code&gt;uuid&lt;/code&gt; crate's default &lt;code&gt;OsRng&lt;/code&gt; is significantly cheaper — the gap there is roughly &lt;strong&gt;10× slower&lt;/strong&gt; than &lt;code&gt;fast-uuid-v7&lt;/code&gt;, not 165×. If you're on Linux in production, this may be less critical. But if you develop on a Mac and deploy to Linux, it's still worth knowing about.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;code&gt;uuid&lt;/code&gt; crate does have a &lt;code&gt;fast-rng&lt;/code&gt; feature flag that switches to a faster RNG. I only found out about it &lt;em&gt;after&lt;/em&gt; building my own solution. It does help — but I'd advise against using it casually: enabling &lt;code&gt;fast-rng&lt;/code&gt; is a &lt;strong&gt;global flag&lt;/strong&gt; that affects all UUID generation in your binary, including v4 UUIDs you might be generating elsewhere for security-sensitive purposes (session tokens, CSRF tokens, etc.). Swapping out cryptographic randomness application-wide is a non-obvious footgun. &lt;code&gt;fast-uuid-v7&lt;/code&gt; keeps the fast path opt-in and explicit.&lt;/p&gt;
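&lt;p&gt;For reference, opting into the crate-wide fast RNG looks like this in &lt;code&gt;Cargo.toml&lt;/code&gt;; note there is no per-call-site granularity:&lt;/p&gt;

```toml
[dependencies]
# "fast-rng" swaps the RNG for *every* UUID the crate generates
# in this binary, v4 included.
uuid = { version = "1", features = ["v7", "fast-rng"] }
```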




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/marcomq/fast-uuid-v7" rel="noopener noreferrer"&gt;&lt;code&gt;fast-uuid-v7&lt;/code&gt;&lt;/a&gt; — a focused Rust library that generates UUID v7 compatible IDs as fast as possible, without breaking the spec.&lt;/p&gt;

&lt;p&gt;The core ideas:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Thread-Local State (No Locks)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;thread_local!&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;RNG&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RefCell&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;SmallRng&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;RefCell&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;SmallRng&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from_entropy&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each thread gets its own RNG and counter. No mutexes, no atomics, no contention. This alone removes a significant source of overhead in multi-threaded workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Amortized Timestamps
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;SystemTime::now()&lt;/code&gt; costs ~20–40 ns. Calling it for every ID is wasteful. Instead, we use &lt;strong&gt;CPU cycle counters&lt;/strong&gt; (&lt;code&gt;rdtsc&lt;/code&gt; on x86_64, &lt;code&gt;cntvct_el0&lt;/code&gt; on ARM) to cheaply detect whether a millisecond has elapsed. We only call the real system clock when needed.&lt;/p&gt;
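&lt;p&gt;A simplified, portable sketch of the idea, using &lt;code&gt;Instant&lt;/code&gt; as the cheap clock instead of reading &lt;code&gt;rdtsc&lt;/code&gt;/&lt;code&gt;cntvct_el0&lt;/code&gt; directly (the actual crate works on raw counter values; this is just to show the caching pattern):&lt;/p&gt;

```rust
use std::cell::Cell;
use std::time::{Instant, SystemTime, UNIX_EPOCH};

thread_local! {
    // (last wall-clock ms we fetched, cheap monotonic reading taken then)
    static CACHE: Cell<Option<(u64, Instant)>> = Cell::new(None);
}

/// Current Unix time in ms; only touches SystemTime when the cheap
/// monotonic clock says a millisecond boundary may have passed.
fn cached_unix_ms() -> u64 {
    CACHE.with(|c| {
        let now = Instant::now();
        if let Some((ms, at)) = c.get() {
            if now.duration_since(at).as_millis() == 0 {
                return ms; // hot path: no wall-clock read
            }
        }
        let wall = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before 1970")
            .as_millis() as u64;
        c.set(Some((wall, now)));
        wall
    })
}

fn main() {
    let a = cached_unix_ms();
    let b = cached_unix_ms(); // almost certainly served from the cache
    assert!(b >= a);
    println!("{a} {b}");
}
```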

&lt;h3&gt;
  
  
  3. SmallRng
&lt;/h3&gt;

&lt;p&gt;We swap out &lt;code&gt;OsRng&lt;/code&gt; for &lt;code&gt;SmallRng&lt;/code&gt; from the &lt;code&gt;rand&lt;/code&gt; crate — a fast, non-cryptographic PRNG. It's seeded once per thread from entropy. Not suitable for cryptography, but perfectly fine for database keys.&lt;/p&gt;
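&lt;p&gt;To see why a small PRNG is so much cheaper, here's a toy std-only &lt;code&gt;xorshift64*&lt;/code&gt; generator doing the same job. The crate itself uses &lt;code&gt;SmallRng&lt;/code&gt; as shown above; this is purely for intuition — a handful of ALU operations per draw instead of a syscall:&lt;/p&gt;

```rust
/// Toy xorshift64* generator: the same flavor of small, fast,
/// non-cryptographic PRNG as SmallRng. NOT for anything
/// security-sensitive.
struct Xorshift64Star(u64); // state must be non-zero

impl Xorshift64Star {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// The 74 random bits a v7 UUID needs: 12-bit rand_a, 62-bit rand_b.
    fn v7_random_bits(&mut self) -> (u16, u64) {
        let a = (self.next_u64() & 0x0FFF) as u16;
        let b = self.next_u64() & ((1 << 62) - 1);
        (a, b)
    }
}

fn main() {
    // Real code would seed from OS entropy once per thread; a fixed
    // seed here keeps the example deterministic.
    let mut rng = Xorshift64Star(0x9E37_79B9_7F4A_7C15);
    let (rand_a, rand_b) = rng.v7_random_bits();
    assert!(rand_a < (1 << 12));
    assert!(rand_b < (1 << 62));
    println!("{rand_a:03x} {rand_b:016x}");
}
```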

&lt;h3&gt;
  
  
  4. Stack-Allocated String Formatting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Zero allocation — returns a stack-allocated FixedString&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;gen_id_str&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// ~21–60 ns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The canonical UUID string representation (&lt;code&gt;xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&lt;/code&gt;) is 36 bytes. We write it directly into a stack buffer and return a type that implements &lt;code&gt;Deref&amp;lt;Target=str&amp;gt;&lt;/code&gt;. No heap allocation at all.&lt;/p&gt;
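&lt;p&gt;Here's a sketch of how such a zero-allocation formatter can work. The &lt;code&gt;FixedString&lt;/code&gt; shown is my own illustrative type, not the crate's actual definition:&lt;/p&gt;

```rust
use std::ops::Deref;

/// Illustrative 36-byte stack string (not the crate's actual type).
struct FixedString([u8; 36]);

impl Deref for FixedString {
    type Target = str;
    fn deref(&self) -> &str {
        // Safe to unwrap: we only ever write ASCII hex digits and '-'.
        std::str::from_utf8(&self.0).unwrap()
    }
}

/// Format a u128 as the canonical 8-4-4-4-12 UUID string,
/// writing straight into a stack buffer. No heap allocation.
fn format_uuid(id: u128) -> FixedString {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut buf = [b'-'; 36];
    let mut shift = 124i32; // next nibble, most significant first
    for (i, b) in buf.iter_mut().enumerate() {
        if matches!(i, 8 | 13 | 18 | 23) {
            continue; // keep the dash separators
        }
        *b = HEX[((id >> shift) & 0xF) as usize];
        shift -= 4;
    }
    FixedString(buf)
}

fn main() {
    let s = format_uuid(0x0191_2345_6789_7abc_8def_0123_4567_89ab);
    assert_eq!(&*s, "01912345-6789-7abc-8def-0123456789ab");
    println!("{}", &*s);
}
```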




&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;fast-uuid-v7&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;fast_uuid_v7&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;gen_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gen_id_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gen_id_string&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// u128 — fastest, 74 bits random (~8–50 ns)&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u128&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;gen_id&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Stack string — zero allocation (~21–60 ns)&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;gen_id_str&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// e.g. "01942f3a-bc12-7d4e-8f01-2b3c4d5e6f70"&lt;/span&gt;

&lt;span class="c1"&gt;// Heap String — for when you need an owned String (~85–130 ns)&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;gen_id_string&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Format an u128 as &amp;amp;str on the stack&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_uuid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;gen_id&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;On Apple M1 / recent x86_64:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;uuid::Uuid::now_v7()&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;~1400 ns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;uuid::Uuid::now_v7()&lt;/code&gt; (fast-rng feature)&lt;/td&gt;
&lt;td&gt;~90 ns (u128), ~170 ns (string)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fast_uuid_v7::gen_id()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~8–50 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fast_uuid_v7::gen_id_str()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~21–60 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fast_uuid_v7::gen_id_string()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~85–130 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's up to &lt;strong&gt;165× faster&lt;/strong&gt; than the default &lt;code&gt;uuid&lt;/code&gt; crate, and still &lt;strong&gt;8–10× faster&lt;/strong&gt; than &lt;code&gt;uuid&lt;/code&gt; with &lt;code&gt;fast-rng&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Generating 10 million IDs takes roughly &lt;strong&gt;95 ms&lt;/strong&gt; on a single core.&lt;/p&gt;

&lt;p&gt;To benchmark the repository code on your own machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo bench
&lt;span class="c"&gt;# or&lt;/span&gt;
cargo &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--release&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; test_next_id_performance &lt;span class="nt"&gt;--nocapture&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Caveats
&lt;/h2&gt;

&lt;p&gt;To be upfront about the limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not cryptographically secure.&lt;/strong&gt; Do not use this for session tokens, secrets, or anything security-sensitive. Use the &lt;code&gt;uuid&lt;/code&gt; crate with &lt;code&gt;OsRng&lt;/code&gt; for those.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clock drift edge cases.&lt;/strong&gt; The batched timestamp check assumes the CPU counter frequency is stable. VM migrations or unusual scheduling could cause a ~1 ms timestamp lag. Under sustained high throughput the wall clock is re-read constantly, and I couldn't reproduce the lag in testing, but it remains possible in theory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;SystemTime::now()&lt;/code&gt; is still called periodically.&lt;/strong&gt; The ~8 ns figure is the amortized hot-path cost. When a millisecond boundary is detected, the call drops to ~50 ns — still much faster than the baseline.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When Should You Use This?
&lt;/h2&gt;

&lt;p&gt;Use &lt;code&gt;fast-uuid-v7&lt;/code&gt; if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're generating IDs at very high rates (hundreds of thousands per second)&lt;/li&gt;
&lt;li&gt;You need time-sortable IDs (database PKs, log IDs, message IDs)&lt;/li&gt;
&lt;li&gt;Security/unpredictability of the random component is &lt;strong&gt;not&lt;/strong&gt; a requirement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stick with &lt;code&gt;uuid&lt;/code&gt; if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need cryptographically secure randomness&lt;/li&gt;
&lt;li&gt;Generation rate is not a bottleneck&lt;/li&gt;
&lt;li&gt;You want RFC-strict compliance guarantees&lt;/li&gt;
&lt;li&gt;You need to parse UUIDs: &lt;code&gt;fast-uuid-v7&lt;/code&gt; only generates them and does no parsing at all&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  AI Disclaimer
&lt;/h2&gt;

&lt;p&gt;This post was written largely with AI assistance: I am not a native English speaker, and an LLM phrases things better than I could. Some of the ideas behind &lt;code&gt;fast-uuid-v7&lt;/code&gt; were also researched with AI assistance. The original idea, the actual optimizations, the benchmarks, and everything else were my own input.&lt;/p&gt;




&lt;h2&gt;
  
  
  Repository
&lt;/h2&gt;

&lt;p&gt;The crate is MIT licensed and available on &lt;a href="https://crates.io/crates/fast-uuid-v7" rel="noopener noreferrer"&gt;crates.io&lt;/a&gt; and &lt;a href="https://github.com/marcomq/fast-uuid-v7" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Feedback, benchmarks on your hardware, and PRs are welcome.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>showdev</category>
      <category>performance</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
