<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: kartikay dubey</title>
    <description>The latest articles on DEV Community by kartikay dubey (@dubeykartikay).</description>
    <link>https://dev.to/dubeykartikay</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3818892%2Fe8587f68-3f37-4472-b7b0-89fae4ac0c9c.jpg</url>
      <title>DEV Community: kartikay dubey</title>
      <link>https://dev.to/dubeykartikay</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dubeykartikay"/>
    <language>en</language>
    <item>
      <title>How 3 Lines of Code Caused a 10x Kafka Throughput Drop</title>
      <dc:creator>kartikay dubey</dc:creator>
      <pubDate>Sun, 03 May 2026 16:06:30 +0000</pubDate>
      <link>https://dev.to/dubeykartikay/how-3-lines-of-code-caused-a-10x-kafka-throughput-drop-3ln5</link>
      <guid>https://dev.to/dubeykartikay/how-3-lines-of-code-caused-a-10x-kafka-throughput-drop-3ln5</guid>
      <description>&lt;p&gt;In August 2025, a user reported that Apache Kafka v3.9.0 dropped consumer throughput by 10x. Other users reproduced it. The culprit was a configuration called &lt;code&gt;min.insync.replicas&lt;/code&gt;, and the fix was three lines of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The report
&lt;/h2&gt;

&lt;p&gt;Sharad Garg opened a ticket titled "Consumer throughput drops by 10 times with Kafka v3.9.0 in ZK mode." Ritvik Gupta ran controlled tests and traced the issue to &lt;code&gt;min.insync.replicas&lt;/code&gt;. Raising it from 1 to 2 caused a massive drop:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Message Rate&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 Producer 1 Consumer&lt;/td&gt;
&lt;td&gt;89.21&lt;/td&gt;
&lt;td&gt;min.insync.replicas = 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1 Producer 1 Consumer&lt;/td&gt;
&lt;td&gt;298.99&lt;/td&gt;
&lt;td&gt;min.insync.replicas = 1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Another user reported throughput falling from 147 MB/s on Kafka 3.4 to 58 MB/s on Kafka 3.9 with the same setting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The root cause
&lt;/h2&gt;

&lt;p&gt;Chia-Ping Tsai, a long-time Kafka contributor, identified the issue. It traced back to KAFKA-15583, titled "High watermark can only advance if ISR size is larger than min ISR."&lt;/p&gt;

&lt;p&gt;The high watermark (HW) is the offset of the latest message copied to all in-sync replicas. Consumers are only allowed to read up to the HW. This guarantees that consumed data will not disappear if a broker crashes.&lt;/p&gt;

&lt;p&gt;The change added this check inside the leader's watermark advancement logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight scala"&gt;&lt;code&gt;&lt;span class="nf"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isUnderMinIsr&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;trace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="s"&gt;"Not increasing HWM because partition is under min ISR"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before v3.9.0, &lt;code&gt;min.insync.replicas&lt;/code&gt; only affected producers using &lt;code&gt;acks=all&lt;/code&gt;. It dictated how many replicas had to acknowledge a write before the producer considered it successful. It had nothing to do with consumers.&lt;/p&gt;

&lt;p&gt;After v3.9.0, the same setting also blocks consumer reads. If a follower is slow and drops out of the ISR, the leader stops advancing the high watermark until that follower catches up. Consumers stall until the watermark moves again.&lt;/p&gt;
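
&lt;p&gt;The rule is easy to model. Here is a toy Python sketch of the leader's decision, not Kafka's actual code; the function and field names are mine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def high_watermark(current_hwm, leader_leo, follower_leos, isr, min_isr):
    """Toy model of the leader's rule after KAFKA-15583 (illustrative,
    not Kafka's code). The HW is the smallest log-end offset among
    in-sync replicas, but it only advances while the ISR meets
    min.insync.replicas."""
    candidate = min([leader_leo] + [follower_leos[f] for f in isr])
    if len(isr) + 1 &amp;lt; min_isr:      # the leader counts toward the ISR
        return current_hwm            # under min ISR: consumers stall here
    return max(current_hwm, candidate)

# Healthy: leader plus one follower in sync, min ISR 2. HW advances to 90.
assert high_watermark(80, 100, {"f1": 90}, ["f1"], 2) == 90
# The follower fell out of the ISR: HW stays pinned at 80.
assert high_watermark(80, 100, {}, [], 2) == 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;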

&lt;h2&gt;
  
  
  Why this is a feature, not a bug
&lt;/h2&gt;

&lt;p&gt;Kafka prioritizes durability over throughput. Blocking reads until &lt;code&gt;min.insync.replicas&lt;/code&gt; are healthy prevents consumers from reading data that has not been sufficiently replicated. If the leader crashes after a consumer reads an under-replicated message, that message is gone, and the consumer has already processed it.&lt;/p&gt;

&lt;p&gt;The trade-off is real. The change arguably deserved a major version bump, because a 10x throughput drop in a minor release can break production pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;If you hit this, your options are straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lower &lt;code&gt;min.insync.replicas&lt;/code&gt; if your durability requirements allow it.&lt;/li&gt;
&lt;li&gt;Ensure followers have enough resources to keep up with the leader.&lt;/li&gt;
&lt;li&gt;Monitor ISR size and follower lag as critical metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three lines of code. A massive performance impact. A reminder that distributed systems are full of sharp edges.&lt;/p&gt;

&lt;p&gt;For the full timeline, mailing list discussion, and the exact PR diff: &lt;a href="https://dubeykartikay.com/posts/kafka-throughput-drop-min-insync-replicas/" rel="noopener noreferrer"&gt;How a Minor Release Caused a 10x Throughput Drop in Kafka&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>distributedsystems</category>
      <category>performance</category>
      <category>apachekafka</category>
    </item>
    <item>
      <title>Optimizing My Hugo Blog: From 3.6 MB of JavaScript to Zero</title>
      <dc:creator>kartikay dubey</dc:creator>
      <pubDate>Sun, 03 May 2026 16:06:28 +0000</pubDate>
      <link>https://dev.to/dubeykartikay/optimizing-my-hugo-blog-from-36-mb-of-javascript-to-zero-22jh</link>
      <guid>https://dev.to/dubeykartikay/optimizing-my-hugo-blog-from-36-mb-of-javascript-to-zero-22jh</guid>
      <description>&lt;p&gt;My Hugo blog was downloading 3.6 MB of JavaScript and 40 KB of external CSS on every page load. For a static blog with mostly text and a few diagrams, that was absurd. Here is how I fixed it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Baseline
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;HTML: 86 KB&lt;/li&gt;
&lt;li&gt;JavaScript: 3.6 MB (Mermaid + KaTeX)&lt;/li&gt;
&lt;li&gt;CSS: 40 KB (KaTeX stylesheets)&lt;/li&gt;
&lt;li&gt;Problem: render-blocking scripts loaded on every page for math and diagrams&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Optimization 1: HTML minification
&lt;/h2&gt;

&lt;p&gt;Adding &lt;code&gt;minifyOutput = true&lt;/code&gt; to &lt;code&gt;hugo.toml&lt;/code&gt; shrank HTML by 16%. Small win, zero risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 2: Inline CSS
&lt;/h2&gt;

&lt;p&gt;I removed the external &lt;code&gt;main.css&lt;/code&gt; link and inlined the styles directly into the HTML. The HTML grew slightly, but I eliminated one render-blocking network request. First Contentful Paint improved because the browser no longer waits for a CSS fetch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 3: Native MathML
&lt;/h2&gt;

&lt;p&gt;My blog used KaTeX to render equations. That meant JavaScript, CSS, and font files for every page with math. I switched to Hugo's Goldmark passthrough extensions, which output native MathML. Browsers render this directly.&lt;/p&gt;

&lt;p&gt;Result: 278 KB of JavaScript removed, all external stylesheets eliminated. Math now renders without any scripts or fonts.&lt;/p&gt;
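
&lt;p&gt;For reference, the passthrough side of the setup looks roughly like this in &lt;code&gt;hugo.toml&lt;/code&gt; (check the current Hugo docs for the exact keys; the MathML output itself comes from Hugo's math rendering on top of these delimiters):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;[markup.goldmark.extensions.passthrough]
enable = true
[markup.goldmark.extensions.passthrough.delimiters]
block  = [['\[', '\]'], ['$$', '$$']]
inline = [['\(', '\)']]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;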

&lt;h2&gt;
  
  
  Optimization 4: Conditional asset loading
&lt;/h2&gt;

&lt;p&gt;Mermaid.js was loading on every page, even text-only posts. I used Hugo's &lt;code&gt;.Store&lt;/code&gt; to set a &lt;code&gt;hasMermaid&lt;/code&gt; flag during Markdown processing. The script tag only injects when a page actually contains a diagram.&lt;/p&gt;

&lt;p&gt;Text-only pages no longer download Mermaid. Diagram pages still get it, but only when needed.&lt;/p&gt;
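
&lt;p&gt;Stripped of Hugo templating, the decision is just a scan of each page's Markdown. A Python sketch of the idea, not my actual template code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def pages_needing_mermaid(pages):
    """pages maps page name to raw Markdown. Only pages containing a
    Mermaid code fence get the script tag injected at build time
    (a sketch of the .Store flag idea, not the actual Hugo template)."""
    return sorted(name for name, md in pages.items() if "```mermaid" in md)

pages = {
    "diagram-post.md": "intro\n```mermaid\ngraph TD\n```\n",
    "text-post.md": "plain prose, no diagrams\n",
}
assert pages_needing_mermaid(pages) == ["diagram-post.md"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;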

&lt;h2&gt;
  
  
  Optimization 5: Server-side rendering for Mermaid
&lt;/h2&gt;

&lt;p&gt;Even conditional loading left a 3.3 MB script on diagram pages. I added a Node.js build step that pre-renders Mermaid blocks into static SVG files at build time. The frontend outputs &lt;code&gt;&amp;lt;img src="diagram.svg"&amp;gt;&lt;/code&gt; instead of a &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag.&lt;/p&gt;

&lt;p&gt;Result: zero JavaScript on the frontend. Total Blocking Time dropped because the browser no longer executes JS to calculate layouts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 6: Early Hints and caching
&lt;/h2&gt;

&lt;p&gt;I generated a &lt;code&gt;_headers&lt;/code&gt; file with strict &lt;code&gt;Cache-Control&lt;/code&gt; rules for immutable assets. The build script also injects &lt;code&gt;Link: rel=preload&lt;/code&gt; headers for images and SVGs. Cloudflare returns &lt;code&gt;103 Early Hints&lt;/code&gt;, telling the browser to fetch assets before the HTML document finishes downloading.&lt;/p&gt;
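
&lt;p&gt;A representative &lt;code&gt;_headers&lt;/code&gt; fragment (the paths here are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/images/*
  Cache-Control: public, max-age=31536000, immutable

/posts/hugo-optimization/
  Link: &amp;lt;/images/diagram.svg&amp;gt;; rel=preload; as=image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;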

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JavaScript&lt;/td&gt;
&lt;td&gt;3.6 MB&lt;/td&gt;
&lt;td&gt;0 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External CSS&lt;/td&gt;
&lt;td&gt;40 KB&lt;/td&gt;
&lt;td&gt;0 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTML&lt;/td&gt;
&lt;td&gt;86 KB&lt;/td&gt;
&lt;td&gt;72 KB (minified)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The site is now 100% JavaScript-free on the frontend. Performance matters, and static sites do not need a heavy JS framework to be fast.&lt;/p&gt;

&lt;p&gt;For the full &lt;code&gt;hugo.toml&lt;/code&gt; config, build scripts, and Lighthouse score breakdown: &lt;a href="https://dubeykartikay.com/posts/hugo-optimization-zero-js/" rel="noopener noreferrer"&gt;Optimizing My Hugo Blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>hugo</category>
      <category>webperf</category>
      <category>staticsite</category>
      <category>zerojs</category>
    </item>
    <item>
      <title>Vector Databases and Semantic Search: A Practical Introduction</title>
      <dc:creator>kartikay dubey</dc:creator>
      <pubDate>Sun, 03 May 2026 16:06:26 +0000</pubDate>
      <link>https://dev.to/dubeykartikay/vector-databases-and-semantic-search-a-practical-introduction-414a</link>
      <guid>https://dev.to/dubeykartikay/vector-databases-and-semantic-search-a-practical-introduction-414a</guid>
      <description>&lt;p&gt;Traditional search engines match keywords. If you search for "dog shelters around Gurgaon" and the indexed page says "animal shelters near Delhi," you get no results. The words do not overlap.&lt;/p&gt;

&lt;p&gt;Semantic search fixes this by converting text into vectors. Similar ideas end up close together in vector space, even when the words differ.&lt;/p&gt;

&lt;h2&gt;
  
  
  From words to vectors
&lt;/h2&gt;

&lt;p&gt;An embedding model takes a word or sentence and produces a high-dimensional vector. The key property: semantically similar inputs produce vectors that are close to each other. "Dog" and "animal" sit near each other. "Dog" and "car" do not.&lt;/p&gt;

&lt;p&gt;For a search engine, the pipeline is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Convert every document in the corpus into a vector and store it.&lt;/li&gt;
&lt;li&gt;Convert the user's query into a vector using the same model.&lt;/li&gt;
&lt;li&gt;Find the documents whose vectors are closest to the query vector.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The hard part is step 3. A corpus of a million documents with 768-dimensional vectors is 768 million floats, roughly 3 GB at 32-bit precision. Computing the exact distance from the query to every document is too slow for interactive search.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approximate Nearest Neighbors
&lt;/h2&gt;

&lt;p&gt;Exact search is &lt;code&gt;O(n)&lt;/code&gt;. ANN algorithms trade a small amount of accuracy for massive speedups. The metric is &lt;code&gt;recall@k&lt;/code&gt;: out of the true k closest vectors, how many does the approximation find? A recall@100 of 95% means 95 of the true 100 nearest neighbors are returned.&lt;/p&gt;
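
&lt;p&gt;The metric is simple to compute once you have exact ground truth for a query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def recall_at_k(approx_ids, true_ids, k):
    """Fraction of the true k nearest neighbors that the ANN search found."""
    return len(set(approx_ids[:k]).intersection(true_ids[:k])) / k

# ANN returns 95 of the true 100 nearest neighbors: recall@100 is 0.95.
approx = list(range(95)) + [900, 901, 902, 903, 904]
exact = list(range(100))
assert recall_at_k(approx, exact, 100) == 0.95
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;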

&lt;p&gt;Graph-based ANN builds a navigable graph over the dataset. Search starts at an entry point and greedily walks toward the query. Each step moves to the neighbor closest to the query, expanding the frontier until the best candidates are found.&lt;/p&gt;

&lt;h2&gt;
  
  
  DiskANN and Vamana
&lt;/h2&gt;

&lt;p&gt;Microsoft Research developed DiskANN and the Vamana index to make graph-based ANN work at scale. The algorithm has three pieces:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Greedy Search&lt;/strong&gt; maintains a candidate list and a visited set. It repeatedly expands the closest unvisited candidate, adds its graph neighbors, and keeps the best candidates bounded by a search-list size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Robust Prune&lt;/strong&gt; builds the graph edges. For each point, it considers possible neighbors and keeps a bounded set of useful outgoing edges. An &lt;code&gt;alpha&lt;/code&gt; parameter controls how aggressively candidates are pruned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vamana Construction&lt;/strong&gt; iterates over the dataset in random order. For each point, it runs greedy search, prunes the visited set into outgoing edges, adds backlinks, and repairs any degree violations.&lt;/p&gt;

&lt;p&gt;The result is a sparse graph where greedy search finds high-recall neighbors quickly.&lt;/p&gt;
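
&lt;p&gt;The search side fits in a short sketch. This is the shape of greedy graph search, not the paper's exact pseudocode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import heapq

def greedy_search(graph, points, entry, query, k, search_list_size):
    """Beam-style greedy walk over a navigable graph: expand the closest
    unvisited candidate, score its neighbors, keep the frontier bounded."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    frontier = [(dist2(points[entry], query), entry)]  # min-heap of candidates
    scores = {entry: frontier[0][0]}                   # every node ever scored
    visited = set()
    while frontier:
        _, node = heapq.heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        for nb in graph[node]:
            if nb not in scores:
                scores[nb] = dist2(points[nb], query)
                heapq.heappush(frontier, (scores[nb], nb))
        if len(frontier) &amp;gt; search_list_size:   # enforce the search-list bound
            frontier = heapq.nsmallest(search_list_size, frontier)
            heapq.heapify(frontier)
    return sorted(scores, key=scores.get)[:k]

# A tiny chain graph: the walk starts at node 0 and reaches node 3.
points = {0: (0.0,), 1: (1.0,), 2: (2.0,), 3: (3.0,)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
assert greedy_search(graph, points, 0, (3.0,), 2, 4) == [3, 2]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;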

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Vector databases like Pinecone, Weaviate, and Milvus package these ideas into production systems. They handle indexing, query routing, replication, and metadata filtering. If you are building semantic search, recommendation, or retrieval-augmented generation, you are probably using these algorithms whether you know it or not.&lt;/p&gt;

&lt;p&gt;For the full mathematical walkthrough with pseudocode, LaTeX equations, and diagrams: &lt;a href="https://dubeykartikay.com/posts/vector-databases-semantic-search/" rel="noopener noreferrer"&gt;How Google Search Actually Works&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>vectordb</category>
      <category>semanticsearch</category>
      <category>ann</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How I Made My Vector Search Engine 16x Faster Without Changing the Algorithm</title>
      <dc:creator>kartikay dubey</dc:creator>
      <pubDate>Sun, 03 May 2026 16:06:24 +0000</pubDate>
      <link>https://dev.to/dubeykartikay/how-i-made-my-vector-search-engine-16x-faster-without-changing-the-algorithm-4c8o</link>
      <guid>https://dev.to/dubeykartikay/how-i-made-my-vector-search-engine-16x-faster-without-changing-the-algorithm-4c8o</guid>
      <description>&lt;p&gt;I built a Vamana-based vector search engine in C++ called &lt;code&gt;sembed-engine&lt;/code&gt;. Recently I made a pull request that sped up queries by 16x and builds by 9x. The algorithm stayed exactly the same. The recall stayed at 1.0. The number of visited nodes did not change.&lt;/p&gt;

&lt;p&gt;The speedup came from data layout.&lt;/p&gt;

&lt;h2&gt;
  
  
  The old design
&lt;/h2&gt;

&lt;p&gt;The original code stored vectors as separate objects pointed to by &lt;code&gt;shared_ptr&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Record&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int64_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;shared_ptr&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Vector&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is clean C++. Every record has an id and a vector. The vector knows how to calculate distance. In the hot path, though, the CPU had to load the record, read the &lt;code&gt;shared_ptr&lt;/code&gt;, follow the pointer, call virtual methods, and read each float through an abstraction layer. Millions of times per query.&lt;/p&gt;

&lt;h2&gt;
  
  
  The new layout
&lt;/h2&gt;

&lt;p&gt;I replaced the object graph with a flat array. All vector values live in one contiguous block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;ids&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;v0_dim0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v0_dim1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;v1_dim0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v1_dim1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vector &lt;code&gt;i&lt;/code&gt; starts at &lt;code&gt;values[i * D]&lt;/code&gt;. A &lt;code&gt;FloatVectorView&lt;/code&gt; is just a pointer and a dimension count. No allocations. No pointer chasing. The next vector is right after the previous one in memory.&lt;/p&gt;
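
&lt;p&gt;In NumPy terms, the layout looks like this (a Python sketch of the C++ design):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# Structure-of-arrays: one contiguous block instead of per-record objects.
D = 4                                          # dimensions per vector
ids = np.array([10, 11, 12], dtype=np.int64)
values = np.arange(3 * D, dtype=np.float32)    # all vectors back to back

def view(i):
    """A FloatVectorView in spirit: an offset into the block, no copy."""
    return values[i * D:(i + 1) * D]

assert view(1).tolist() == [4.0, 5.0, 6.0, 7.0]
assert view(1).base is values                  # a view into the same memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;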

&lt;p&gt;The assembly tells the story. The old code had virtual calls and scalar square roots:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;call rax          ; virtual dispatch
sqrtss xmm2, xmm2 ; scalar square root
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The new code loads packed floats and operates on four at a time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;movups xmm1, XMMWORD PTR [rdi+rax]
subps xmm1, xmm3
mulps xmm1, xmm1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Removing unnecessary square roots
&lt;/h2&gt;

&lt;p&gt;Euclidean distance includes a square root. For nearest-neighbor search, we only care about ordering, not the absolute distance value. If &lt;code&gt;sqrt(25) &amp;lt; sqrt(100)&lt;/code&gt;, then &lt;code&gt;25 &amp;lt; 100&lt;/code&gt;. The ordering is identical.&lt;/p&gt;

&lt;p&gt;Switching to squared distances eliminated &lt;code&gt;sqrtss&lt;/code&gt; entirely from the hot path. One caveat: Vamana pruning uses an &lt;code&gt;alpha&lt;/code&gt; parameter. When everything is squared, &lt;code&gt;alpha&lt;/code&gt; must be squared too to preserve the same comparison semantics.&lt;/p&gt;
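
&lt;p&gt;The ordering argument is easy to check empirically; a quick Python sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math
import random

def d2(a, b):
    """Squared Euclidean distance: no square root in the hot path."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

random.seed(0)
pts = [[random.random() for _ in range(8)] for _ in range(50)]
q = [random.random() for _ in range(8)]

# sqrt is monotonic, so ranking by squared distance gives the same order.
nearest_squared = min(range(50), key=lambda i: d2(pts[i], q))
nearest_true = min(range(50), key=lambda i: math.sqrt(d2(pts[i], q)))
assert nearest_squared == nearest_true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;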

&lt;h2&gt;
  
  
  Caching scores during sort
&lt;/h2&gt;

&lt;p&gt;The old comparator computed distances inside the sort function. Sorting calls the comparator many times, so the same distance was recomputed repeatedly. The fix was to compute each distance once, store it in a &lt;code&gt;ScoredNode { node; score; }&lt;/code&gt;, and sort by the cached score.&lt;/p&gt;

&lt;p&gt;Old comparator assembly called &lt;code&gt;new_view_squared&lt;/code&gt; repeatedly. New comparator assembly just loaded two floats and compared them.&lt;/p&gt;
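
&lt;p&gt;The decorate-then-sort pattern looks like this in Python (where &lt;code&gt;key=&lt;/code&gt; already evaluates once per element, unlike a C++ comparator):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def sort_by_distance(nodes, distance):
    """Compute each distance once, cache it, sort by the cached score
    (the ScoredNode idea). A comparator-based sort would recompute."""
    scored = [(distance(n), n) for n in nodes]  # one distance call per node
    scored.sort(key=lambda pair: pair[0])
    return [n for _, n in scored]

calls = []
def noisy_distance(n):
    calls.append(n)                 # count the "expensive" evaluations
    return abs(n - 10)

assert sort_by_distance([3, 14, 9], noisy_distance) == [9, 14, 3]
assert len(calls) == 3              # once per node, not once per comparison
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;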

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gvec query latency&lt;/td&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;4.094 ms&lt;/td&gt;
&lt;td&gt;0.631 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;w2v query latency&lt;/td&gt;
&lt;td&gt;p50&lt;/td&gt;
&lt;td&gt;25.15 ms&lt;/td&gt;
&lt;td&gt;1.524 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;w2v build time&lt;/td&gt;
&lt;td&gt;total&lt;/td&gt;
&lt;td&gt;17.91 s&lt;/td&gt;
&lt;td&gt;1.889 s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The search visited the same number of nodes. It stopped paying unnecessary tax at every node.&lt;/p&gt;

&lt;p&gt;For the full benchmark methodology, assembly breakdown, and PR diff: &lt;a href="https://dubeykartikay.com/posts/sembed-engine-vector-search-performance/" rel="noopener noreferrer"&gt;How I Made My Vector Search Engine 16x Faster&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>vectorsearch</category>
      <category>cpp</category>
      <category>performance</category>
      <category>vamana</category>
    </item>
    <item>
      <title>Setting Up Dual GPU Gaming Laptops in Hyprland</title>
      <dc:creator>kartikay dubey</dc:creator>
      <pubDate>Sun, 03 May 2026 16:06:22 +0000</pubDate>
      <link>https://dev.to/dubeykartikay/setting-up-dual-gpu-gaming-laptops-in-hyprland-3n9i</link>
      <guid>https://dev.to/dubeykartikay/setting-up-dual-gpu-gaming-laptops-in-hyprland-3n9i</guid>
      <description>&lt;p&gt;Gaming laptops with dual GPUs are common, and they are a pain on Linux. I run an ASUS Zephyrus G15 with an AMD integrated GPU and an NVIDIA discrete GPU. Before I fixed the setup, I dealt with broken resume from suspend, terrible battery life, overheating, and games that ran worse than they should.&lt;/p&gt;

&lt;p&gt;This is a practical guide for setting up dual GPU systems in Hyprland. Most of it applies to other Wayland compositors too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Set the iGPU as primary
&lt;/h2&gt;

&lt;p&gt;Hyprland uses the &lt;code&gt;AQ_DRM_DEVICES&lt;/code&gt; environment variable to decide which GPU drives the display. You want the iGPU first for power efficiency and better Linux compatibility.&lt;/p&gt;

&lt;p&gt;First, find your GPUs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lspci &lt;span class="nt"&gt;-d&lt;/span&gt; ::03xx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My output shows an RTX 3060 at &lt;code&gt;01:00.0&lt;/code&gt; and an AMD Vega at &lt;code&gt;06:00.0&lt;/code&gt;. Create udev rules to symlink these to friendly names:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/etc/udev/rules.d/igpu-device-path.rules&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;KERNEL&lt;/span&gt;==&lt;span class="s2"&gt;"card*"&lt;/span&gt;, &lt;span class="n"&gt;KERNELS&lt;/span&gt;==&lt;span class="s2"&gt;"0000:06:00.0"&lt;/span&gt;, &lt;span class="n"&gt;SUBSYSTEM&lt;/span&gt;==&lt;span class="s2"&gt;"drm"&lt;/span&gt;, &lt;span class="n"&gt;SUBSYSTEMS&lt;/span&gt;==&lt;span class="s2"&gt;"pci"&lt;/span&gt;, &lt;span class="n"&gt;SYMLINK&lt;/span&gt;+=&lt;span class="s2"&gt;"dri/amd-igpu"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;/etc/udev/rules.d/dgpu-device-path.rules&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;KERNEL&lt;/span&gt;==&lt;span class="s2"&gt;"card*"&lt;/span&gt;, &lt;span class="n"&gt;KERNELS&lt;/span&gt;==&lt;span class="s2"&gt;"0000:01:00.0"&lt;/span&gt;, &lt;span class="n"&gt;SUBSYSTEM&lt;/span&gt;==&lt;span class="s2"&gt;"drm"&lt;/span&gt;, &lt;span class="n"&gt;SUBSYSTEMS&lt;/span&gt;==&lt;span class="s2"&gt;"pci"&lt;/span&gt;, &lt;span class="n"&gt;SYMLINK&lt;/span&gt;+=&lt;span class="s2"&gt;"dri/nvidia-dgpu"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reload rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;udevadm control &lt;span class="nt"&gt;--reload-rules&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;udevadm trigger
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then tell Hyprland to prefer the iGPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;env&lt;/span&gt; = &lt;span class="n"&gt;AQ_DRM_DEVICES&lt;/span&gt;, /&lt;span class="n"&gt;dev&lt;/span&gt;/&lt;span class="n"&gt;dri&lt;/span&gt;/&lt;span class="n"&gt;amd&lt;/span&gt;-&lt;span class="n"&gt;igpu&lt;/span&gt;:/&lt;span class="n"&gt;dev&lt;/span&gt;/&lt;span class="n"&gt;dri&lt;/span&gt;/&lt;span class="n"&gt;nvidia&lt;/span&gt;-&lt;span class="n"&gt;dgpu&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Fix hardware video decoding
&lt;/h2&gt;

&lt;p&gt;Without hardware decoding, video playback burns CPU, drains battery, and stutters at high resolution. Check if your system already supports it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;pacman &lt;span class="nt"&gt;-S&lt;/span&gt; libva-utils
vainfo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;vainfo&lt;/code&gt; fails or picks the wrong GPU, set the driver explicitly. For AMD, add to &lt;code&gt;hyprland.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;env&lt;/span&gt; = &lt;span class="n"&gt;LIBVA_DRIVER_NAME&lt;/span&gt;, &lt;span class="n"&gt;radeonsi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common driver names: NVIDIA uses &lt;code&gt;nvidia&lt;/code&gt;, AMD uses &lt;code&gt;radeonsi&lt;/code&gt;, Intel uses &lt;code&gt;i965&lt;/code&gt; or &lt;code&gt;iHD&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Switch between Hybrid and Integrated mode
&lt;/h2&gt;

&lt;p&gt;For gaming, you want both GPUs active. For battery life, you want the dGPU completely off.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;pacman &lt;span class="nt"&gt;-S&lt;/span&gt; supergfxctl
supergfxctl &lt;span class="nt"&gt;-s&lt;/span&gt;    &lt;span class="c"&gt;# list supported modes&lt;/span&gt;
supergfxctl &lt;span class="nt"&gt;-g&lt;/span&gt;    &lt;span class="c"&gt;# check current mode&lt;/span&gt;
supergfxctl &lt;span class="nt"&gt;-m&lt;/span&gt; Integrated   &lt;span class="c"&gt;# iGPU only, saves battery&lt;/span&gt;
supergfxctl &lt;span class="nt"&gt;-m&lt;/span&gt; Hybrid       &lt;span class="c"&gt;# both GPUs, for gaming&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That covers the essentials. I wrote a longer post with full &lt;code&gt;hyprland.conf&lt;/code&gt; snippets, troubleshooting tips for NVIDIA-specific quirks, and screenshots of the setup: &lt;a href="https://dubeykartikay.com/posts/hyprland-dual-gpu-gaming-laptops/" rel="noopener noreferrer"&gt;How to Setup Dual GPU Systems in Hyprland&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>hyprland</category>
      <category>linux</category>
      <category>nvidia</category>
      <category>gaminglaptops</category>
    </item>
    <item>
      <title>How No Man's Sky Creates 18 Quintillion Planets With Just Math</title>
      <dc:creator>kartikay dubey</dc:creator>
      <pubDate>Sun, 03 May 2026 16:06:20 +0000</pubDate>
      <link>https://dev.to/dubeykartikay/how-no-mans-sky-creates-18-quintillion-planets-with-just-math-3fgf</link>
      <guid>https://dev.to/dubeykartikay/how-no-mans-sky-creates-18-quintillion-planets-with-just-math-3fgf</guid>
      <description>&lt;p&gt;No Man's Sky advertises 18 quintillion planets. That is not because someone modeled them by hand. It is because the game generates terrain, flora, and atmosphere from mathematical functions seeded by the planet's coordinates.&lt;/p&gt;

&lt;p&gt;The core idea is procedural generation, and the simplest building block is noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why raw randomness fails
&lt;/h2&gt;

&lt;p&gt;If you fill a height map with random numbers, you get chaos. Real terrain has smooth transitions: hills blend into valleys, coastlines curve gradually. The solution is a noise function that produces smooth, continuous random values.&lt;/p&gt;

&lt;p&gt;Perlin noise does exactly this. It generates values that vary gradually across space, so nearby points have similar heights. Feed a 2D grid of Perlin noise into a renderer, add color and lighting, and you get something that looks like terrain.&lt;/p&gt;

&lt;p&gt;The trick is layering. A single layer of Perlin noise looks too uniform, like rolling hills with no variation. Games stack multiple layers at different frequencies and amplitudes. Low-frequency layers define the broad shape of continents. High-frequency layers add rocks, cracks, and surface detail. This is called fractal Brownian motion, and it is the reason generated worlds look organic instead of synthetic.&lt;/p&gt;
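
&lt;p&gt;A toy 1-D version of the layering makes the idea concrete. Value noise stands in for Perlin noise here, and the constants are arbitrary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

def value_noise(x):
    """Toy 1-D value noise: pseudo-random values at integer points,
    cosine-interpolated in between (a stand-in for Perlin noise)."""
    def rand(i):
        return math.sin(i * 127.1) * 43758.5453 % 1.0
    i = math.floor(x)
    t = (1 - math.cos((x - i) * math.pi)) / 2   # smooth blend, 0 to 1
    return rand(i) * (1 - t) + rand(i + 1) * t

def fbm(x, octaves=5):
    """Stack octaves: each layer doubles frequency and halves amplitude."""
    total, amplitude, frequency = 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amplitude * value_noise(x * frequency)
        amplitude /= 2
        frequency *= 2
    return total

# Same input, same output: the terrain is deterministic.
assert fbm(2.5) == fbm(2.5)
# Nearby points get similar heights: smooth, not white noise.
assert round(abs(fbm(1.0) - fbm(1.0001)), 2) == 0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;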

&lt;h2&gt;
  
  
  What No Man's Sky adds
&lt;/h2&gt;

&lt;p&gt;Sean Murray and the team at Hello Games went further than basic layered noise. Their GDC talk outlines several techniques:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain warping&lt;/strong&gt; twists the noise field itself. Instead of sampling noise at the raw coordinates, you sample at coordinates that have been displaced by another noise function. This creates caves, overhangs, and twisted terrain that straight noise cannot produce.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filtering and image processing&lt;/strong&gt; cleans up the raw noise. Unfiltered procedural terrain often looks muddy or repetitive. The team runs filters to emphasize ridges and valleys, suppress bland regions, and sculpt the terrain into more interesting shapes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DEM blending&lt;/strong&gt; mixes in real-world elevation data for grounding. The risk is making everything look like Earth, which is familiar but boring. The game uses this sparingly, blending real data with warped noise to keep things alien but plausible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Biome rules&lt;/strong&gt; layer on top of the terrain. Temperature, humidity, and elevation determine what plants and animals spawn. These rules are also procedural, driven by the same coordinate seeds that generated the planet itself. Visit the same planet twice, you get the same terrain and the same wildlife. Visit a different planet, everything changes.&lt;/p&gt;

&lt;p&gt;The result is a universe where every planet is deterministic (the same seed always produces the same world) but effectively infinite (the coordinate space is so large you will never see the same planet twice).&lt;/p&gt;

&lt;p&gt;If you want to see the Perlin noise graphs and a deeper walkthrough of the layering math: &lt;a href="https://dubeykartikay.com/posts/procedural-generation-no-mans-sky/" rel="noopener noreferrer"&gt;How No Man's Sky Creates 18 Quintillion Planets&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>gamedev</category>
      <category>proceduralgeneration</category>
      <category>perlinnoise</category>
      <category>nomansky</category>
    </item>
    <item>
      <title>Reading Algorithms Like an Engineer: What DiskANN Taught Me About Pseudocode</title>
      <dc:creator>kartikay dubey</dc:creator>
      <pubDate>Sun, 03 May 2026 16:06:18 +0000</pubDate>
      <link>https://dev.to/dubeykartikay/reading-algorithms-like-an-engineer-what-diskann-taught-me-about-pseudocode-2979</link>
      <guid>https://dev.to/dubeykartikay/reading-algorithms-like-an-engineer-what-diskann-taught-me-about-pseudocode-2979</guid>
      <description>&lt;p&gt;The first time I implemented Vamana from the DiskANN paper, my approximate nearest neighbor index was slower than brute force. On tiny test fixtures, brute force took 0.27 ms per query. My Vamana implementation took 22.98 ms.&lt;/p&gt;

&lt;p&gt;That sounds absurd. ANN exists to skip work. The problem was not the algorithm. It was how I mapped the paper's abstractions to actual data structures.&lt;/p&gt;

&lt;h2&gt;
  
  
  A set is not a data structure
&lt;/h2&gt;

&lt;p&gt;The DiskANN pseudocode talks about sets &lt;code&gt;L&lt;/code&gt;, &lt;code&gt;V&lt;/code&gt;, and &lt;code&gt;Nout(p)&lt;/code&gt;. That is fine for explanation. Code cannot store an abstract set.&lt;/p&gt;

&lt;p&gt;When the paper says &lt;code&gt;L&lt;/code&gt; (the candidate list), I had to decide: sorted vector? heap? bounded priority queue? How do I find the closest unvisited element? How do I enforce the search-list bound? How do I remove duplicates?&lt;/p&gt;

&lt;p&gt;When the paper says &lt;code&gt;V&lt;/code&gt; (the visited set), I had to decide: &lt;code&gt;unordered_set&lt;/code&gt;? dense bitset? boolean array? Node ids in my case were dense integers, so an indexed bit operation beat a hash-table lookup by a wide margin.&lt;/p&gt;

&lt;p&gt;When the paper says "remove candidates," I had to ask whether removal is physical or logical. In a hot loop, marking a candidate as deleted and skipping it is much cheaper than erasing from a vector and reshuffling everything behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;In my &lt;code&gt;sembed-engine&lt;/code&gt; project, I changed the implementation to match the invariants the algorithm already needed, rather than copying the pseudocode literally.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;Neighbour&lt;/code&gt; struct became &lt;code&gt;{ float distance; NodeId node; bool marked; }&lt;/code&gt;. A &lt;code&gt;SortedBoundedVector&lt;/code&gt; kept candidates sorted as they were inserted, capped the list size, rejected duplicates, and tracked the next unexpanded node. Visited tracking moved to &lt;code&gt;boost::dynamic_bitset&lt;/code&gt;. Pruning switched from physical deletion to marker-style bookkeeping.&lt;/p&gt;

&lt;p&gt;The algorithm did not change; only its representation in code did.&lt;/p&gt;

&lt;p&gt;After the fix, Vamana went from 22.98 ms to 0.02 ms on the same small fixture. On a larger dataset, it delivered 5.34x the query throughput of brute force while keeping recall at 1.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lesson
&lt;/h2&gt;

&lt;p&gt;Slow down at the nouns in pseudocode. If it says &lt;code&gt;L&lt;/code&gt;, ask what operations &lt;code&gt;L&lt;/code&gt; needs. If it says &lt;code&gt;V&lt;/code&gt;, ask how membership is checked. If it says "remove," ask whether deletion is physical or logical. If it says "bounded," ask where that bound is enforced.&lt;/p&gt;

&lt;p&gt;The paper gives the map. Implementation is the terrain.&lt;/p&gt;

&lt;p&gt;For the full benchmark data, PR details, and code snippets: &lt;a href="https://dubeykartikay.com/posts/reading-algorithms-like-an-engineer/" rel="noopener noreferrer"&gt;Reading Algorithms Like an Engineer&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>ann</category>
      <category>cpp</category>
      <category>diskann</category>
    </item>
    <item>
      <title>Why You Should Never Use std::unordered_set in Hot C++ Loops</title>
      <dc:creator>kartikay dubey</dc:creator>
      <pubDate>Sun, 03 May 2026 16:06:17 +0000</pubDate>
      <link>https://dev.to/dubeykartikay/why-you-should-never-use-stdunorderedset-in-hot-c-loops-2lc4</link>
      <guid>https://dev.to/dubeykartikay/why-you-should-never-use-stdunorderedset-in-hot-c-loops-2lc4</guid>
      <description>&lt;p&gt;Hash tables feel like the default choice for membership tests. &lt;code&gt;std::unordered_set&lt;/code&gt; promises average &lt;code&gt;O(1)&lt;/code&gt; lookup, so we reach for it automatically. In performance-sensitive C++ code, that habit can cost you an order of magnitude.&lt;/p&gt;

&lt;p&gt;I ran into this while building a Vamana graph index for approximate nearest neighbor search. The algorithm needs to track visited nodes. Node ids are dense integers, and the visited check runs inside the hottest loop in the entire search path.&lt;/p&gt;

&lt;p&gt;My first implementation used &lt;code&gt;std::unordered_set&amp;lt;uint32_t&amp;gt;&lt;/code&gt;. It was correct, and it was slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the benchmark says
&lt;/h2&gt;

&lt;p&gt;I generated 1000 vectors of random &lt;code&gt;uint32_t&lt;/code&gt; ids and deduplicated them using three approaches: &lt;code&gt;std::unordered_set&lt;/code&gt;, &lt;code&gt;sort + unique&lt;/code&gt;, and &lt;code&gt;boost::dynamic_bitset&amp;lt;&amp;gt;&lt;/code&gt;. For dense ids sampled from &lt;code&gt;[0, 2n)&lt;/code&gt;, the numbers were brutal:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;n&lt;/th&gt;
&lt;th&gt;unordered_set ms&lt;/th&gt;
&lt;th&gt;sort+unique ms&lt;/th&gt;
&lt;th&gt;boost bitset ms&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32,768&lt;/td&gt;
&lt;td&gt;1,649&lt;/td&gt;
&lt;td&gt;1,455&lt;/td&gt;
&lt;td&gt;177&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500,000&lt;/td&gt;
&lt;td&gt;50,302&lt;/td&gt;
&lt;td&gt;26,759&lt;/td&gt;
&lt;td&gt;3,423&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At &lt;code&gt;n = 500,000&lt;/code&gt;, the bitset was 14.7x faster. The hash table had to hash keys, grow buckets, rehash, and chase pointers through memory. The bitset did one indexed memory operation.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sort + unique&lt;/code&gt; also beat the hash table at scale because it walks contiguous memory, and CPUs love that.&lt;/p&gt;

&lt;h2&gt;
  
  
  When the hash table wins
&lt;/h2&gt;

&lt;p&gt;Sparse ids change the picture. When I sampled only &lt;code&gt;n&lt;/code&gt; ids from a universe of 100,000,000 possible values, the bitset had to clear a massive mostly-empty array before every vector:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;n&lt;/th&gt;
&lt;th&gt;unordered_set ms&lt;/th&gt;
&lt;th&gt;boost bitset ms&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;td&gt;6.3&lt;/td&gt;
&lt;td&gt;149.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2,048&lt;/td&gt;
&lt;td&gt;91.9&lt;/td&gt;
&lt;td&gt;145.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;65,536&lt;/td&gt;
&lt;td&gt;4,169.3&lt;/td&gt;
&lt;td&gt;985.4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For small sparse inputs, &lt;code&gt;std::unordered_set&lt;/code&gt; is genuinely better. The bitset only pulls ahead once the input is large enough to amortize the fixed clearing cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical rule
&lt;/h2&gt;

&lt;p&gt;Reach for &lt;code&gt;std::unordered_set&lt;/code&gt; when ids are sparse, unbounded, or not integer-indexable. When ids are dense integers inside a hot loop, make the membership check an indexed load or store instead.&lt;/p&gt;

&lt;p&gt;The CPU does not care about your Big-O notation. It cares about memory access patterns.&lt;/p&gt;

&lt;p&gt;I wrote a longer post with the full methodology, assembly-level analysis, and raw CSV data: &lt;a href="https://dubeykartikay.com/posts/why-never-use-std-unordered-set/" rel="noopener noreferrer"&gt;Why You Should Never Use a set&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>performance</category>
      <category>algorithms</category>
      <category>benchmarking</category>
    </item>
    <item>
      <title>Optimize Hugo Blog Performance: Zero JS and 100% Lighthouse Score</title>
      <dc:creator>kartikay dubey</dc:creator>
      <pubDate>Sat, 04 Apr 2026 14:30:32 +0000</pubDate>
      <link>https://dev.to/dubeykartikay/optimize-hugo-blog-performance-zero-js-and-100-lighthouse-score-4128</link>
      <guid>https://dev.to/dubeykartikay/optimize-hugo-blog-performance-zero-js-and-100-lighthouse-score-4128</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I reduced my Hugo blog's page weight by eliminating 3.6 MB of JavaScript and 40 KB of external CSS, achieving a 100% JS-free frontend. Key optimizations included HTML minification, inlining CSS, switching to native MathML, and pre-rendering Mermaid diagrams server-side.&lt;/p&gt;

&lt;p&gt;I recently looked into my blog's performance and was surprised to find my pages were downloading over 3.6 MB of JavaScript and render-blocking CSS on every load. For a simple static site, this was too much, so I decided to optimize it. &lt;/p&gt;

&lt;p&gt;Here is the step-by-step breakdown of how I reduced my payload and removed JavaScript.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Baseline
&lt;/h2&gt;

&lt;p&gt;Before starting, my site had some major issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTML Size:&lt;/strong&gt; 86,348 bytes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JS Size:&lt;/strong&gt; 3,617,515 bytes (3.6 MB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSS Size:&lt;/strong&gt; 40,560 bytes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Issue:&lt;/strong&gt; Massive blocking JS/CSS scripts were loaded on every single page for Mermaid diagrams and Math rendering. The HTML was also not minified.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxueyze3w5x0fyw4xvuuh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxueyze3w5x0fyw4xvuuh.png" alt="Hugo Blog Performance Baseline Metrics" width="800" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 1: HTML Minification
&lt;/h2&gt;

&lt;p&gt;The first step was simple: adding &lt;code&gt;minifyOutput = true&lt;/code&gt; to &lt;code&gt;hugo.toml&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTML Size:&lt;/strong&gt; 72,370 bytes (16% smaller)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact:&lt;/strong&gt; Reduced parsing time for HTML, leading to a faster First Paint.&lt;/li&gt;
&lt;/ul&gt;
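&lt;p&gt;For reference, the corresponding &lt;code&gt;hugo.toml&lt;/code&gt; fragment. In current Hugo versions this option lives under the &lt;code&gt;[minify]&lt;/code&gt; table:&lt;/p&gt;

```toml
# hugo.toml
[minify]
  minifyOutput = true
```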

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8j7zizubjnffs9lg3loo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8j7zizubjnffs9lg3loo.png" alt="Hugo HTML Minification Results and Faster First Paint" width="800" height="178"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 2: Inlining CSS
&lt;/h2&gt;

&lt;p&gt;Next, I removed the &lt;code&gt;&amp;lt;link&amp;gt;&lt;/code&gt; tag pointing to my &lt;code&gt;main.css&lt;/code&gt; file and replaced it with an inline &lt;code&gt;&amp;lt;style&amp;gt;{{.Content|safeCSS}}&amp;lt;/style&amp;gt;&lt;/code&gt; block.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTML Size:&lt;/strong&gt; Increased to 127,350 bytes (because CSS is now inside the HTML document).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact:&lt;/strong&gt; This eliminated 1 critical render-blocking HTTP request. The browser no longer waits for an external CSS fetch, which improves &lt;strong&gt;First Contentful Paint (FCP)&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
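&lt;p&gt;An illustrative head partial for this (assuming the stylesheet lives under &lt;code&gt;assets/css/&lt;/code&gt;; the post's actual template may differ):&lt;/p&gt;

```go-html-template
{{/* Inline the stylesheet instead of linking it. */}}
{{ with resources.Get "css/main.css" | minify }}
  <style>{{ .Content | safeCSS }}</style>
{{ end }}
```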

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frpdsfhulc7wstyg965xd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frpdsfhulc7wstyg965xd.png" alt="Hugo Inline CSS Performance Impact and FCP Improvement" width="800" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 3: Native MathML
&lt;/h2&gt;

&lt;p&gt;My blog used the KaTeX library (JS, CSS, and fonts) to render equations. I removed it and enabled Hugo's Goldmark passthrough extensions to render Native MathML instead.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTML Size:&lt;/strong&gt; 123,341 bytes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JS Size:&lt;/strong&gt; 3,338,725 bytes (278 KB smaller)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSS Size:&lt;/strong&gt; 0 bytes (Removed KaTeX CSS, meaning zero external stylesheets are loaded).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact:&lt;/strong&gt; A significant reduction in payload size. I removed the need for JavaScript and font files for math. The browser now renders it natively.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmjsdg57d0e3suhzal9y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmjsdg57d0e3suhzal9y.png" alt="Hugo Native MathML vs KaTeX JavaScript Performance" width="800" height="144"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 4: Conditional Asset Loading
&lt;/h2&gt;

&lt;p&gt;My Mermaid script was loading on every page. I used Hugo's &lt;code&gt;.Store&lt;/code&gt; to set a flag &lt;code&gt;hasMermaid&lt;/code&gt; when processing Markdown, and only injected the Mermaid &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag if that flag is true.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTML Size:&lt;/strong&gt; 117,632 bytes (Saved 6 KB across all generated pages).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact:&lt;/strong&gt; Text-only blog posts no longer force the browser to download &lt;code&gt;mermaid.min.js&lt;/code&gt;. The JavaScript is only loaded when necessary.&lt;/li&gt;
&lt;/ul&gt;
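&lt;p&gt;A sketch of the pattern, which follows Hugo's documented render-hook approach for diagrams (file paths and script URL are illustrative):&lt;/p&gt;

```go-html-template
{{/* layouts/_default/_markup/render-codeblock-mermaid.html
     Runs only for pages that contain a mermaid code block. */}}
<pre class="mermaid">{{ .Inner | htmlEscape | safeHTML }}</pre>
{{ .Page.Store.Set "hasMermaid" true }}

{{/* In the base template, near the end of the page body: */}}
{{ if .Store.Get "hasMermaid" }}
  <script src="/js/mermaid.min.js"></script>
{{ end }}
```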

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsojhfxtl4uqly41rsy3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsojhfxtl4uqly41rsy3u.png" alt="Hugo Conditional Asset Loading for Mermaid JS on Text Pages" width="800" height="161"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;(Text-only pages don't load Mermaid)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhf1c3k5tsyc3fws5uqe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhf1c3k5tsyc3fws5uqe.png" alt="Hugo Mermaid Diagram Rendering Output with Conditional Logic" width="800" height="151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Pages with diagrams load Mermaid conditionally)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 5: Server-side Rendering for Mermaid Diagrams
&lt;/h2&gt;

&lt;p&gt;Even conditionally, loading a 3.3 MB Mermaid script on some pages was heavy. I introduced a Node.js build step to pre-render Mermaid blocks into static SVG files. Now, the frontend outputs an &lt;code&gt;&amp;lt;img src="diagram.svg"&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;JS Size:&lt;/strong&gt; 0 bytes (Removed the remaining 3.3 MB of Mermaid JavaScript).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact:&lt;/strong&gt; The site is now &lt;strong&gt;100% JavaScript-free&lt;/strong&gt; on the frontend. The &lt;code&gt;Total Blocking Time (TBT)&lt;/code&gt; metrics improved because the browser no longer executes JS to calculate layouts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faidqgyhivmjmimdfzx7e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faidqgyhivmjmimdfzx7e.png" alt="Hugo Server-side Rendering (SSR) for Mermaid Diagrams with Zero JS" width="800" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 6: Early Hints &amp;amp; Caching
&lt;/h2&gt;

&lt;p&gt;Finally, I optimized the network layer. I generated a &lt;code&gt;_headers&lt;/code&gt; file to define strict &lt;code&gt;Cache-Control&lt;/code&gt; rules for immutable assets. I also added &lt;code&gt;Link: &amp;lt;image&amp;gt;; rel=preload; as=image&lt;/code&gt; directives automatically via the build script.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Impact:&lt;/strong&gt; Cloudflare now returns &lt;code&gt;103 Early Hints&lt;/code&gt; responses, telling the browser to fetch SVGs and images immediately, even before the HTML document finishes downloading. Assets are cached indefinitely on repeat visits, eliminating secondary network fetch delays.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Summary
&lt;/h2&gt;

&lt;p&gt;Over the course of these 6 optimizations, I brought the frontend payload down as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;JS Payload:&lt;/strong&gt; 3.6 MB  -&amp;gt;  &lt;strong&gt;0 bytes&lt;/strong&gt; (100% reduction)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External CSS:&lt;/strong&gt; 40 KB -&amp;gt; &lt;strong&gt;0 bytes&lt;/strong&gt; (Eliminated all external style sheets, saving a round-trip on every page).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML Payload:&lt;/strong&gt; Minified by 16% initially, then offset slightly by inlining CSS, keeping &lt;code&gt;First Contentful Paint&lt;/code&gt; near-instantaneous.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Performance matters, and sometimes you don't need a heavy JS framework to deliver a fast experience!&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>performance</category>
    </item>
    <item>
      <title>Why Your Kafka Consumers Are Suddenly 10x Slower in v3.9.0</title>
      <dc:creator>kartikay dubey</dc:creator>
      <pubDate>Wed, 11 Mar 2026 19:06:29 +0000</pubDate>
      <link>https://dev.to/dubeykartikay/why-your-kafka-consumers-are-suddenly-10x-slower-in-v390-3f4d</link>
      <guid>https://dev.to/dubeykartikay/why-your-kafka-consumers-are-suddenly-10x-slower-in-v390-3f4d</guid>
      <description>&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;p&gt;In a minor release of Apache Kafka, consumer throughput dropped by 10x.&lt;br&gt;&lt;br&gt;
The change behind it was made to prioritize durability over throughput.&lt;br&gt;&lt;br&gt;
The star of the story is the topic configuration &lt;code&gt;min.insync.replicas&lt;/code&gt;.&lt;br&gt;&lt;br&gt;
In versions before v3.9.0, it only controlled when the broker accepted writes from producers using &lt;code&gt;acks=all&lt;/code&gt;.&lt;br&gt;
Now it also dictates when a message becomes visible to consumers.&lt;br&gt;
This subtle change caused a 10x throughput drop for some Kafka users.  &lt;/p&gt;
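&lt;p&gt;For context, these are the two settings in play, as config fragments with illustrative values:&lt;/p&gt;

```properties
# Topic-level (or broker default) durability setting:
min.insync.replicas=2

# Producer-side setting it interacts with:
acks=all
```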

&lt;p&gt;Keep reading to find out why this change was made, and how to fix it, so that your production systems don't throttle.  &lt;/p&gt;
&lt;h2&gt;
  
  
  It All Begins
&lt;/h2&gt;

&lt;p&gt;In August 2025, a user named Sharad Garg raised an &lt;a href="https://issues.apache.org/jira/browse/KAFKA-19652" rel="noopener noreferrer"&gt;issue&lt;/a&gt; on the Kafka issue tracker. &lt;br&gt;
It was titled "Consumer throughput drops by 10 times with Kafka v3.9.0 in ZK mode". &lt;br&gt;
Other people validated his claim and shared more information on the reproduction steps.&lt;br&gt;&lt;br&gt;
Notably, Ritvik Gupta ran tests that pointed the blame at the &lt;code&gt;min.insync.replicas&lt;/code&gt; configuration.&lt;br&gt;
His tests showed that consumer throughput dropped significantly when &lt;code&gt;min.insync.replicas&lt;/code&gt; was changed from &lt;code&gt;1&lt;/code&gt; to &lt;code&gt;2&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdx6suwbze3n19guu1fie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdx6suwbze3n19guu1fie.png" alt="Table showing throughput from when min.insync.replicas is changed" width="800" height="154"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Other users came to this issue, reporting the same problem.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"We were able to reproduce this issue in our own environment too.&lt;br&gt;
Throughput drops from: 147.5842 MB/sec (Kafka 3.4) to 58.6748 MB/sec (Kafka 3.9) with &lt;code&gt;min.insync.replicas=2&lt;/code&gt;"&lt;br&gt;&lt;br&gt;
-Bertalan Kondrat&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  It Gets Escalated
&lt;/h2&gt;

&lt;p&gt;Eventually the issue went stale, getting no attention from the maintainers, until Marcus Page escalated it to the Kafka dev mailing list.&lt;br&gt;&lt;br&gt;
That is how I learned about this issue.&lt;br&gt;&lt;br&gt;
The escalation got the attention of Chia-Ping Tsai, a long-time contributor to the project.&lt;br&gt;
A day later, he replied to that email stating that he had identified the root cause, and posted it on the ticket. I obviously rushed to check this root cause, and it left me even more confused.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Root Cause
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The root cause is related to KAFKA-15583 - "High watermark can only advance if ISR size is larger than min ISR". The title says it all. The consumer can't read more data due to the HW, which can't be advanced due to the slow followers dropping the partition below the min ISR&lt;br&gt;&lt;br&gt;
-Chia-Ping Tsai&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;
  
  
  What Is High Watermark?
&lt;/h4&gt;

&lt;p&gt;The High Watermark is the offset of the latest message successfully copied to all brokers &lt;strong&gt;currently&lt;/strong&gt; in the In-Sync Replicas (ISR) list. It acts as a strict safety boundary so consumers only read fully committed data that won't disappear if a broker suddenly crashes.  &lt;/p&gt;

&lt;p&gt;Since I didn't know what the High Watermark was, I was confused. After learning about it, I was even more confused: why would &lt;code&gt;min.insync.replicas&lt;/code&gt; dictate the HW?&lt;br&gt;&lt;br&gt;
If you don't know what &lt;code&gt;min.insync.replicas&lt;/code&gt; does, it enforces a durability guarantee at the producer level: when a producer uses &lt;code&gt;acks=all&lt;/code&gt;, a write fails with an exception unless at least &lt;code&gt;min.insync.replicas&lt;/code&gt; replicas are in sync to acknowledge it.&lt;br&gt;&lt;br&gt;
Notice how this description does not mention consumers.&lt;br&gt;&lt;br&gt;
Next, I looked into the related KAFKA-15583 issue.&lt;br&gt;&lt;br&gt;
It has no description. 🥲&lt;br&gt;&lt;br&gt;
It links to 2 PRs, and it is here that I finally understood the whole picture.&lt;/p&gt;
&lt;h2&gt;
  
  
  It's Not a Bug, It's a Feature
&lt;/h2&gt;

&lt;p&gt;After looking at one of the linked &lt;a href="https://github.com/apache/kafka/pull/14594" rel="noopener noreferrer"&gt;PRs&lt;/a&gt;, I found the 3 lines of code that caused consumer throughput to drop by a factor of 10.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;  private def maybeIncrementLeaderHW(leaderLog: UnifiedLog, currentTimeMs: Long = time.milliseconds): Boolean = {
&lt;span class="gi"&gt;+    if (isUnderMinIsr) {
+      trace(s"Not increasing HWM because partition is under min ISR(ISR=${partitionState.isr})")
+      return false
+    }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These 3 lines of code effectively &lt;strong&gt;block&lt;/strong&gt; consumer reads until a produced message is replicated to &lt;strong&gt;at least&lt;/strong&gt; &lt;code&gt;min.insync.replicas&lt;/code&gt; replicas.&lt;br&gt;
So if a follower is latent and gets kicked out of the in-sync replica set, the leader has to wait for it to catch up before serving newer records to consumers.&lt;br&gt;
This is a trade-off between reliability and performance. &lt;br&gt;
So, even though it hurts consumer throughput, Kafka accepts this performance loss in favor of strong reliability and ensuring no data loss.  &lt;/p&gt;

&lt;p&gt;In my opinion, this change may have warranted a major version bump instead of a minor one, since the throughput change could impact a lot of production systems.&lt;br&gt;
But version bumping is already a controversial topic I'll save for another day.&lt;/p&gt;

&lt;p&gt;So that's it: a 3-line change that causes a massive throughput drop.&lt;br&gt;&lt;br&gt;
I half expected it to be an obscure JVM bug or a CPU architecture issue, but as with 99.99999999% of other bugs, it came down to new code.&lt;/p&gt;

&lt;p&gt;Thanks for reading to the end. I have joined the kafka-dev mailing list and am actively trying to become a contributor.&lt;br&gt;&lt;br&gt;
Follow this blog for more quirks and insider information about Kafka.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>java</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
