<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Wences Martinez</title>
    <description>The latest articles on DEV Community by Wences Martinez (@wenchodev).</description>
    <link>https://dev.to/wenchodev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2480192%2F435df44d-b5fe-420c-b95d-38181fc0e28f.jpg</url>
      <title>DEV Community: Wences Martinez</title>
      <link>https://dev.to/wenchodev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/wenchodev"/>
    <language>en</language>
    <item>
      <title>Building a production-ready RAG pipeline</title>
      <dc:creator>Wences Martinez</dc:creator>
      <pubDate>Wed, 13 May 2026 08:51:51 +0000</pubDate>
      <link>https://dev.to/wenchodev/building-a-production-ready-rag-pipeline-2jk6</link>
      <guid>https://dev.to/wenchodev/building-a-production-ready-rag-pipeline-2jk6</guid>
<description>&lt;p&gt;&lt;strong&gt;L&lt;/strong&gt;arge &lt;strong&gt;L&lt;/strong&gt;anguage &lt;strong&gt;M&lt;/strong&gt;odels (aka &lt;em&gt;LLMs&lt;/em&gt;) have a memory problem: their knowledge stops at their training cutoff. They don't know your codebase, they don't know last week's tickets…&lt;/p&gt;

&lt;p&gt;When they're missing context they don't say so… they guess, confidently. The polite term is &lt;em&gt;hallucination&lt;/em&gt;; the less polite one is &lt;em&gt;lying with style&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuofgydhm3z3ggylx26ms.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuofgydhm3z3ggylx26ms.gif" alt="LLMs sometimes hallucinate" width="320" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;R&lt;/strong&gt;etrieval-&lt;strong&gt;A&lt;/strong&gt;ugmented &lt;strong&gt;G&lt;/strong&gt;eneration (aka &lt;strong&gt;RAG&lt;/strong&gt;) is how you fix that without retraining anything.&lt;/p&gt;

&lt;p&gt;Think of it as turning a closed-book exam into an open-book one. The LLM is still the writer, but now it has a librarian: a system that fetches the right passages from your data and hands them over before the model puts pen to paper.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://www.keystoneapp.dev/" rel="noopener noreferrer"&gt;Keystone&lt;/a&gt; to learn this end-to-end.&lt;/p&gt;

&lt;p&gt;Keystone does two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ingest a GitHub repository's activity → every PR, commit, issue, and discussion&lt;/li&gt;
&lt;li&gt;Answer questions about &lt;strong&gt;why the codebase looks the way it does&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first prototype didn't have a retrieval system; it had a giant string.&lt;/p&gt;

&lt;p&gt;That worked for tiny repos. On a real one (&lt;em&gt;+1000 commits&lt;/em&gt;, &lt;em&gt;+500 merged PRs&lt;/em&gt;, tons of issues, plus a tree of &lt;em&gt;~1,200 files&lt;/em&gt;) it broke in four ways at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The prompt blew past the &lt;strong&gt;context window&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The model &lt;strong&gt;lost the thread&lt;/strong&gt; halfway through.&lt;/li&gt;
&lt;li&gt;Latency hit double-digit seconds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Every answer cost ~$0.15 in tokens&lt;/strong&gt; for a query that should cost a fraction of a cent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the moment &lt;strong&gt;RAG stops being optional and becomes required&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Below is what I actually built, lifted from the codebase running today 👇🏻&lt;/p&gt;




&lt;h2&gt;
  
  
  What RAG actually is (and what it isn't)
&lt;/h2&gt;

&lt;p&gt;The clean mental model: RAG is just "&lt;em&gt;search, then prompt.&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;You convert your data into a &lt;strong&gt;search index ahead of time&lt;/strong&gt;. At query time, you look up the most relevant pieces and paste only those into the prompt. That's it!&lt;/p&gt;

&lt;p&gt;We can define two main stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: search your data and pull the chunks most relevant to the user's question.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt;: send those chunks plus the question to a regular LLM call and let it write the answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else (&lt;em&gt;embeddings&lt;/em&gt;, &lt;em&gt;vector databases&lt;/em&gt;, &lt;em&gt;re-ranking&lt;/em&gt;, &lt;em&gt;hybrid search&lt;/em&gt;…) exists because &lt;strong&gt;matching meaning&lt;/strong&gt; is harder than &lt;strong&gt;matching keywords&lt;/strong&gt;.&lt;/p&gt;
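&lt;p&gt;"Matching meaning" bottoms out in simple math: once text becomes vectors, relevance is just how closely two vectors point in the same direction. A minimal cosine-similarity sketch (illustrative only, not Keystone code):&lt;/p&gt;

```typescript
// Cosine similarity over two equal-length vectors:
// 1 means same direction (similar meaning), 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  a.forEach((v, i) => {
    dot += v * b[i]
    normA += v * v
    normB += b[i] * b[i]
  })
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

&lt;p&gt;Embedding models produce those vectors; everything downstream (vector DBs, re-ranking, hybrid search) is about computing and ordering this score at scale.&lt;/p&gt;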




&lt;h2&gt;
  
  
  When to use RAG
&lt;/h2&gt;

&lt;p&gt;Use RAG when the answer depends on facts the model doesn't have:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Private docs, your codebase, last week's tickets, anything post-cutoff or non-public.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's also how you get grounded answers with citations: the model can point at exactly which chunk of which document it used. That's the &lt;strong&gt;difference between a tool you can ship to users and a demo you can't&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  When NOT to use RAG
&lt;/h2&gt;

&lt;p&gt;Don't use RAG when you want the model to &lt;strong&gt;behave differently&lt;/strong&gt; (tone, format, reasoning style). That's a &lt;em&gt;fine-tuning&lt;/em&gt; or &lt;em&gt;prompting&lt;/em&gt; problem, not a &lt;strong&gt;retrieval&lt;/strong&gt; problem.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG injects knowledge; it does not change behavior.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The second &lt;strong&gt;mistake&lt;/strong&gt; I see: people reach for RAG when the data is small enough to fit in a prompt. If you have 10,000 tokens of context, just paste it. RAG buys you scale at the cost of an extra layer that will leak relevance bugs into your product.&lt;/p&gt;




&lt;h2&gt;
  
  
  The four stages every RAG has
&lt;/h2&gt;

&lt;p&gt;Every &lt;strong&gt;RAG in production&lt;/strong&gt; has the same four stages, and each one breaks in its own special way if you do it naively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion&lt;/strong&gt;: pull data from somewhere and split it into chunks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding&lt;/strong&gt;: turn each chunk into a vector so similarity becomes math.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: search for the chunks closest to the user's question.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesis&lt;/strong&gt;: hand those chunks to an LLM and let it write the answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnl0x13asg4o1mlm0fal3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnl0x13asg4o1mlm0fal3.png" alt="The four stages of a solid RAG pipeline" width="800" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next sections go through each one in order, with what I do in &lt;a href="https://www.keystoneapp.dev/" rel="noopener noreferrer"&gt;Keystone&lt;/a&gt; and what I'd warn you about.&lt;/p&gt;
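&lt;p&gt;Before going stage by stage, here's the whole loop as a runnable toy: a word-count "embedding" stands in for a real model, and stage 4 only builds the prompt instead of calling an LLM. Every name here is illustrative, not Keystone's code:&lt;/p&gt;

```typescript
type Chunk = { sourceId: string; content: string }

// 1. Ingestion: split a document into paragraph chunks
function ingest(doc: string): Chunk[] {
  return doc.split('\n\n').map((content, i) => ({ sourceId: `chunk:${i}`, content }))
}

// 2. Embedding: a toy stand-in that counts vocabulary words
//    (a real system calls an embedding model here)
function embedFake(text: string, vocab: string[]): number[] {
  const words = text.toLowerCase().split(/\W+/)
  return vocab.map(v => words.filter(w => w === v).length)
}

// 3. Retrieval: rank chunks by dot product against the query vector
function retrieve(query: string, chunks: Chunk[], vocab: string[], topK: number) {
  const q = embedFake(query, vocab)
  return chunks
    .map(chunk => ({
      chunk,
      score: embedFake(chunk.content, vocab).reduce((s, x, i) => s + x * q[i], 0)
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}

// 4. Synthesis: in production this prompt goes to an LLM call
function buildPrompt(query: string, hits: { chunk: Chunk }[]): string {
  const context = hits.map(h => `[${h.chunk.sourceId}] ${h.chunk.content}`).join('\n')
  return `Answer using only this context:\n${context}\n\nQuestion: ${query}`
}
```

&lt;p&gt;Swap the fake embedder for a real model and the prompt builder for an LLM call, and you have the skeleton every section below fills in.&lt;/p&gt;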




&lt;h2&gt;
  
  
  1. Chunking: where most RAG systems fail
&lt;/h2&gt;

&lt;p&gt;This is the section nobody wants to read because chunking sounds &lt;em&gt;boring&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It's also the section that decides &lt;strong&gt;whether your RAG actually works&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The naive approach is "&lt;em&gt;split text every 500 tokens&lt;/em&gt;." That dies for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A PR, for example, is not just 500 tokens of one thing. It's a &lt;strong&gt;title&lt;/strong&gt;, a &lt;strong&gt;body&lt;/strong&gt;, a &lt;strong&gt;list of files&lt;/strong&gt;, a list of &lt;strong&gt;commit messages&lt;/strong&gt;, sometimes &lt;strong&gt;comments&lt;/strong&gt;, &lt;strong&gt;discussions&lt;/strong&gt;. Embedding them as one blob averages five different topics into one vector. Retrieval returns the wrong PR because the &lt;strong&gt;vector is an average of irrelevant stuff.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not all artifacts are equal&lt;/strong&gt;. A merged PR with five reviews carries more architectural signal than a &lt;em&gt;Fix typo&lt;/em&gt; commit. Treating them with the same chunk size and same metadata throws away the &lt;strong&gt;asymmetry&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I use &lt;strong&gt;&lt;em&gt;typed chunking&lt;/em&gt;&lt;/strong&gt;; different artifact types get different &lt;em&gt;chunkers&lt;/em&gt;, different size &lt;em&gt;budgets&lt;/em&gt;, and different &lt;em&gt;metadata&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chunkPR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;IngestPullRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;EmbeddingChunk&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filesStr&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;commitsStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;commits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; | &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;commentsStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;comments&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; | Comments: &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;comments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; | &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reviewsStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; | Reviews: &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; | &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`[PR #&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;] &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; | Files: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;filesStr&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; | Commits: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;commitsStr&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;commentsStr&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;reviewsStr&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;sourceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`pr:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// &amp;lt;- stable, dedupable&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nf"&gt;truncate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;       &lt;span class="c1"&gt;// &amp;lt;- PRs get 4000 chars&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pr&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;merged_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;merged_at&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's what a &lt;strong&gt;real chunk looks like&lt;/strong&gt; coming out of that function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sourceId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pr:42"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[PR #42] Replace REST with GraphQL for the data layer: Switched from ..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wencesms92"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"merged_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-11-14T10:22:00Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
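&lt;p&gt;The &lt;em&gt;truncate&lt;/em&gt; helper that enforces those character budgets isn't shown above; a minimal version (the real one in Keystone may differ) is just a clip with a marker:&lt;/p&gt;

```typescript
// Clip text to a character budget, keeping the budget exact
// by reserving one character for the ellipsis marker.
// A minimal sketch; the actual helper may differ.
function truncate(text: string, maxChars: number): string {
  if (maxChars >= text.length) return text
  return text.slice(0, maxChars - 1) + '…'
}
```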



&lt;p&gt;Issues get their own &lt;em&gt;chunker&lt;/em&gt; with the same shape but a smaller budget (&lt;em&gt;1500 chars&lt;/em&gt;) and different metadata. Same pattern, different parameters.&lt;/p&gt;
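&lt;p&gt;For illustration, an issue chunker following that pattern might look like this (hypothetical field names; not the actual Keystone implementation):&lt;/p&gt;

```typescript
// Hypothetical issue shape; same chunker pattern as chunkPR,
// but a 1500-char budget and issue-specific metadata.
type IngestIssue = {
  number: number
  title: string
  body?: string
  author: string
  state: string
  labels: string[]
}

function chunkIssue(issue: IngestIssue) {
  const labelsStr = issue.labels.length ? ` | Labels: ${issue.labels.join(', ')}` : ''
  const raw = `[Issue #${issue.number}] ${issue.title}: ${issue.body ?? ''}${labelsStr}`
  return {
    sourceId: `issue:${issue.number}`,   // stable, dedupable
    content: raw.slice(0, 1500),         // issues get 1500 chars
    metadata: { type: 'issue', author: issue.author, state: issue.state, number: issue.number }
  }
}
```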

&lt;h3&gt;
  
  
  Three things to notice:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;strong&gt;[PR #N] prefix&lt;/strong&gt; is intentional. Embedding models are sensitive to what's at the front of the text, so putting the artifact type and number first lets the model anchor on it. When I tried without the prefix, the same PR ranked lower for queries like &lt;em&gt;"what did PR 42 change?"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Each &lt;em&gt;sourceId&lt;/em&gt; is stable and globally unique (pr:42, issue:7, readme:root, topology:tree). That &lt;strong&gt;key&lt;/strong&gt; is what makes the upsert work, and it's also what lets a webhook re-embed a single PR after a merge without rebuilding the world. Same chunker, same SQL upsert, just one row.&lt;/li&gt;
&lt;li&gt;Commits get &lt;strong&gt;aggregated&lt;/strong&gt;, not chunked individually. This is the most non-obvious decision in the whole pipeline. If you embed every commit one-by-one, you drown the index in noise. I instead deduplicate commits already present inside a PR (they're embedded with the PR) and then summarize the leftover "orphan" commits into a single chunk:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Orphans = commits not already inside any PR&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prCommitShas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pullRequests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;commits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sha&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;orphans&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;commits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;prCommitShas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sha&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
  &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isNoiseCommit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orphans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;sourceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;commits:orphan-summary&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;truncate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`[Commits] &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;orphans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; standalone commits (not in PRs) | Authors: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;authorsStr&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; | Recent: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;recentMsgs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="mi"&gt;4000&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;commits&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;orphans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;orphan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before the orphan commits roll up, a noise filter strips anything useless:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;NOISE_MSG_PATTERNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="sr"&gt;/^merge branch/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^merge pull request/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^wip$/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^fix typo/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sr"&gt;/^fixup!/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^squash!/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^initial commit$/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^update &lt;/span&gt;&lt;span class="se"&gt;\S&lt;/span&gt;&lt;span class="sr"&gt;+$/i&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;NOISE_AUTHOR_PATTERNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\[&lt;/span&gt;&lt;span class="sr"&gt;bot&lt;/span&gt;&lt;span class="se"&gt;\]&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^dependabot/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^renovate/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^github-actions/i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
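&lt;p&gt;The &lt;em&gt;isNoiseCommit&lt;/em&gt; check used in the orphan filter just runs a commit's message and author against those patterns (a sketch; the real version may differ):&lt;/p&gt;

```typescript
// Patterns restated here so the sketch is self-contained.
const NOISE_MSG_PATTERNS = [
  /^merge branch/i, /^merge pull request/i, /^wip$/i, /^fix typo/i,
  /^fixup!/i, /^squash!/i, /^initial commit$/i, /^update \S+$/i
]
const NOISE_AUTHOR_PATTERNS = [/\[bot\]$/, /^dependabot/i, /^renovate/i, /^github-actions/i]

type CommitLike = { message: string; author: string }

// A commit is noise if its message OR its author matches any pattern
function isNoiseCommit(c: CommitLike): boolean {
  if (NOISE_MSG_PATTERNS.some(p => p.test(c.message))) return true
  return NOISE_AUTHOR_PATTERNS.some(p => p.test(c.author))
}
```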



&lt;p&gt;Filtering bot commits and merge noise before they hit the embedding API saves cost, keeps the index dense, and stops "what's the architecture" queries from returning seventeen &lt;em&gt;dependabot bumped lodash&lt;/em&gt; chunks.&lt;/p&gt;

&lt;p&gt;So… &lt;strong&gt;don't embed garbage&lt;/strong&gt;!&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Embeddings: picking the right model
&lt;/h2&gt;

&lt;p&gt;The boring truth: most embedding models are &lt;strong&gt;good&lt;/strong&gt; enough. The real trade-off is &lt;em&gt;dimension count × cost × domain fit&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I went with &lt;a href="https://docs.mistral.ai/models/model-cards/codestral-embed-25-05" rel="noopener noreferrer"&gt;Mistral AI codestral-embed-2505&lt;/a&gt; (&lt;em&gt;1536&lt;/em&gt; dimensions), a code-tuned embedding model that ranks code-adjacent text in a way a general-purpose model does not.&lt;/p&gt;

&lt;p&gt;Two main reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generous free tier&lt;/strong&gt; → Mistral's free tier is generous enough to run real embedding workloads without hitting a paywall on day one of a side project. OpenAI's free credits evaporate the moment you embed a real dataset.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain fit&lt;/strong&gt; → My data is &lt;strong&gt;code-adjacent&lt;/strong&gt;: commit messages, file paths, PR titles.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The call itself is unremarkable, which is the point. The work happens in the &lt;strong&gt;chunking&lt;/strong&gt;, not here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// At query time, embed the user's question with the same model&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;mistral&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;textEmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;codestral-embed-2505&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
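&lt;p&gt;At ingestion time the same model runs over every chunk, and batching those calls is what keeps cost and rate limits sane. A sketch of the batching loop, with a fake embed call standing in for the real API wrapper (all names illustrative):&lt;/p&gt;

```typescript
// A stand-in embed call used for the demo; a real one would hit the
// embedding API with the whole batch of values at once.
async function demoEmbedApi(values: string[]) {
  return values.map(v => [v.length]) // fake 1-dimensional "vectors"
}

// Group chunk contents so each API call embeds many values per request
// instead of one request per chunk.
async function embedInBatches(contents: string[], batchSize: number, callEmbedApi: typeof demoEmbedApi) {
  const out: number[][] = []
  for (let i = 0; contents.length > i; i += batchSize) {
    out.push(...(await callEmbedApi(contents.slice(i, i + batchSize))))
  }
  return out
}
```

&lt;p&gt;Injecting the embed call keeps the batching logic testable without network access; swapping in the real model call changes nothing about the loop.&lt;/p&gt;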






&lt;h2&gt;
  
  
  3. Retrieval (that doesn't suck)
&lt;/h2&gt;

&lt;p&gt;Retrieval can be one query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vectorStr&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;]`&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;projectIdsArray&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`{&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;}`&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$queryRawUnsafe&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;MatchEmbeddingRow&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`SELECT pe.id, pe."projectId" as project_id, p.name as project_name,
          pe."sourceId" as source_id, pe.content, pe.metadata,
          (1 - (pe."embedding" &amp;lt;=&amp;gt; $1::vector(1536)))::float as similarity
   FROM "ProjectEmbedding" pe
   JOIN "Project" p ON p.id = pe."projectId"
   WHERE pe."projectId" = ANY($2::text[])
     AND 1 - (pe."embedding" &amp;lt;=&amp;gt; $1::vector(1536)) &amp;gt; 0.3
   ORDER BY pe."embedding" &amp;lt;=&amp;gt; $1::vector(1536)
   LIMIT 12`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;vectorStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;projectIdsArray&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Five&lt;/strong&gt; things this is doing on purpose:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; is the &lt;em&gt;pgvector&lt;/em&gt; &lt;strong&gt;cosine-distance operator.&lt;/strong&gt; Combined with the &lt;strong&gt;HNSW index&lt;/strong&gt; built on &lt;strong&gt;vector_cosine_ops&lt;/strong&gt;, this query uses the index instead of a sequential scan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-filter&lt;/strong&gt; by &lt;em&gt;projectId = ANY(...)&lt;/em&gt; before the vector search. Permissioning happens before similarity ranking, so &lt;strong&gt;you never see a chunk from a project you don't have access to&lt;/strong&gt;, and the index narrows the search space.&lt;/li&gt;
&lt;li&gt;Threshold of &lt;strong&gt;&lt;em&gt;0.3&lt;/em&gt; similarity&lt;/strong&gt;. Below that, the chunk is more noise than signal. &lt;strong&gt;Lower threshold → more recall → more garbage&lt;/strong&gt; in the prompt. Tune this on real queries, not synthetic ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top 12 results&lt;/strong&gt;. Enough that 2-3 misses still leave a &lt;strong&gt;usable signal&lt;/strong&gt;; small enough that the &lt;strong&gt;prompt stays cheap&lt;/strong&gt;. I started at 25 and it was overkill. The model latched onto the first 5 anyway and the rest were filler.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;JOIN&lt;/em&gt; the Project name in the &lt;strong&gt;SELECT&lt;/strong&gt;. When the query spans multiple repos, the &lt;strong&gt;model needs to know which repo a chunk came from&lt;/strong&gt;. The repo name shows up in the chunk payload, which is what lets the answer cite &lt;em&gt;[repo-A] vs [repo-B]&lt;/em&gt; accurately.&lt;/li&gt;
&lt;/ol&gt;
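&lt;p&gt;For intuition, here's the similarity the query ranks by, sketched in plain TypeScript. pgvector computes this natively through its distance operator; this sketch is purely illustrative and not part of the pipeline:&lt;/p&gt;

```typescript
// Cosine similarity between two embedding vectors.
// pgvector's cosine-distance operator returns 1 minus this value,
// which is why the query filters on `1 - distance` being above 0.3.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0)
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0))
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0))
  return dot / (normA * normB)
}

// Toy vectors: a near-duplicate clears the 0.3 floor, an unrelated one doesn't.
const query = [1, 0, 0]
const similarChunk = [0.9, 0.1, 0]
const unrelatedChunk = [0, 1, 0]
const passesFloor = (v: number[]) => cosineSimilarity(query, v) > 0.3
```

&lt;p&gt;In production the comparison happens inside Postgres, so the vectors never leave the database.&lt;/p&gt;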

&lt;p&gt;No &lt;em&gt;re-ranker&lt;/em&gt;. No keyword pre-filter. One stage.&lt;/p&gt;

&lt;p&gt;The chunking does enough work upfront that a &lt;strong&gt;second-stage ranker hasn't been worth its latency&lt;/strong&gt; yet, and that's a real result, not laziness.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Re-rankers&lt;/em&gt; earn their keep when your chunks are big, noisy, and undifferentiated. My chunks are &lt;strong&gt;small, typed, and prefixed&lt;/strong&gt;.&lt;/p&gt;
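&lt;p&gt;The index migration isn't shown in this post; given the operator and table names in the query above, the HNSW index it relies on would be declared roughly like this (the index name is illustrative):&lt;/p&gt;

```sql
-- Hypothetical DDL matching the query above; tune m / ef_construction
-- for your dataset. vector_cosine_ops pairs with the cosine-distance operator.
CREATE INDEX "ProjectEmbedding_embedding_idx"
  ON "ProjectEmbedding"
  USING hnsw ("embedding" vector_cosine_ops);
```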




&lt;h2&gt;
  
  
  4. Context assembly and the LLM call
&lt;/h2&gt;

&lt;p&gt;This is where Keystone diverges from textbook RAG.&lt;/p&gt;

&lt;p&gt;Classic RAG does this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embed(query) → search → concat(top_k) → prompt → generate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompt(LLM, tools={search, tree, file}) → LLM decides → up to 10 tool calls → final answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM is the &lt;strong&gt;orchestrator&lt;/strong&gt;. It sees a system prompt that explains the two data sources, vectorized memory vs. live code, and the available repos. Then it chooses which tool to call.&lt;/p&gt;
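&lt;p&gt;Stripped of the model specifics, that control flow is a plain loop. This is an illustrative sketch, not Keystone's actual code; &lt;em&gt;callModel&lt;/em&gt; and the tool names stand in for the real LLM API and tool set:&lt;/p&gt;

```typescript
// Each turn the model either requests a tool or produces the final answer.
type Turn = { tool: string; input: string } | { answer: string }

function agenticAnswer(
  question: string,
  tools: { [name: string]: (input: string) => string },
  callModel: (history: string[]) => Turn,
): string {
  const history = [question]
  // Hard cap: the model gets at most 10 tool calls before it must answer.
  for (let step = 0; step !== 10; step += 1) {
    const turn = callModel(history)
    if ('answer' in turn) return turn.answer
    // Only results the model asked for enter the conversation.
    history.push(tools[turn.tool](turn.input))
  }
  return 'tool-call budget exhausted'
}

// Scripted demo: one memory search, then a final answer.
const demoTools = { search: (q: string) => 'chunks for: ' + q }
const script: Turn[] = [
  { tool: 'search', input: 'auth middleware' },
  { answer: 'Auth is handled in the middleware layer.' },
]
const answer = agenticAnswer('How is auth handled?', demoTools, () => script.shift() as Turn)
```

&lt;p&gt;The real loop threads proper message objects and token accounting through each step, but the shape, tool calls until the model decides it has enough context, is the same.&lt;/p&gt;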

&lt;p&gt;The split looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vectorized memory holds the why&lt;/strong&gt; → PR descriptions, issue threads, commit messages, the artifacts where decisions are explained. Vectors of these stay useful even when the code drifts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live file access holds the what&lt;/strong&gt; → The current &lt;em&gt;package.json&lt;/em&gt;, the current list of plugins, the current value of a constant. Stale vectors of months-old code lie about the present, so for "what" questions I read the file fresh &lt;strong&gt;via the GitHub API&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what the &lt;em&gt;agentic retrieval&lt;/em&gt; actually looks like in production logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;Chat Tool] searchTechnicalMemory: &lt;span class="s2"&gt;"relationship between open-webui, opencode, and openclaw"&lt;/span&gt; across 3 project&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;Chat Tool] Found 9 results &lt;span class="o"&gt;[&lt;/span&gt;
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'open-webui'&lt;/span&gt;, sourceId: &lt;span class="s1"&gt;'readme:root'&lt;/span&gt;,            similarity: 0.555 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'openclaw'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'readme:root'&lt;/span&gt;,            similarity: 0.544 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'opencode'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'readme:root'&lt;/span&gt;,            similarity: 0.503 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'openclaw'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'topology:tree'&lt;/span&gt;,          similarity: 0.465 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'open-webui'&lt;/span&gt;, sourceId: &lt;span class="s1"&gt;'topology:tree'&lt;/span&gt;,          similarity: 0.463 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'opencode'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'topology:tree'&lt;/span&gt;,          similarity: 0.463 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'opencode'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'commits:orphan-summary'&lt;/span&gt;, similarity: 0.439 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'openclaw'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'commits:orphan-summary'&lt;/span&gt;, similarity: 0.431 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'open-webui'&lt;/span&gt;, sourceId: &lt;span class="s1"&gt;'commits:orphan-summary'&lt;/span&gt;, similarity: 0.420 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model chose to &lt;strong&gt;search across all three repos in a single call&lt;/strong&gt;; it understood the query was cross-project without being told.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;readme:root&lt;/code&gt; chunks rank highest (&lt;em&gt;0.55&lt;/em&gt;, &lt;em&gt;0.54&lt;/em&gt;, &lt;em&gt;0.50&lt;/em&gt;) because &lt;em&gt;READMEs&lt;/em&gt; describe what a project &lt;em&gt;is&lt;/em&gt;, and the query asks exactly that.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;topology:tree&lt;/code&gt; chunks rank next: file structure is the second most useful signal for understanding how three repos relate.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;commits:orphan-summary&lt;/code&gt; chunks come in last but still above the &lt;em&gt;0.3&lt;/em&gt; floor, adding commit-level context without the noise of individual commits.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two practical effects:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The model can iterate&lt;/strong&gt; → It might search memory, realize the answer needs a file, fetch the file, then answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The prompt stays small&lt;/strong&gt; → Only the chunks the model actually requested make it into the conversation. No &lt;em&gt;"stuff top-25 into system prompt"&lt;/em&gt; bloat.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The synthesis model itself is &lt;a href="https://docs.mistral.ai/models/model-cards/devstral-small-2-25-12" rel="noopener noreferrer"&gt;Mistral AI devstral-small-latest&lt;/a&gt;: &lt;strong&gt;small&lt;/strong&gt;, &lt;strong&gt;cheap&lt;/strong&gt;, &lt;strong&gt;fast&lt;/strong&gt;. With good retrieval you don't need a frontier model for the writing step. The expensive part of "intelligence" is finding the right context. Writing a coherent paragraph from good context is the easy part.&lt;/p&gt;

&lt;p&gt;Every call gets logged with input/output tokens, step count, and finish reason, both to a usage table and to PostHog. That's the observability layer that lets me actually answer "is retrieval getting better or worse this week?" with a graph instead of a vibe.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1yqp3uunlpzd6l37xkdu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1yqp3uunlpzd6l37xkdu.gif" alt="Keystone input &amp;amp; output tokens usage" width="200" height="147"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The pipeline above (&lt;em&gt;typed chunking&lt;/em&gt;, &lt;strong&gt;code-tuned embeddings&lt;/strong&gt;, &lt;strong&gt;HNSW + pgvector&lt;/strong&gt;, and an &lt;strong&gt;LLM that knows when to search&lt;/strong&gt;) is what's running inside &lt;a href="https://www.keystoneapp.dev/" rel="noopener noreferrer"&gt;Keystone&lt;/a&gt; today.&lt;/p&gt;

&lt;p&gt;It's small, opinionated, and it works because every stage has one job and respects the constraints of the next.&lt;/p&gt;

&lt;p&gt;If there's one thing to take away: &lt;strong&gt;ignore the model leaderboards for a week and go obsess over your chunking&lt;/strong&gt;. That's where the wins are.&lt;/p&gt;

&lt;p&gt;The fanciest embedding model in the world &lt;strong&gt;can't rescue data that's been concatenated into mush&lt;/strong&gt;, and the cheapest model is plenty good when the chunks coming in are sharp, typed, and free of noise.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG isn't a magic upgrade for LLMs. It's a librarian, and a librarian is only as good as the way you organized the shelves.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://www.keystoneapp.dev/" rel="noopener noreferrer"&gt;Keystone&lt;/a&gt; is the project I'm building to give software teams a living memory of their codebase: every PR, commit, issue, and decision, queryable in natural language.&lt;/p&gt;

&lt;p&gt;If you have any suggestions, I'd love to hear them in the comments section!&lt;/p&gt;




&lt;p&gt;Thanks for reading! 👋&lt;/p&gt;

&lt;p&gt;Wences.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>softwareengineering</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Optimizing Nuxt + Prisma in Docker: How we cut our image size by 84%</title>
      <dc:creator>Wences Martinez</dc:creator>
      <pubDate>Thu, 05 Mar 2026 08:29:59 +0000</pubDate>
      <link>https://dev.to/wenchodev/optimizing-nuxt-prisma-in-docker-how-we-cut-our-image-size-by-84-15lb</link>
      <guid>https://dev.to/wenchodev/optimizing-nuxt-prisma-in-docker-how-we-cut-our-image-size-by-84-15lb</guid>
      <description>&lt;p&gt;By now, &lt;a href="//nuxt.com"&gt;Nuxt&lt;/a&gt; hardly needs an introduction as one of the &lt;strong&gt;most used full-stack frameworks&lt;/strong&gt;. For those new to the Vue ecosystem, it is essentially the counterpart to Next.js in the React world.&lt;/p&gt;

&lt;p&gt;So, Nuxt is not just a frontend framework. With Nitro as the server engine, you get SSR, API routes, server middleware… it’s a &lt;strong&gt;real full-stack framework&lt;/strong&gt;. We’ve been using it at Resizes for a long time to build complex applications, and it works really well!&lt;/p&gt;

&lt;p&gt;That sounds really cool, right? But what is the most &lt;strong&gt;efficient&lt;/strong&gt;, &lt;strong&gt;fastest&lt;/strong&gt;, and most &lt;strong&gt;lightweight&lt;/strong&gt; way to build a Docker image for a Nuxt app?&lt;/p&gt;

&lt;p&gt;Let’s talk about it!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5x0tiwxw1w61cgup5qd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5x0tiwxw1w61cgup5qd.png" alt=" " width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Our Use Case
&lt;/h3&gt;

&lt;p&gt;At &lt;a href="//resiz.es"&gt;Resizes&lt;/a&gt; we have developed several applications with Nuxt 4, and one of them pulls in quite a few modules and dependencies: an &lt;strong&gt;auth&lt;/strong&gt; library, an &lt;strong&gt;ORM&lt;/strong&gt; to handle database migrations and even an &lt;strong&gt;embedded database&lt;/strong&gt;, so dockerizing it efficiently was not as straightforward as it seems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihlxxgx3mmylgtair8sj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihlxxgx3mmylgtair8sj.png" alt="Our Nuxt stack" width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post I’ll share how we do it, with &lt;strong&gt;real numbers&lt;/strong&gt; from our GitHub Actions pipeline and the hidden traps we found along the way.&lt;/p&gt;

&lt;h3&gt;
  
  
  The stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Nuxt 4 + &lt;a href="https://nitro.build/" rel="noopener noreferrer"&gt;Nitro&lt;/a&gt; (server engine)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.prisma.io/" rel="noopener noreferrer"&gt;Prisma ORM&lt;/a&gt; + SQLite via &lt;a href="https://www.npmjs.com/package/better-sqlite3" rel="noopener noreferrer"&gt;&lt;em&gt;better-sqlite3&lt;/em&gt;&lt;/a&gt; (native C++ module)&lt;/li&gt;
&lt;li&gt;Docker with multi-stage build&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following &lt;em&gt;package.json&lt;/em&gt; shows some of the dependencies we rely on most in a Nuxt 4 application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;package.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuxt dev"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"build"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuxt build"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"generate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuxt generate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"preview"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuxt preview"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"postinstall"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuxt prepare &amp;amp;&amp;amp; prisma generate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eslint ."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lint:fix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eslint . --fix"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"better-sqlite3"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~12.0.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@prisma/client"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~7.2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@prisma/adapter-better-sqlite3"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~7.3.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@anthropic-ai/sdk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~0.72.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"better-auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~1.4.17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"zod"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~4.3.5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prisma"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~7.2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dotenv"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~17.2.3"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"devDependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"nuxt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~4.2.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@nuxt/ui"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~4.3.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@nuxt/eslint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~1.12.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~3.5.27"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@vueuse/nuxt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~14.1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"typescript"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~5.9.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"eslint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~9.39.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"eslint-config-prettier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~10.1.8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prettier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~3.8.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tsx"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~4.21.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  The Dockerfile: single vs multi stage
&lt;/h3&gt;

&lt;p&gt;Can you deploy a Nuxt application with a basic Dockerfile? Of course you can! But that &lt;strong&gt;doesn’t mean you should&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;While a single-stage Dockerfile will technically build your app, it drags a &lt;strong&gt;huge amount of baggage&lt;/strong&gt; into production. We’re talking about build tools, dev dependencies, source code, and artifacts.&lt;/p&gt;

&lt;p&gt;You end up shipping the entire kitchen sink instead of just the lean, compiled application.&lt;/p&gt;

&lt;p&gt;An example of a basic single-stage Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Single-stage Dockerfile (for comparison only — DO NOT use in production)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:24-slim&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; python3 make g++ &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; DATABASE_URL="file:./dummy.db"&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci
&lt;span class="k"&gt;RUN &lt;/span&gt;npx prisma generate &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npx nuxt build

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NITRO_HOST=0.0.0.0&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NITRO_PORT=3000&lt;/span&gt;

&lt;span class="c"&gt;# Still need to fix Nitro stubs even in single-stage&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; .output/server/node_modules
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; node_modules .output/server/node_modules

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; docker-entrypoint.sh ./docker-entrypoint.sh&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ./docker-entrypoint.sh

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["./docker-entrypoint.sh"]&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", ".output/server/index.mjs"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ve done the test and we’ve built our app with the above Dockerfile vs a multi-stage Dockerfile to compare the size of the final image between them.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The results are crazy: &lt;strong&gt;4.05 GB vs 637 MB&lt;/strong&gt;. The multi-stage image is more than &lt;strong&gt;6x smaller&lt;/strong&gt; than the single-stage one, which is the ~84% cut from the title 🫢&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fud6alp9mz18tsi0bbw40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fud6alp9mz18tsi0bbw40.png" alt="Single-stage vs multi-stage image" width="800" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F075gv2p9de0z1m6oaqae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F075gv2p9de0z1m6oaqae.png" alt="Docker Layer Anatomy: Where Does All That Space Go?" width="800" height="209"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  But… how did we do it?
&lt;/h2&gt;

&lt;p&gt;Multi-stage build separates the process into phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build&lt;/strong&gt; stages: have everything needed to compile. They’re temporary images that get &lt;strong&gt;discarded from the final image&lt;/strong&gt;!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runner&lt;/strong&gt; stage: contains only the minimum code to run the app. It’s the only image pushed to the registry!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The concrete benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Smaller&lt;/strong&gt; final image.&lt;/li&gt;
&lt;li&gt;Reduced &lt;strong&gt;attack surface&lt;/strong&gt; — no compilers in production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster&lt;/strong&gt; deploys — less pull time on each Kubernetes node.&lt;/li&gt;
&lt;li&gt;Automatic &lt;strong&gt;parallelism&lt;/strong&gt; — Docker builds independent stages in parallel (with BuildKit).&lt;/li&gt;
&lt;li&gt;Granular &lt;strong&gt;caching&lt;/strong&gt; — if you only change code, the dependencies stage is fully cached.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our Dockerfile has 4 stages. Let’s go through each one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5y3m7sexqhd9du98seti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5y3m7sexqhd9du98seti.png" alt="Dockerfile multistage steps" width="319" height="858"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 1: Base tools
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# -------- Base --------&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:24-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; python3 make g++ openssl &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’re probably wondering why the heck we need the &lt;em&gt;python3&lt;/em&gt;, &lt;em&gt;make&lt;/em&gt; &amp;amp; &lt;em&gt;g++&lt;/em&gt; dependencies in a Node.js application.&lt;/p&gt;

&lt;p&gt;Easy, we’re using &lt;em&gt;SQLite&lt;/em&gt; as an embedded database through the super fast &lt;a href="https://www.npmjs.com/package/better-sqlite3" rel="noopener noreferrer"&gt;better-sqlite3&lt;/a&gt; library, and this library needs to &lt;strong&gt;compile native C++ bindings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;SQLite is a library written in C, and to use it from Node.js, better-sqlite3 includes a “binding” (a bridge between JavaScript and compiled C code).&lt;/p&gt;

&lt;p&gt;This binding is compiled during npm install using node-gyp, which needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;g++&lt;/em&gt; — C++ compiler&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;make&lt;/em&gt; — orchestrates the compilation&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;python3&lt;/em&gt; — because node-gyp is written in Python&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a &lt;em&gt;.node&lt;/em&gt; file (binary addon) specific to your architecture and Node version.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;These tools take up ~293MB, but they won’t be in the final image 💪🏼&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Stage 2: Full dependencies
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;
&lt;span class="c"&gt;# -------- Dependencies --------&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;deps&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package.json package-lock.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci &lt;span class="nt"&gt;--ignore-scripts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the second stage we’re only copying &lt;em&gt;package.json&lt;/em&gt; and &lt;em&gt;package-lock.json&lt;/em&gt; and running &lt;em&gt;npm ci&lt;/em&gt; with the &lt;em&gt;--ignore-scripts&lt;/em&gt; flag.&lt;/p&gt;

&lt;p&gt;This flag is really important if you want to &lt;strong&gt;avoid any unnecessary drama&lt;/strong&gt;. Tools like Prisma or Nuxt love to run ‘postinstall’ scripts the second they’re downloaded, but since we haven’t copied the full source code yet, those scripts would just crash.&lt;/p&gt;

&lt;p&gt;By skipping those scripts with this flag, we keep the install lightning-fast, let Docker cache the layer perfectly, and save the actual heavy lifting for the Build stage.&lt;/p&gt;
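&lt;p&gt;You can reproduce the drama in miniature (toy package, all names hypothetical): a root &lt;em&gt;postinstall&lt;/em&gt; script that points at a file we haven’t copied yet breaks a plain &lt;em&gt;npm install&lt;/em&gt;, while &lt;em&gt;--ignore-scripts&lt;/em&gt; sails through:&lt;/p&gt;

```shell
# Toy package whose postinstall needs a file that "hasn't been copied yet".
mkdir -p /tmp/ignore-scripts-demo
cd /tmp/ignore-scripts-demo
printf '%s\n' '{"name":"demo","version":"1.0.0","scripts":{"postinstall":"node ./missing-file.js"}}' > package.json

npm install --ignore-scripts  # succeeds: the postinstall script is skipped
npm install || echo "plain install fails: postinstall cannot find its file"
```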

&lt;h2&gt;
  
  
  Stage 3: Building the application
&lt;/h2&gt;

&lt;p&gt;This is where the heavy lifting happens!&lt;/p&gt;

&lt;p&gt;Now that we have our dependencies ready, we perform a &lt;strong&gt;triple threat of critical tasks&lt;/strong&gt;: generating the &lt;strong&gt;Prisma client&lt;/strong&gt;, compiling the &lt;strong&gt;Nuxt bundle&lt;/strong&gt;, and &lt;strong&gt;rebuilding &lt;em&gt;better-sqlite3&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This last step is vital — it ensures our SQLite driver is natively compiled for the Linux environment, avoiding those dreaded “mismatched binary” errors in production.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# -------- Build --------&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_OPTIONS="--max-old-space-size=4096"&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=deps /app/node_modules ./node_modules&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; DATABASE_URL="file:./dummy.db"&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npx prisma generate &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npx nuxt build &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm rebuild better-sqlite3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;npx prisma generate&lt;/em&gt;&lt;/strong&gt;: Generates the Prisma Client based on your schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;npx nuxt build&lt;/em&gt;&lt;/strong&gt;: Triggers the full production build, bundling the client and server-side code through the Nitro engine into a standalone directory (the &lt;em&gt;.output&lt;/em&gt; folder).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;npm rebuild better-sqlite3&lt;/em&gt;&lt;/strong&gt;: The “secret sauce” for SQLite. This recompiles the native C++ bindings for our database driver directly inside the Linux container, ensuring the binary perfectly matches our production environment and preventing the dreaded “mismatched architecture” errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Stage 4: The production image
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# -------- Runtime --------&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:24-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runner&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; openssl &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NITRO_HOST=0.0.0.0&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NITRO_PORT=3000&lt;/span&gt;

&lt;span class="c"&gt;# Minimal Prisma CLI for migrations - must run BEFORE package.json is copied,&lt;/span&gt;
&lt;span class="c"&gt;# otherwise npm sees the app's package.json and installs ALL dependencies.&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package-lock.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--ignore-scripts&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="s2"&gt;"prisma@&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;node &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"require('./package-lock.json').packages['node_modules/prisma'].version"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="s2"&gt;"dotenv@&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;node &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"require('./package-lock.json').packages['node_modules/dotenv'].version"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm cache clean &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm &lt;/span&gt;package-lock.json

&lt;span class="c"&gt;# Copy compiled app (keeps Nitro's runtime packages in .output/server/node_modules/)&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/.output ./.output&lt;/span&gt;

&lt;span class="c"&gt;# Replace better-sqlite3 with the Linux binary compiled in the build stage.&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; .output/server/node_modules/better-sqlite3
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/node_modules/better-sqlite3 ./.output/server/node_modules/better-sqlite3&lt;/span&gt;

&lt;span class="c"&gt;# Copy Prisma config + migrations folder for migrate deploy&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/prisma ./prisma&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/prisma.config.ts ./prisma.config.ts&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; docker-entrypoint.sh ./docker-entrypoint.sh&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ./docker-entrypoint.sh

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["./docker-entrypoint.sh"]&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", ".output/server/index.mjs"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;The final image uses &lt;em&gt;node:24-slim&lt;/em&gt;; no compilers, no Python, no dev tools. Just Node.js and your app.&lt;/p&gt;
&lt;/blockquote&gt;
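&lt;p&gt;The version-pinning trick in that &lt;em&gt;RUN npm install&lt;/em&gt; step reads the exact versions out of the lockfile, so the runtime image can never drift from what the build used. Here it is in isolation, against a minimal made-up lockfile (hypothetical path and version):&lt;/p&gt;

```shell
# Minimal, made-up lockfile with the same shape npm writes.
mkdir -p /tmp/lock-demo
cd /tmp/lock-demo
printf '%s\n' '{"packages":{"node_modules/prisma":{"version":"5.22.0"}}}' > package-lock.json

# The same expression the Dockerfile uses to pin the install:
node -p "require('./package-lock.json').packages['node_modules/prisma'].version"
# prints 5.22.0
```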

&lt;p&gt;Three important details here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We leave .output/server/node_modules intact — except for better-sqlite3, which we delete and replace with a freshly compiled binary (in build stage).&lt;/li&gt;
&lt;li&gt;We install only &lt;strong&gt;prisma&lt;/strong&gt; and &lt;strong&gt;dotenv&lt;/strong&gt; into /app/node_modules, the bare minimum needed to run prisma migrate deploy in the entrypoint script. The rest of the app’s dependencies are already bundled inside .output/server/node_modules by Nitro.&lt;/li&gt;
&lt;li&gt;Automatic Prisma migrations via &lt;em&gt;docker-entrypoint.sh&lt;/em&gt;: this script runs &lt;strong&gt;&lt;em&gt;prisma migrate deploy&lt;/em&gt;&lt;/strong&gt; every time the container starts, ensuring your database always has the latest schema:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/sh&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Applying database migrations..."&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NODE_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/app/node_modules"&lt;/span&gt;
/app/node_modules/.bin/prisma migrate deploy

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Starting application..."&lt;/span&gt;
&lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$@&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
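&lt;p&gt;The &lt;em&gt;exec "$@"&lt;/em&gt; line is the important part: it replaces the shell with whatever command Docker appends as &lt;em&gt;CMD&lt;/em&gt;, so the Node process becomes PID 1 and receives stop signals like SIGTERM directly instead of through an intermediate shell. You can simulate the ENTRYPOINT/CMD handoff locally (throwaway script, hypothetical paths):&lt;/p&gt;

```shell
# Throwaway entrypoint that mimics the handoff pattern above.
printf '%s\n' '#!/bin/sh' 'echo "migrations would run here"' 'exec "$@"' > /tmp/demo-entrypoint.sh
chmod +x /tmp/demo-entrypoint.sh

# Docker invokes ENTRYPOINT with CMD appended as arguments; simulate that:
/tmp/demo-entrypoint.sh echo "app started"
# prints:
#   migrations would run here
#   app started
```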






&lt;h2&gt;
  
  
  GitHub Actions pipeline results
&lt;/h2&gt;

&lt;p&gt;These are real numbers from our GitHub Actions pipeline when building the app’s image for the AMD64 architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5o8lwtk9i34zje27ih2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5o8lwtk9i34zje27ih2n.png" alt="GitHub Actions Runner results" width="800" height="229"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Multi-arch tip: If you build ARM64 images on AMD64 runners via QEMU, expect 4–7x slower builds. Use native ARM runners if speed matters!&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Hidden Gotchas
&lt;/h2&gt;

&lt;p&gt;Nitro can’t bundle everything… some packages must stay in node_modules!&lt;/p&gt;

&lt;p&gt;Nitro uses Rollup/esbuild to bundle your server code into optimized &lt;em&gt;.mjs&lt;/em&gt; files — but not everything can be bundled.&lt;/p&gt;

&lt;p&gt;Modules like &lt;em&gt;better-sqlite3&lt;/em&gt; contain native C++ bindings (.node files), which are platform-specific binaries that simply &lt;strong&gt;cannot be converted into a .mjs file&lt;/strong&gt;. For those, Nitro falls back to copying them — along with their full dependency tree — into &lt;em&gt;.output/server/node_modules/&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So, we have to leave &lt;em&gt;.output/server/node_modules&lt;/em&gt; intact, except for &lt;em&gt;better-sqlite3&lt;/em&gt;, which we delete and replace with a freshly compiled binary (previously generated with &lt;em&gt;npm rebuild better-sqlite3&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;We install only &lt;em&gt;prisma&lt;/em&gt; and &lt;em&gt;dotenv&lt;/em&gt; into the root &lt;em&gt;/app/node_modules&lt;/em&gt;, the bare minimum needed to run &lt;em&gt;prisma migrate deploy&lt;/em&gt; in the entrypoint. The rest of the app’s dependencies are already bundled inside the &lt;em&gt;.output/server&lt;/em&gt; folder by Nitro.&lt;/p&gt;

&lt;p&gt;💡 We explicitly mark &lt;em&gt;better-sqlite3&lt;/em&gt;, among other dependencies, as external in the &lt;em&gt;nuxt.config.ts&lt;/em&gt; file, which tells Nitro to skip bundling them entirely and copy them as-is into &lt;em&gt;.output/server/node_modules&lt;/em&gt; directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// nuxt.config.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineNuxtConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;nitro&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;esbuild&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;es2020&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;externals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;better-sqlite3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Dockerizing a full-stack Nuxt application isn’t trivial when you have native modules or want to significantly reduce the final image size, but with a solid multi-stage build it’s completely viable.&lt;/p&gt;

&lt;p&gt;Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-stage build&lt;/strong&gt; is not optional — it’s the difference between &lt;strong&gt;637 MB and 4.05 GB&lt;/strong&gt;: an &lt;strong&gt;84% image size reduction&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The numbers speak: &lt;strong&gt;~4 minutes&lt;/strong&gt; for a complete AMD64 build, lightweight final image ready for production.&lt;/li&gt;
&lt;li&gt;Don’t delete &lt;em&gt;.output/server/node_modules&lt;/em&gt; — Nitro copies the externalized packages there and needs them at runtime. Only &lt;strong&gt;replace the ones with native binaries&lt;/strong&gt; (like better-sqlite3) that need to be recompiled for Linux.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re evaluating Nuxt for your next full-stack project, give it a shot. The Vue ecosystem has a first-class full-stack framework, and with Docker, deploying it is just a matter of minutes!&lt;/p&gt;




&lt;p&gt;This post is based on our real experience deploying Nuxt applications in production at &lt;a href="//resiz.es"&gt;Resizes&lt;/a&gt;. All timings and data are from our GitHub Actions pipeline.&lt;/p&gt;

&lt;p&gt;Feel free to share this post among your community!&lt;/p&gt;

&lt;p&gt;See you! 👋🏻&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>nuxt</category>
      <category>docker</category>
      <category>prisma</category>
    </item>
  </channel>
</rss>
