<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Evdoshchenko</title>
    <description>The latest articles on DEV Community by David Evdoshchenko (@outcomer).</description>
    <link>https://dev.to/outcomer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3808514%2F4e8986ae-073e-4785-995f-39747ad696b1.jpg</url>
      <title>DEV Community: David Evdoshchenko</title>
      <link>https://dev.to/outcomer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/outcomer"/>
    <language>en</language>
    <item>
      <title>How I Built an llms.txt Generator That Actually Works at Scale</title>
      <dc:creator>David Evdoshchenko</dc:creator>
      <pubDate>Tue, 12 May 2026 08:16:00 +0000</pubDate>
      <link>https://dev.to/outcomer/how-i-built-an-llmstxt-generator-that-actually-works-at-scale-1hh8</link>
      <guid>https://dev.to/outcomer/how-i-built-an-llmstxt-generator-that-actually-works-at-scale-1hh8</guid>
      <description>&lt;p&gt;&lt;em&gt;This is the technical companion to my &lt;a href="https://dev.to/outcomer/i-built-an-llmstxt-generator-showed-it-to-the-creator-of-the-standard-and-had-to-rewrite-57c2"&gt;I Built an llms.txt Generator, Showed It to the Creator of the Standard, and Had to Rewrite Everything&lt;/a&gt; — the human side is there, here's just the engineering.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The goal: automatically generate a proper llms.txt hierarchy for any website — not a flat index of summaries, but a structured set of MD files where semantically related pages are merged into coherent documents. Here's how each layer works and what broke along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sitemap → Crawler → Embedder → Clusterer → Summarizer → llms.txt + MD files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five stages. Each runs at a different speed. Each has its own failure modes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 1: Crawling
&lt;/h2&gt;

&lt;p&gt;Standard crawling with content extraction. The output per page: path, title, clean text.&lt;/p&gt;

&lt;p&gt;Pages that fail to crawl are tracked but don't stop the pipeline — a missing page just doesn't contribute to its cluster.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 2: Embeddings + Caching
&lt;/h2&gt;

&lt;p&gt;Each page gets converted to a vector using Gemini's &lt;code&gt;gemini-embedding-001&lt;/code&gt; model.&lt;/p&gt;

&lt;p&gt;Vectors are cached in Redis keyed by hostname + model + path. On reprocessing the same site, already-embedded pages are served from cache instantly — no API calls, no cost.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allCached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cacheService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hmget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;hashKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;allPaths&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`vectors:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Cache hits skip embedding entirely&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters because embeddings are the most parallelizable step and you don't want to redo them on retries or restarts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 3: K-Means Clustering with Cosine Similarity
&lt;/h2&gt;

&lt;p&gt;Pages cluster by semantic meaning, not URL structure. Two pages about the same concept with different URL paths end up in the same cluster. Three pages under &lt;code&gt;/payments/checkout/&lt;/code&gt; that cover different topics end up in different clusters.&lt;/p&gt;

&lt;p&gt;K-means implemented directly in TypeScript — no external ML libraries. Cosine similarity as the distance metric, not Euclidean — cosine works much better for high-dimensional embedding vectors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;kMeans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[][],&lt;/span&gt; &lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxIterations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;centroids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;assignments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;iter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;iter&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;maxIterations&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;iter&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newAssignments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;minDist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;Infinity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;nearest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;centroids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;centroids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;minDist&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;minDist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;nearest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;nearest&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;changed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;newAssignments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;assignments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="nx"&gt;assignments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;newAssignments&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;changed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;centroids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;k&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;members&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;assignments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;members&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;centroids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;members&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;members&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;assignments&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The number of clusters is dynamic — determined by the actual content distribution, not a fixed parameter.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 4: Cluster Summarization with Context Caching
&lt;/h2&gt;

&lt;p&gt;Each cluster goes through a two-phase generation process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 — Structure:&lt;/strong&gt; One LLM call that returns the section name, description, and how many output MD pages to generate. The LLM decides whether to merge source pages, split them, or skip irrelevant ones — I don't impose that from outside.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 — Content:&lt;/strong&gt; One call per output page, generating filename, title, summary, and full markdown content.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Context Caching Problem
&lt;/h3&gt;

&lt;p&gt;Naive implementation: send all cluster page content as context on every call. For a cluster generating 5 MD files, you pay for those input tokens 5 times.&lt;/p&gt;

&lt;p&gt;Fix: &lt;strong&gt;Gemini Context Caching&lt;/strong&gt;. Upload the cluster content once, get a cache reference, use it for all subsequent calls within the cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;createCacheStrategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;systemInstruction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;pagesText&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;baseConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;caches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;600s&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;systemInstruction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Pages:\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pagesText&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;baseConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;cachedContent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;getContents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;dispose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;caches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;refreshIfNeeded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;caches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;600s&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Gemini requires minimum token count for caching.&lt;/span&gt;
    &lt;span class="c1"&gt;// Small clusters fall back to inline context.&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;ApiError&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;min_total_token_count&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;baseConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;systemInstruction&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;getContents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`Pages:\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pagesText&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;dispose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;refreshIfNeeded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cache TTL is 600 seconds. For large clusters that take longer, &lt;code&gt;refreshIfNeeded()&lt;/code&gt; is called after each page generation to reset the TTL before it expires.&lt;/p&gt;

&lt;p&gt;The cache is deleted in the &lt;code&gt;finally&lt;/code&gt; block after the cluster finishes — no leaking paid cache slots.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 5: Buffers Between Layers
&lt;/h2&gt;

&lt;p&gt;Crawling, embedding, and summarizing run at completely different speeds. The crawler is fast. The embedder processes in batches. The summarizer is slow — LLM calls take seconds to minutes each.&lt;/p&gt;

&lt;p&gt;Wire them together naively and they block each other. The crawler piles up thousands of pages while the embedder waits. The summarizer starves waiting for embeddings to flush.&lt;/p&gt;

&lt;p&gt;Solution: in-memory buffers between each layer. Each stage writes to its output buffer and reads from the previous stage's buffer. Concurrency is controlled independently per stage via the AIMD queue.&lt;/p&gt;




&lt;h2&gt;
  
  
  The AIMD Queue
&lt;/h2&gt;

&lt;p&gt;The core reliability mechanism. AIMD — Additive Increase Multiplicative Decrease — is the same algorithm TCP uses for congestion control, applied to LLM API calls.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;onSuccess&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;successStreak&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;successStreak&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;concurrency&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;concurrency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxConcurrency&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxConcurrency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;successStreak&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;onRateLimit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ErrorKind&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;concurrency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;concurrency&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;successStreak&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start at concurrency = 1&lt;/li&gt;
&lt;li&gt;After a full successful round: concurrency += 1&lt;/li&gt;
&lt;li&gt;On 429 or 503: concurrency = floor(concurrency / 2)&lt;/li&gt;
&lt;li&gt;Failed tasks re-enqueue at the &lt;strong&gt;front&lt;/strong&gt; of the queue, not the back — so they're not starved behind new work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 429 responses specifically, the queue reads the actual delay from Google's &lt;code&gt;google.rpc.RetryInfo&lt;/code&gt; response header instead of using a fixed backoff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="nf"&gt;extractRetryDelayMs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;errObj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;details&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;errObj&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;details&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;retryInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;details&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;type.googleapis.com/google.rpc.RetryInfo&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;retryDelayStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;retryInfo&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;retryDelay&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryDelayStr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryDelayStr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isNaN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;15000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// fallback&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the API says wait 15 seconds, wait 15 seconds. Not 1 second (too aggressive), not 60 seconds (too slow).&lt;/p&gt;

&lt;p&gt;Max 16 attempts per task. 5 minute timeout per call. After that, permanent failure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Typed LLM Exceptions
&lt;/h2&gt;

&lt;p&gt;LLM failures aren't all the same. Treating them identically means either retrying things that can't succeed or giving up on things that would succeed on the next attempt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Response isn't valid JSON at all&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LlmJsonValidationException&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;LlmBaseException&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="c1"&gt;// Asked for N summaries, got M&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LlmResponseCountMismatchException&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;LlmBaseException&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;invalidResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;attemptNumber&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// JSON valid but missing required fields&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LlmInvalidSummaryFieldException&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;LlmBaseException&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each type carries the context needed to decide the retry strategy. &lt;code&gt;LlmResponseCountMismatchException&lt;/code&gt; includes what was expected vs received — useful for adjusting the prompt on retry.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resumable Processing
&lt;/h2&gt;

&lt;p&gt;Tasks have a max attempt count. After 16 failures, a task is dropped from the queue permanently.&lt;/p&gt;

&lt;p&gt;But the order stays in the database. The user can restart it from the frontend.&lt;/p&gt;

&lt;p&gt;And because all intermediate results — vectors, generated MD files — are cached in Redis, restarting picks up where it left off. Pages that already embedded don't re-embed. Clusters that already generated their MD files don't regenerate. Only the failed work gets retried.&lt;/p&gt;

&lt;p&gt;This matters because a large order can fail partway through due to external issues (API downtime, budget limits, the German spaces problem described below) and you don't want to throw away hours of completed work.&lt;/p&gt;




&lt;h2&gt;
  
  
  The German Spaces Problem
&lt;/h2&gt;

&lt;p&gt;For reasons I still don't fully understand, Gemini sometimes responds to German-language content with a valid response followed by approximately 2,000,000 spaces. This hits max token limits and the call crashes.&lt;/p&gt;

&lt;p&gt;It's not consistent. It's not reproducible on demand. It just happens sometimes with German text.&lt;/p&gt;

&lt;p&gt;The handling: 3 retries, then permanent failure for that task. The order remains restartable. The resumable processing above means retrying the order doesn't cost anything for already-completed work.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Output Looks Like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# docs.stripe.com&lt;/span&gt;
Comprehensive documentation for integrating Stripe payments...

&lt;span class="gu"&gt;## Account Setup And Management&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Account Setup Overview&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;/account-setup-and-management/overview.md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: ...

&lt;span class="gu"&gt;## Payments&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ZIP archive: &lt;code&gt;llms.txt&lt;/code&gt; at root, subfolders with the &lt;code&gt;.md&lt;/code&gt; files.&lt;/p&gt;

&lt;p&gt;Same site, both strategies side by side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://llmstxtgenerator.svcpool.com/gists/llms-https___docs.stripe.com__flat.txt" rel="noopener noreferrer"&gt;Flat strategy output&lt;/a&gt; — one summary per URL&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://llmstxtgenerator.svcpool.com/gists/llms-https___docs.stripe.com__clustered.zip" rel="noopener noreferrer"&gt;Clustered strategy output&lt;/a&gt; — semantic clusters, MD files, ZIP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stripe's ~4,000 pages: under an hour on a 4-core / 8GB server.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Not Solved Yet
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Multilingual sites.&lt;/strong&gt; When Stripe added German translations their URL count jumped from ~3,500 to ~4,300. The generator processed everything and produced German MD files for what should be an English documentation site. Planned fix: URL pattern filter as an order parameter — you specify which URL patterns to include, the crawler respects it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom prompts.&lt;/strong&gt; The summarization prompt is currently fixed. Making it a user parameter would allow different output styles — technical reference, narrative guides, API docs — without changing the pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  An Observation About the Future
&lt;/h2&gt;

&lt;p&gt;There are already tools that generate complete human-readable documentation sites from markdown source files.&lt;/p&gt;

&lt;p&gt;This suggests an interesting workflow inversion:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate &lt;code&gt;llms.txt&lt;/code&gt; + structured markdown (the AI-oriented layer)&lt;/li&gt;
&lt;li&gt;Generate the human-facing website from those same markdown files&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The markdown becomes the source of truth. Write once, publish twice — one version for humans, one for AI agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;Generator: &lt;strong&gt;&lt;a href="https://llmstxtgenerator.svcpool.com/" rel="noopener noreferrer"&gt;llmstxtgenerator.svcpool.com&lt;/a&gt;&lt;/strong&gt;. Free tier up to 5,000 pages. API available.&lt;/p&gt;

&lt;p&gt;Full spec: &lt;a href="https://llmstxt.org/" rel="noopener noreferrer"&gt;llmstxt.org&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>llms</category>
      <category>typescript</category>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>I Built an llms.txt Generator, Showed It to the Creator of the Standard, and Had to Rewrite Everything</title>
      <dc:creator>David Evdoshchenko</dc:creator>
      <pubDate>Tue, 12 May 2026 08:15:00 +0000</pubDate>
      <link>https://dev.to/outcomer/i-built-an-llmstxt-generator-showed-it-to-the-creator-of-the-standard-and-had-to-rewrite-57c2</link>
      <guid>https://dev.to/outcomer/i-built-an-llmstxt-generator-showed-it-to-the-creator-of-the-standard-and-had-to-rewrite-57c2</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a follow-up to my &lt;a href="https://dev.to/outcomer/launching-another-one-llmstxt-generator-13b0"&gt;previous post&lt;/a&gt; — I shipped v1, got feedback from the creator of the standard, and had to rethink everything.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I'm not going to pretend I had a plan. I didn't.&lt;/p&gt;

&lt;p&gt;There's a new standard called &lt;a href="https://llmstxt.org/" rel="noopener noreferrer"&gt;llms.txt&lt;/a&gt; — a file you place in your site root to help AI agents navigate your content. Think of it as &lt;code&gt;robots.txt&lt;/code&gt; for AI agents: instead of telling crawlers what to ignore, it tells agents how to understand your site.&lt;/p&gt;

&lt;p&gt;I saw other generators producing it. I built one too, shipped it, and thought I was done.&lt;/p&gt;

&lt;p&gt;Then I talked to Jeremy Howard — the creator of the standard — and discovered I had fundamentally misunderstood what llms.txt is supposed to be. And then things got complicated.&lt;/p&gt;




&lt;h2&gt;
  
  
  Version 1: Copying What Everyone Else Was Doing
&lt;/h2&gt;

&lt;p&gt;The pattern every generator was producing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Page Title&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://example.com/page&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: AI-generated summary.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Another Page&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;https://example.com/other&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;: AI-generated summary.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One URL → one summary. I did the same. I called this the &lt;strong&gt;Flat strategy&lt;/strong&gt; and shipped it. It's still in the generator because people ask for it. But I no longer think it's what llms.txt is actually for.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Conversation That Changed Everything
&lt;/h2&gt;

&lt;p&gt;About six weeks after shipping I posted in the llms.txt Discord and asked Jeremy Howard directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For a site like Stripe with 4,000 pages, how do you balance curation vs completeness? If you curate down to 20 pages, an agent asking about webhook retries won't find the answer. If you include everything, it's just noise.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Jeremy Howard&lt;/strong&gt; (&lt;a href="https://discord.com/channels/689892369998676007/1279960087221239808/1484298533485023232" rel="noopener noreferrer"&gt;source&lt;/a&gt;):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Yeah it's actually a lot of work - it's not just writing an llms.txt, but writing complete agent-oriented docs. It's basically like a parallel web - one designed for AI! An llms.txt is just AI's index.html replacement - it contains links to other md pages, which themselves can have links."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then the part that really got me — he wasn't just describing a different structure. He was saying that llms.txt &lt;strong&gt;shouldn't be generated automatically at all&lt;/strong&gt;. It should be carefully curated by humans who understand their site.&lt;/p&gt;

&lt;p&gt;I sat with that for a bit.&lt;/p&gt;

&lt;p&gt;On one hand, he's right. On the other hand — Stripe has 4,000 pages. Nobody is writing those MD files by hand. And weather.com has 1,140,000 URLs. The math doesn't work for human curation at scale.&lt;/p&gt;

&lt;p&gt;So I decided to try anyway. If it can't be done automatically, let's see how close we can get.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me&lt;/strong&gt; (&lt;a href="https://discord.com/channels/689892369998676007/1279960087221239808/1484619579480346835" rel="noopener noreferrer"&gt;source&lt;/a&gt;):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"That changes everything. The 'curation vs completeness' tension I was worried about disappears when you have progressive disclosure through a hierarchy."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I went and built Version 2.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Version 2 Actually Required
&lt;/h2&gt;

&lt;p&gt;The vision was simple: semantically group related pages, synthesize each group into one MD file. An agent gets 30 focused documents instead of 4,000 individual pages.&lt;/p&gt;

&lt;p&gt;The reality was five separate engineering problems, each hiding behind the previous one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 1 — Context windows.&lt;/strong&gt; To group pages by meaning you need to read them. 4,000 pages of full text doesn't fit in any LLM context window. Solution: embeddings + k-means clustering. Each page becomes a vector, similar pages cluster together, the LLM only ever sees one cluster at a time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 2 — Money.&lt;/strong&gt; Each LLM call was sending the same cluster content over and over as input tokens. For a cluster generating 5 MD files I was paying for those pages 5 times. Solution: Gemini Context Caching. Upload the cluster content once, reuse the cached reference for all subsequent calls within that cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 3 — Layers at different speeds.&lt;/strong&gt; Crawling, embedding, and summarizing run at completely different speeds. Wire them together naively and they block each other — the crawler piles up work while the summarizer starves. Solution: in-memory buffers between each layer, with independent concurrency control per layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 4 — The LLM is unreliable.&lt;/strong&gt; At scale, everything fails eventually. Gemini returns 429, 503, valid JSON with the wrong number of items, invalid JSON, or just times out after 4 minutes. Solution: typed exception hierarchy for LLM failures, AIMD queue with proper backoff that reads actual &lt;code&gt;retry-after&lt;/code&gt; values from Google's API instead of guessing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 5 — German spaces.&lt;/strong&gt; For reasons I still don't fully understand, Gemini sometimes responds to German-language content with a valid response followed by approximately 2,000,000 spaces. This hits max token limits and crashes. Solution: 3 retries, then the task is dropped — but the order stays in the database and can be restarted from the frontend. Intermediate results are cached in Redis so restarting picks up where it left off without re-spending tokens.&lt;/p&gt;

&lt;p&gt;Each problem only became visible after solving the previous one. That's how it goes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Unsolved Problem: Multilingual Sites
&lt;/h2&gt;

&lt;p&gt;When I started, Stripe's docs had ~3,500 URLs. Then they added German translations and it jumped to ~4,300. The generator processed everything and started producing German-language MD files for what should be an English documentation site.&lt;/p&gt;

&lt;p&gt;There's no universal logic to detect canonical single-language content from a sitemap. Every site has its own URL scheme. Planned fix: a URL pattern filter as an order parameter — you specify which patterns to include, the crawler respects it.&lt;/p&gt;




&lt;h2&gt;
  
  
  An Observation About the Future
&lt;/h2&gt;

&lt;p&gt;While building this I noticed something: there are already tools that generate complete human-readable documentation sites from markdown source files.&lt;/p&gt;

&lt;p&gt;This opens up an interesting workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate &lt;code&gt;llms.txt&lt;/code&gt; + structured markdown (the AI layer)&lt;/li&gt;
&lt;li&gt;Generate the human-facing website from those same markdown files&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The markdown becomes the source of truth. Write once, publish twice — one version for humans, one for AI agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Epilogue
&lt;/h2&gt;

&lt;p&gt;I don't know if this will ever make me a dollar.&lt;/p&gt;

&lt;p&gt;But since building this, people started reaching out on LinkedIn. Interview invitations started coming in. I talked to Microsoft today.&lt;/p&gt;

&lt;p&gt;And I find it genuinely funny when I get coding assessments that ask me to build a notification service — clean code, scalable architecture, perfect database design, dockerized, fully tested — in 3 days.&lt;/p&gt;

&lt;p&gt;I spent several months building something like what you just read about, and I'm still not sure I got it right.&lt;/p&gt;

&lt;p&gt;Those 3-day assessments make me laugh.&lt;/p&gt;




&lt;p&gt;*Want to understand how this actually works under the hood — embeddings, clustering, context caching, the AIMD queue? I wrote a separate technical deep-dive: &lt;a href="https://dev.to/outcomer/how-i-built-an-llmstxt-generator-that-actually-works-at-scale-1hh8"&gt;How I Built an llms.txt Generator That Actually Works at Scale&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The generator is at &lt;strong&gt;&lt;a href="https://llmstxtgenerator.svcpool.com/" rel="noopener noreferrer"&gt;llmstxtgenerator.svcpool.com&lt;/a&gt;&lt;/strong&gt;. Free tier up to 5,000 pages.&lt;/p&gt;

&lt;p&gt;Full spec: &lt;a href="https://llmstxt.org/" rel="noopener noreferrer"&gt;llmstxt.org&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>llms</category>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Launching another one LLMs.txt Generator! 🚀</title>
      <dc:creator>David Evdoshchenko</dc:creator>
      <pubDate>Sat, 14 Mar 2026 14:03:10 +0000</pubDate>
      <link>https://dev.to/outcomer/launching-another-one-llmstxt-generator-13b0</link>
      <guid>https://dev.to/outcomer/launching-another-one-llmstxt-generator-13b0</guid>
      <description>&lt;p&gt;Built a tool that automatically generates llms.txt files for websites using AI and really for free (up to 5000 pages under one domain).&lt;/p&gt;

&lt;p&gt;What it does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Helps LLM models better understand your site's content&lt;/li&gt;
&lt;li&gt;✅ Automatic crawling and page processing&lt;/li&gt;
&lt;li&gt;✅ Support for multiple AI models&lt;/li&gt;
&lt;li&gt;✅ From URL to ready llms.txt in minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tech stack: SvelteKit, NestJS, TypeScript, WebSocket for real-time updates.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🕸️ Web version: &lt;a href="https://llmstxtgenerator.svcpool.com" rel="noopener noreferrer"&gt;https://llmstxtgenerator.svcpool.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;👊 API as well: &lt;a href="https://llmstxtgenerator.svcpool.com/api" rel="noopener noreferrer"&gt;https://llmstxtgenerator.svcpool.com/api&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Would love to hear your thoughts!&lt;/p&gt;

</description>
      <category>llmstxtgenerator</category>
      <category>ai</category>
      <category>seo</category>
    </item>
    <item>
      <title>I had to build my own Symfony validation bundle because no existing one fits my requirements</title>
      <dc:creator>David Evdoshchenko</dc:creator>
      <pubDate>Wed, 11 Mar 2026 22:02:22 +0000</pubDate>
      <link>https://dev.to/outcomer/i-had-to-build-my-own-symfony-validation-bundle-because-no-existing-one-fits-my-requirements-5abp</link>
      <guid>https://dev.to/outcomer/i-had-to-build-my-own-symfony-validation-bundle-because-no-existing-one-fits-my-requirements-5abp</guid>
      <description>&lt;h2&gt;
  
  
  Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Long story short&lt;/li&gt;
&lt;li&gt;The Problem&lt;/li&gt;
&lt;li&gt;The Idea&lt;/li&gt;
&lt;li&gt;The Solution&lt;/li&gt;
&lt;li&gt;
Quick Examples

&lt;ul&gt;
&lt;li&gt;Example 1&lt;/li&gt;
&lt;li&gt;Example 2&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;What's the result?&lt;/li&gt;

&lt;li&gt;The Full Story&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Long story short
&lt;/h2&gt;

&lt;p&gt;I created a bundle for request validation with JSON Schema because no existing "schema-first" validator fit my requirements. Now I just attach a JSON file to a route and get everything at once: validation, DTO mapping, and OpenAPI documentation from a single source of truth.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/outcomer/symfony-json-schema-validation" rel="noopener noreferrer"&gt;https://github.com/outcomer/symfony-json-schema-validation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://outcomer.github.io/symfony-json-schema-validation/" rel="noopener noreferrer"&gt;https://outcomer.github.io/symfony-json-schema-validation/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Most validation solutions that can generate API documentation from code (in the Symfony world I mostly mean FOSRestBundle and API Platform) assume that your business logic is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Well defined and relatively stable&lt;/li&gt;
&lt;li&gt;Close to a classic CRUD model (or CRUD with small deviations)&lt;/li&gt;
&lt;li&gt;Exposed via clean, REST-style endpoints that you fully control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, they assume your application defines the contract and the outside world adapts to it.&lt;/p&gt;

&lt;p&gt;But in many real projects it is the opposite: the API contract is defined somewhere else (legacy frontend, external systems, partners), and you have to adapt to that contract. That is where problems start.&lt;/p&gt;

&lt;p&gt;Here is a simplified example. The API expects this payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"company"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"john@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"company"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Acme"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the following rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;type = "company"&lt;/code&gt; then &lt;code&gt;user.company.name&lt;/code&gt; is required&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;type = "person"&lt;/code&gt; then &lt;code&gt;user.company&lt;/code&gt; must be absent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Is this the most elegant API design? Probably not. But imagine a company with 2000 people and a frontend written years ago that sends exactly this structure. You cannot just redesign everything because you do not like the shape of the JSON.&lt;/p&gt;

&lt;p&gt;Now, what does the "ideal" Symfony validation setup suggest here?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UserDto&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;#[Assert\NotBlank]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="na"&gt;#[Assert\Valid]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;?CompanyDto&lt;/span&gt; &lt;span class="nv"&gt;$company&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does not express the conditional logic "company.name is required when type = company". To implement this you usually end up with one of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom constraint with its own validator&lt;/li&gt;
&lt;li&gt;Manual validation logic in the controller or a service&lt;/li&gt;
&lt;li&gt;Custom normalizer / denormalizer with additional checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then another question appears: how do you forbid extra properties that are not defined in &lt;code&gt;UserDto&lt;/code&gt;? For a long time you simply could not. In newer Symfony versions you can write something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;MapRequestPayload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;serializationContext&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'allow_extra_attributes'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;MapQueryString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;serializationContext&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'allow_extra_attributes'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whether this works for query parameters depends on the exact types and context. For headers this approach does not work at all.&lt;/p&gt;

&lt;p&gt;You might ask: why be so strict? Why not just ignore extra parameters? Because in real life this often leads to subtle bugs. A typical scenario: you had a query parameter &lt;code&gt;offset&lt;/code&gt; and later renamed it to &lt;code&gt;page&lt;/code&gt; for consistency with other APIs. Some client code still sends &lt;code&gt;offset&lt;/code&gt;. If you ignore unknown parameters, the request "works", but returns the wrong page. You then spend time debugging something that could have been caught immediately.&lt;/p&gt;

&lt;p&gt;With strict validation the client would get a clear error about an unknown parameter, and the problem would be visible right away.&lt;/p&gt;

&lt;p&gt;My personal view is that even though an API has no visual UI, it still has UX. Clients should receive clear, precise error messages, not a generic "Provided data is incorrect". Detailed validation errors are also in the interest of the backend team: fewer support tickets, less time spent guessing what went wrong on the client side.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;The kind of validation I needed has actually existed for years, just not in the form of typical Symfony validators. I am talking about the JSON Schema standard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://json-schema.org/specification" rel="noopener noreferrer"&gt;https://json-schema.org/specification&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;JSON Schema is a declarative language for defining structure and constraints for JSON data&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is designed exactly for problems like the ones above (and much more complex ones):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Conditional rules based on other fields&lt;/li&gt;
&lt;li&gt;Nested, deeply structured data&lt;/li&gt;
&lt;li&gt;Strict control over allowed and forbidden properties&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the idea was simple: instead of forcing my API contract into DTO classes and annotations, let Symfony validate incoming requests against a JSON Schema that fully describes the contract.&lt;/p&gt;

&lt;p&gt;In other words, make Symfony request validation schema-first, with JSON Schema as the single source of truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;The good news was that I did not need to implement JSON Schema myself. There was already a solid PHP implementation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://opis.io/json-schema/" rel="noopener noreferrer"&gt;https://opis.io/json-schema/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The library takes a valid JSON Schema and any input data, validates the data against the schema and either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;returns the data (when everything is valid), or&lt;/li&gt;
&lt;li&gt;returns a structured list of validation errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From there, the rest was mostly integration work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make it convenient to plug this validation into a Symfony project&lt;/li&gt;
&lt;li&gt;Wire it into the request lifecycle&lt;/li&gt;
&lt;li&gt;Provide a way to map validated data into DTOs when needed&lt;/li&gt;
&lt;li&gt;Integrate with Nelmio so that OpenAPI documentation is generated from the same schemas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below I will show a couple of short examples. In the "The Full Story" section I describe how I removed duplication between validation attributes and documentation, and how the bundle ended up solving both validation and API specification generation from a single source of truth: the JSON Schema files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Examples
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Example 1&lt;/li&gt;
&lt;li&gt;Example 2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The schema&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://json-schema.org/draft-07/schema#"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"additionalProperties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"headers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"authorization"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bearer token for authentication"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"pattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^Bearer .+"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"example"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ6..."&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"x-api-version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"API version"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"v2"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"example"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"v1"&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"content-type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Request content type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"example"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"additionalProperties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"minLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"maxLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User's full name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"example"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Jane Smith"&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User's email address"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"example"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"john.doe@example.com"&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"minimum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"maximum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User's age (optional)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                    &lt;/span&gt;&lt;span class="nl"&gt;"example"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"additionalProperties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 1: validation using the built-in request object
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;OA\Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;operationId&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'validateUser'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'Validate user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="na"&gt;#[Route('/user', name: '_example_validation_user', methods: ['POST'])]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;validateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;MapRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'./user-create.json'&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="nc"&gt;ValidatedRequest&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;JsonResponse&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getPayload&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$body&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getBody&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="s1"&gt;'success'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'User data is valid'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'data'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'name'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$body&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'email'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$body&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'age'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$body&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'example'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'This uses ValidatedRequest (standard way)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 2: validation using a custom DTO (via ValidatedDtoInterface)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;OA\Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;operationId&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'createProfile'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'Create profile'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="na"&gt;#[Route('/profile', name: '_example_validation_profile', methods: ['POST'])]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;createProfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;MapRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'./user-create.json'&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="nc"&gt;UserApiDtoRequest&lt;/span&gt; &lt;span class="nv"&gt;$profile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;JsonResponse&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="s1"&gt;'success'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Profile created successfully'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'profile'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'name'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$profile&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'email'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$profile&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'age'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$profile&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'note'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'This demonstrates DTO auto-injection: MapRequestResolver calls %s::fromPayload() automatically'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;UserApiDtoRequest&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's the result?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Focused solution: instead of reinventing the wheel, the bundle fills a specific gap that existing Symfony tools do not cover well (schema-first request validation).&lt;/li&gt;
&lt;li&gt;Less duplication: you no longer have to mirror the same rules in DTO constraints, controllers and OpenAPI annotations.&lt;/li&gt;
&lt;li&gt;Automatic sync: Nelmio builds documentation from the same JSON Schema files that are used for validation, so your docs always match the real behavior.&lt;/li&gt;
&lt;li&gt;Contract-centric design: the entire API contract lives in clean JSON files rather than being scattered across PHP attributes and PHP classes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you were looking for a way to validate requests without creating a large number of redundant DTO classes and annotations, this bundle is designed for that use case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/outcomer/symfony-json-schema-validation" rel="noopener noreferrer"&gt;https://github.com/outcomer/symfony-json-schema-validation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://outcomer.github.io/symfony-json-schema-validation/" rel="noopener noreferrer"&gt;https://outcomer.github.io/symfony-json-schema-validation/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Full Story
&lt;/h2&gt;

&lt;p&gt;This article is the short, focused version of the story: what the bundle does and how to start using it. If you want the long version with all the scars and details, I wrote a separate, much bigger post.&lt;/p&gt;

&lt;p&gt;In that post I go through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how the bundle was born in a very messy real project, not in a greenfield demo&lt;/li&gt;
&lt;li&gt;why classic DTO + Assertions validation did not survive 500+ routes&lt;/li&gt;
&lt;li&gt;how JSON Schema became the single contract language for backend, frontend and docs&lt;/li&gt;
&lt;li&gt;how the bundle glues Symfony, Opis JSON Schema and Nelmio together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can read the full story here (starting from The Full Story section):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://outcomer.hashnode.dev/symfony-bundle-that-validates-anything-and-everything#the-full-story" rel="noopener noreferrer"&gt;https://outcomer.hashnode.dev/symfony-bundle-that-validates-anything-and-everything#the-full-story&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>symfony</category>
      <category>productivity</category>
      <category>api</category>
    </item>
    <item>
      <title>I built a Symfony bundle that validates anything and everything.</title>
      <dc:creator>David Evdoshchenko</dc:creator>
      <pubDate>Sun, 08 Mar 2026 13:49:08 +0000</pubDate>
      <link>https://dev.to/outcomer/i-built-a-symfony-bundle-that-validates-anything-and-everything-4nah</link>
      <guid>https://dev.to/outcomer/i-built-a-symfony-bundle-that-validates-anything-and-everything-4nah</guid>
      <description>&lt;h2&gt;
  
  
  Long story short
&lt;/h2&gt;

&lt;p&gt;I created a bundle for request validation via JSON Schema because I was fed up with the lack of a simple "schema-first" validator. Now, I just attach a JSON file to a route, and everything - validation, DTO mapping, and OpenAPI documentation - works out of the box.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/outcomer/symfony-json-schema-validation" rel="noopener noreferrer"&gt;https://github.com/outcomer/symfony-json-schema-validation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://outcomer.github.io/symfony-json-schema-validation/" rel="noopener noreferrer"&gt;https://outcomer.github.io/symfony-json-schema-validation/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Feel free to ask me whatever relevant to topic via comments here;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pain Point
&lt;/h2&gt;

&lt;p&gt;I didn't set out to build my own bundle. I had a simple task: validate an incoming request against specific conditions.&lt;br&gt;
Sounds basic, right? But when I looked for a solution, I found that the modern Symfony world is stuck in one specific pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;create a DTOs;&lt;/li&gt;
&lt;li&gt;write a ton of assertions (Annotations/Attributes);&lt;/li&gt;
&lt;li&gt;if the structure is complex or dynamic-suffer and write custom normalizers;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I searched everywhere but couldn't find a straightforward way to say: "Here is my JSON, here is my schema, just check them and give me an object." Instead, I was forced to create mountains of boilerplate that I didn't need.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why DTO + Assertions isn't always the answer
&lt;/h3&gt;

&lt;p&gt;The classic approach with assertions in PHP classes works fine for simple forms or strictly structured data. But as soon as you deal with deep nesting or requirements that don't play well with normalization - it's much easier to define the requirements once in the JSON Schema standard. Otherwise, you end up duplicating logic - firstly in code, then in docs.&lt;/p&gt;

&lt;p&gt;I wanted flexibility. I wanted to use a standard that everyone understands - from frontend developers to QA engineers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;How I solved it: I built a bundle that makes JSON Schema a "first-class citizen" in Symfony. Now, instead of describing validation rules in PHP code, I simply point to a schema.&lt;/p&gt;

&lt;p&gt;The bundle allows you to validate data either into a built-in DTO (which always contains validated path, query, headers, and body) or into your own custom class, provided it implements the ValidatedDtoInterface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 1: Validation into the built-in ValidatedRequest
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;OA\Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;operationId&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'validateUser'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'Validate user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="na"&gt;#[Route('/user', name: '_example_validation_user', methods: ['POST'])]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;validateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;MapRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'./user-create.json'&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="nc"&gt;ValidatedRequest&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;JsonResponse&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getPayload&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$body&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getBody&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="s1"&gt;'success'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'User data is valid'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'data'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'name'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$body&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'email'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$body&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'age'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$body&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'example'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'This uses ValidatedRequest (standard way)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 2: Validation into your own DTO (via ValidatedDtoInterface)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;OA\Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;operationId&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'createProfile'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'Create profile'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="na"&gt;#[Route('/profile', name: '_example_validation_profile', methods: ['POST'])]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;createProfile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;MapRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'./user-create.json'&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="nc"&gt;UserApiDtoRequest&lt;/span&gt; &lt;span class="nv"&gt;$profile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;JsonResponse&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="s1"&gt;'success'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Profile created successfully'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'profile'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'name'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$profile&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'email'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$profile&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'age'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$profile&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'note'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'This demonstrates DTO auto-injection: MapRequestResolver calls %1$s::fromPayload() automatically'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;UserApiDtoRequest&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's the result?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;I didn't "reinvent the wheel"; I simply filled a gap in the ecosystem;&lt;/li&gt;
&lt;li&gt;No more duplicate work: No more writing endless &lt;code&gt;#[Assert\NotBlank]&lt;/code&gt;, &lt;code&gt;#[Assert\Type("string")]&lt;/code&gt;, etc;&lt;/li&gt;
&lt;li&gt;Automatic Sync: Nelmio picks up the schema from the same file - your documentation never lies;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fylrk9zmy6mrnnxrffl03.png" alt="API Docs page" width="800" height="430"&gt;
&lt;/li&gt;
&lt;li&gt;Contract-Centric: The entire API contract lives in clean JSON files rather than being scattered across PHP attributes;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you, like me, were looking for a way to validate requests like a human being without creating dozens of redundant classes - here is the solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/outcomer/symfony-json-schema-validation" rel="noopener noreferrer"&gt;https://github.com/outcomer/symfony-json-schema-validation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://outcomer.github.io/symfony-json-schema-validation/" rel="noopener noreferrer"&gt;https://outcomer.github.io/symfony-json-schema-validation/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>opensource</category>
      <category>productivity</category>
      <category>symfony</category>
      <category>api</category>
    </item>
    <item>
      <title>I had to build my own Symfony validation bundle because no existing one fits my requirements.</title>
      <dc:creator>David Evdoshchenko</dc:creator>
      <pubDate>Thu, 05 Mar 2026 19:53:54 +0000</pubDate>
      <link>https://dev.to/outcomer/stop-over-engineering-your-symfony-apis-a-pragmatic-json-schema-first-approach-to-validation-and-143p</link>
      <guid>https://dev.to/outcomer/stop-over-engineering-your-symfony-apis-a-pragmatic-json-schema-first-approach-to-validation-and-143p</guid>
      <description>&lt;p&gt;UPDATE: I have completely rewritten this guide to focus on a more pragmatic approach.&lt;/p&gt;

&lt;p&gt;My original article discussed the challenges of Symfony/OpenAPI validation, but since then, I've built a dedicated tool that solves this by using JSON Schema as the single source of truth.&lt;/p&gt;

&lt;p&gt;You can find the improved, "no-boilerplate" version of this article here: &lt;a href="https://dev.to/outcomer/i-built-a-symfony-bundle-that-validates-anything-and-everything-4nah"&gt;https://dev.to/outcomer/i-built-a-symfony-bundle-that-validates-anything-and-everything-4nah&lt;/a&gt;&lt;/p&gt;

</description>
      <category>php</category>
      <category>symfony</category>
      <category>jsonschema</category>
      <category>api</category>
    </item>
  </channel>
</rss>
