<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Petr Nazarenko</title>
    <description>The latest articles on DEV Community by Petr Nazarenko (@petr_nazarenko_8a9f1e7bb8).</description>
    <link>https://dev.to/petr_nazarenko_8a9f1e7bb8</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3810652%2F48903ac8-8da8-4778-bc8c-90cc476b1e8a.png</url>
      <title>DEV Community: Petr Nazarenko</title>
      <link>https://dev.to/petr_nazarenko_8a9f1e7bb8</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/petr_nazarenko_8a9f1e7bb8"/>
    <language>en</language>
    <item>
      <title>I built a validation pipeline that blocks AI-generated files from reaching disk if they fail schema checks</title>
      <dc:creator>Petr Nazarenko</dc:creator>
      <pubDate>Fri, 06 Mar 2026 22:17:47 +0000</pubDate>
      <link>https://dev.to/petr_nazarenko_8a9f1e7bb8/i-built-a-validation-pipeline-that-blocks-ai-generated-files-from-reaching-disk-if-they-fail-schema-20c6</link>
      <guid>https://dev.to/petr_nazarenko_8a9f1e7bb8/i-built-a-validation-pipeline-that-blocks-ai-generated-files-from-reaching-disk-if-they-fail-schema-20c6</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;I've been using local LLMs to generate structured Markdown knowledge files — architecture docs, runbooks, API references. After a few hundred files, the knowledge base becomes noise.&lt;/p&gt;

&lt;p&gt;Wrong field types. Invalid enum values. Dates in the wrong format. Domains that don't exist in the taxonomy. As a result, Dataview queries return nothing and the graph becomes useless.&lt;/p&gt;

&lt;p&gt;The issue isn't the model. It's that there's no contract between "LLM output" and "file that reaches disk."&lt;/p&gt;

&lt;h2&gt;
  
  
  The solution: a validation gate
&lt;/h2&gt;

&lt;p&gt;AKF sits between the LLM and the filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt → LLM → Validation Engine → Error Normalizer → Retry Controller → Commit Gate → File
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;LLM generates a Markdown file with YAML frontmatter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation Engine&lt;/strong&gt; checks it — binary VALID/INVALID, typed error codes (E001–E007)&lt;/li&gt;
&lt;li&gt;If invalid, &lt;strong&gt;Error Normalizer&lt;/strong&gt; translates errors into correction instructions and sends them back to the LLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retry Controller&lt;/strong&gt; retries up to 3 times and aborts early if the same error fires twice, which keeps a stuck model from running up costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commit Gate&lt;/strong&gt; writes atomically — only VALID output reaches disk&lt;/li&gt;
&lt;/ol&gt;
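
&lt;p&gt;The steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not AKF's actual code: &lt;code&gt;run_gate&lt;/code&gt;, &lt;code&gt;commit_atomically&lt;/code&gt;, the &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;validate&lt;/code&gt; callables, and the "list of error codes" return format are all hypothetical names chosen for the example.&lt;/p&gt;

```python
import os
import tempfile
from typing import Callable, List, Optional


def commit_atomically(path: str, text: str) -> None:
    """Write to a temp file next to the target, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def run_gate(
    prompt: str,
    generate: Callable[[str], str],        # hypothetical: calls the LLM
    validate: Callable[[str], List[str]],  # hypothetical: returns error codes
    path: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the written path, or None if no valid output was produced."""
    seen = set()
    current_prompt = prompt
    for _ in range(max_attempts):
        text = generate(current_prompt)
        errors = validate(text)
        if not errors:
            commit_atomically(path, text)  # only VALID output reaches disk
            return path
        if any(code in seen for code in errors):
            return None  # the same error fired twice: stop retrying
        seen.update(errors)
        # Feed the errors back as correction instructions for the next attempt.
        current_prompt = prompt + "\n\nFix these errors: " + ", ".join(errors)
    return None
```

&lt;p&gt;Writing to a temporary file and renaming it is one common way to get an all-or-nothing write; whether AKF's Commit Gate does exactly this is an assumption.&lt;/p&gt;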

&lt;h2&gt;
  
  
  Your taxonomy, not mine
&lt;/h2&gt;

&lt;p&gt;The schema is external. You define it in &lt;code&gt;akf.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;schema_version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0.0"&lt;/span&gt;
&lt;span class="na"&gt;vault_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./vault"&lt;/span&gt;

&lt;span class="na"&gt;taxonomy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;api-design&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;backend-engineering&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;devops&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;security&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;concept&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;guide&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;reference&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;checklist&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;level&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;beginner&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;intermediate&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;advanced&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;draft&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;active&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;completed&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;archived&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change the taxonomy, rebuild nothing. The validation engine loads it at runtime.&lt;/p&gt;
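
&lt;p&gt;To make the runtime check concrete, here is a rough sketch of validating frontmatter against a taxonomy that has already been parsed into a dictionary. The &lt;code&gt;check_enums&lt;/code&gt; function and the error message format are hypothetical; only the field names, allowed values, and the E001/E006 codes come from this post.&lt;/p&gt;

```python
from typing import Dict, List

# Mirrors the akf.yaml example above, as it might look after parsing.
TAXONOMY: Dict[str, List[str]] = {
    "domain": ["api-design", "backend-engineering", "devops", "security"],
    "type": ["concept", "guide", "reference", "checklist"],
    "level": ["beginner", "intermediate", "advanced"],
    "status": ["draft", "active", "completed", "archived"],
}


def check_enums(frontmatter: dict, taxonomy: Dict[str, List[str]]) -> List[str]:
    """Return one error string per field whose value is not allowed."""
    errors = []
    for field, allowed in taxonomy.items():
        if field not in frontmatter:
            continue  # missing fields belong to a separate check (E002)
        if frontmatter[field] not in allowed:
            code = "E006" if field == "domain" else "E001"
            errors.append(f"{code}: {field}={frontmatter[field]!r} not in {allowed}")
    return errors
```

&lt;p&gt;Because the allowed values are plain data passed in at call time, editing the taxonomy changes the check without touching the code.&lt;/p&gt;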

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;ai-knowledge-filler
akf init
akf generate &lt;span class="s2"&gt;"Create a guide on Docker networking"&lt;/span&gt; &lt;span class="nt"&gt;--provider&lt;/span&gt; ollama &lt;span class="nt"&gt;--model&lt;/span&gt; llama3.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via the Python API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;akf&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pipeline&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./vault/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mistral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a reference for JWT authentication&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# only printed if validation passed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with Ollama, Claude, Gemini, and GPT-4. An MCP server just shipped, so Claude Projects can now call the pipeline directly, without a human in the middle.&lt;/p&gt;

&lt;h2&gt;
  
  
  One unexpected insight
&lt;/h2&gt;

&lt;p&gt;When a domain triggers elevated retries, it's usually not the model failing. It's a signal that the &lt;strong&gt;taxonomy boundary is ambiguous&lt;/strong&gt;: the LLM keeps proposing a value that doesn't fit any enum because the allowed values don't match how the concept is naturally categorized.&lt;/p&gt;

&lt;p&gt;Retry rate becomes a health metric for your schema, not a failure metric for the model.&lt;/p&gt;
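
&lt;p&gt;One way to track this, sketched under the assumption that each run is logged as a (domain, retries used) pair. The &lt;code&gt;retry_rate_by_domain&lt;/code&gt; helper and the log shape are hypothetical, not part of AKF.&lt;/p&gt;

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def retry_rate_by_domain(runs: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Average number of retries per generation, grouped by domain."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for domain, retries in runs:
        totals[domain] += retries
        counts[domain] += 1
    return {domain: totals[domain] / counts[domain] for domain in counts}
```

&lt;p&gt;A domain whose average sits well above the others is a candidate for splitting, renaming, or merging in the taxonomy.&lt;/p&gt;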

&lt;h2&gt;
  
  
  Error codes
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Code&lt;/th&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;E001&lt;/td&gt;
&lt;td&gt;type/level/status&lt;/td&gt;
&lt;td&gt;Invalid enum value&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E002&lt;/td&gt;
&lt;td&gt;any&lt;/td&gt;
&lt;td&gt;Required field missing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E003&lt;/td&gt;
&lt;td&gt;created/updated&lt;/td&gt;
&lt;td&gt;Date not ISO 8601&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E004&lt;/td&gt;
&lt;td&gt;title/tags&lt;/td&gt;
&lt;td&gt;Type mismatch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E005&lt;/td&gt;
&lt;td&gt;frontmatter&lt;/td&gt;
&lt;td&gt;General schema violation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E006&lt;/td&gt;
&lt;td&gt;domain&lt;/td&gt;
&lt;td&gt;Not in taxonomy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E007&lt;/td&gt;
&lt;td&gt;created/updated&lt;/td&gt;
&lt;td&gt;created is later than updated&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
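
&lt;p&gt;As an illustration of how two of these codes might be produced, here is a small sketch of the date checks (E003 and E007). It assumes the dates arrive as strings in &lt;code&gt;YYYY-MM-DD&lt;/code&gt; form; &lt;code&gt;check_dates&lt;/code&gt; and the message wording are hypothetical.&lt;/p&gt;

```python
from datetime import date
from typing import List


def check_dates(frontmatter: dict) -> List[str]:
    """Check that created/updated parse as ISO 8601 dates and are in order."""
    parsed = {}
    errors = []
    for field in ("created", "updated"):
        value = frontmatter.get(field)
        if value is None:
            continue  # a missing field is a different error (E002)
        try:
            parsed[field] = date.fromisoformat(value)
        except (TypeError, ValueError):
            errors.append(f"E003: {field} is not an ISO 8601 date")
    # Only compare when both dates parsed successfully.
    if len(parsed) == 2 and parsed["created"] > parsed["updated"]:
        errors.append("E007: created is later than updated")
    return errors
```

&lt;p&gt;Returning codes rather than raising keeps every problem in a file visible in one pass, which is what a correction prompt needs.&lt;/p&gt;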

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/petrnzrnk-creator/ai-knowledge-filler" rel="noopener noreferrer"&gt;ai-knowledge-filler&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;code&gt;pip install ai-knowledge-filler&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Example configs: &lt;a href="https://github.com/petrnzrnk-creator/ai-knowledge-filler/tree/main/examples" rel="noopener noreferrer"&gt;software engineering, legal ops, technical writing&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Are you dealing with schema drift in AI-generated content? What's your current approach — post-hoc review, Pydantic models, something else?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
