<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: PythonWoods</title>
    <description>The latest articles on DEV Community by PythonWoods (@pythonwoods).</description>
    <link>https://dev.to/pythonwoods</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3868452%2Fd98e5149-60d4-4406-bdba-7fe6ecca5adb.png</url>
      <title>DEV Community: PythonWoods</title>
      <link>https://dev.to/pythonwoods</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pythonwoods"/>
    <language>en</language>
    <item>
      <title>AI Red Team Attacks Code Linter: Full Post-Mortem Report</title>
      <dc:creator>PythonWoods</dc:creator>
      <pubDate>Thu, 16 Apr 2026 17:52:33 +0000</pubDate>
      <link>https://dev.to/pythonwoods/we-put-our-documentation-linter-under-an-ai-driven-siege-heres-the-post-mortem-2edj</link>
      <guid>https://dev.to/pythonwoods/we-put-our-documentation-linter-under-an-ai-driven-siege-heres-the-post-mortem-2edj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9i9aydb1pkjokvbvqd8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9i9aydb1pkjokvbvqd8.png" alt="Operation Obsidian Stress — AI agents attacking the Zenzic Shield" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/pythonwoods/hardening-the-documentation-pipeline-why-i-built-a-security-first-markdown-analyzer-in-pure-python-37h8"&gt;Part 1&lt;/a&gt;, I explained &lt;strong&gt;why&lt;/strong&gt; I built Zenzic — the philosophy, the threat model, and the architecture of a Pure Python documentation analyzer.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/pythonwoods/your-docs-pipeline-is-a-security-risk-zenzic-v061rc1-fixes-that-3ag3"&gt;Part 2&lt;/a&gt;, I detailed the transition to the &lt;strong&gt;Obsidian Bastion&lt;/strong&gt; architecture: engine-agnostic discovery, the Layered Exclusion Manager, and zero-subprocess enforcement.&lt;/p&gt;

&lt;p&gt;Today, in the final chapter of this series, I'm sharing the results of &lt;strong&gt;Operation Obsidian Stress&lt;/strong&gt;: a controlled adversarial audit where I orchestrated a multi-agent AI system to find every gap in the Shield before the v0.6.1rc2 release.&lt;/p&gt;




&lt;p&gt;Four bypass vectors. Four real findings. All closed.&lt;/p&gt;

&lt;p&gt;This is the complete technical post-mortem of &lt;strong&gt;Operation Obsidian Stress&lt;/strong&gt; — the adversarial security audit I ran against Zenzic v0.6.1rc2's Shield (credential scanner) before release. I'm publishing the full technical details because the findings are instructive, the fixes are non-obvious, and the code belongs in the open.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on methodology:&lt;/strong&gt; To validate the Shield, I orchestrated a multi-team AI system — Red Team, Blue Team, and Purple Team — using specialized agent ensembles to simulate advanced obfuscation techniques. This is AI-assisted security engineering: using the same agentic architecture that attackers use to find the gaps they would exploit. All findings, bypass vectors, and fixes documented here are real.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Shield Is (and Why Breaking It Matters)
&lt;/h2&gt;

&lt;p&gt;Before the attack details, context: &lt;strong&gt;Shield&lt;/strong&gt; is Zenzic's credential detection layer. It scans every Markdown and MDX file in your documentation before the build runs, looking for patterns that indicate real credentials in content.&lt;/p&gt;

&lt;p&gt;The threat model is simple: a contributor submits a PR with a code example. That example contains a real API key — copied from a local terminal session, pasted from a Slack thread, or forgotten after a debugging session. The reviewer reads the prose, not the bytes. The PR merges. The docs build. The key is now live on your documentation site, indexed by search engines.&lt;/p&gt;

&lt;p&gt;Shield exists to catch that before it ships.&lt;/p&gt;

&lt;p&gt;If Shield can be bypassed by someone who knows how it works, it's not a scanner — it's a false guarantee.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Attack Surface
&lt;/h2&gt;

&lt;p&gt;Shield's architecture before Operation Obsidian Stress:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read each line of the Markdown/MDX file&lt;/li&gt;
&lt;li&gt;Apply a normalization pass (strip backticks, collapse whitespace)&lt;/li&gt;
&lt;li&gt;Run 9 regex patterns against the normalized line&lt;/li&gt;
&lt;li&gt;Report any match as a ShieldFinding&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Any match in step 4 triggers Exit Code 2 (Shield breach) — non-bypassable, distinct from Exit Code 1 (validation failure) and Exit Code 3 (Blood Sentinel / path traversal).&lt;/p&gt;

&lt;p&gt;The attack surface was step 2: the normalization pass. It normalized formatting noise but did not account for deliberate obfuscation.&lt;/p&gt;
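&lt;p&gt;For illustration, a minimal sketch of what that pre-hardening pass might have looked like (the function name is hypothetical, not Zenzic's actual code):&lt;/p&gt;

```python
import re

_WHITESPACE_RE = re.compile(r"\s+")

def naive_normalize(line):
    """Sketch of a formatting-only pass: strip backticks, collapse whitespace.

    Invisible Unicode, HTML entities, and markup comments pass straight
    through untouched, which is exactly the gap the audit targeted.
    """
    line = line.replace("`", "")
    return _WHITESPACE_RE.sub(" ", line).strip()
```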




&lt;h2&gt;
  
  
  ZRT-006: Unicode Format Character Injection
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Category:&lt;/strong&gt; Input normalization bypass&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Severity:&lt;/strong&gt; High — complete bypass of all regex patterns&lt;br&gt;&lt;br&gt;
&lt;strong&gt;CVSS analogy:&lt;/strong&gt; 8.1 (High)&lt;/p&gt;
&lt;h3&gt;
  
  
  The Technique
&lt;/h3&gt;

&lt;p&gt;Python's &lt;code&gt;unicodedata&lt;/code&gt; module exposes each character's Unicode general category. The &lt;strong&gt;Cf&lt;/strong&gt; category ("Format") covers characters that are semantically meaningful in Unicode text processing but invisible in rendered output and most text displays:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Code Point&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;U+200B&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Zero Width Space&lt;/td&gt;
&lt;td&gt;Line breaking hint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;U+200C&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Zero Width Non-Joiner&lt;/td&gt;
&lt;td&gt;Prevents ligatures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;U+200D&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Zero Width Joiner&lt;/td&gt;
&lt;td&gt;Forces ligatures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;U+00AD&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Soft Hyphen&lt;/td&gt;
&lt;td&gt;Optional hyphenation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;U+FEFF&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Zero Width No-Break Space&lt;/td&gt;
&lt;td&gt;BOM marker&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
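&lt;p&gt;The classification is easy to verify directly against the table above:&lt;/p&gt;

```python
import unicodedata

# All five characters from the table carry the Cf ("Format") category
for ch in ("\u200b", "\u200c", "\u200d", "\u00ad", "\ufeff"):
    assert unicodedata.category(ch) == "Cf"

# Ordinary key characters do not, so a Cf filter leaves them untouched
assert unicodedata.category("s") == "Ll"   # lowercase letter
assert unicodedata.category("-") == "Pd"   # dash punctuation
```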

&lt;p&gt;Inject any of these into a credential token and the regex fails to match:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Craft the bypass
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;unicodedata&lt;/span&gt;

&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-abc123def456ghi789jkl012mno345pqr678stu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="c1"&gt;# Insert ZWS after position 9 (inside the token)
&lt;/span&gt;&lt;span class="n"&gt;bypass&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u200B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bypass&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# 50 chars — 1 more than the real key
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;repr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bypass&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# 'sk-abc123\u200Bdef456ghi789jkl012mno345pqr678stu'
&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-[a-zA-Z0-9]{48}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bypass&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# None — bypass confirmed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The zero-width space is not in &lt;code&gt;[a-zA-Z0-9]&lt;/code&gt;, so the quantifier never sees 48 consecutive allowed characters: the run is cut six characters after the prefix by a character no human reviewer can see. The credential leaks.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;

&lt;p&gt;Strip all Cf-category characters before any normalization step runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;unicodedata&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_strip_unicode_format_chars&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Remove all Unicode Format (Cf) characters.

    These are invisible to human readers but can be used to interrupt
    regex pattern matching against credential tokens.

    Examples: U+200B (zero-width space), U+200C (ZWNJ), U+200D (ZWJ),
              U+00AD (soft hyphen), U+FEFF (BOM).
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;unicodedata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;category&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
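&lt;p&gt;A quick standalone check (re-declaring the helper, with an illustrative key) shows the stripped text matches again:&lt;/p&gt;

```python
import re
import unicodedata

def strip_cf(text):
    # Same logic as the fix above: drop every Cf-category character
    return "".join(c for c in text if unicodedata.category(c) != "Cf")

key = "sk-" + "a" * 48                      # illustrative OpenAI-shaped key
bypass = key[:9] + "\u200b" + key[9:]       # ZRT-006 payload
pattern = re.compile(r"sk-[a-zA-Z0-9]{48}")

assert pattern.search(bypass) is None       # the raw line slips past the regex
assert pattern.search(strip_cf(bypass))     # after stripping, the match returns
```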



&lt;p&gt;&lt;strong&gt;Test coverage added:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@pytest.mark.parametrize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;char&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u200b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# zero-width space
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u200c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# zero-width non-joiner
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u200d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# zero-width joiner
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u00ad&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# soft hyphen
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\ufeff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# zero-width no-break space / BOM
&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_shield_cf_strip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;char&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tmp_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-abc123def456ghi789jkl012mno345pqr678stu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;bypass&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tmp_path&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My API key: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bypass&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_shield&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cf char &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;repr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;char&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt; should not bypass Shield&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;family&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ZRT-006b: HTML Entity Obfuscation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Category:&lt;/strong&gt; Input normalization bypass&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Severity:&lt;/strong&gt; High — bypasses patterns that depend on punctuation characters&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Affected families:&lt;/strong&gt; OpenAI (hyphen), Stripe (hyphen, underscore), GitHub (underscore)&lt;/p&gt;
&lt;h3&gt;
  
  
  The Technique
&lt;/h3&gt;

&lt;p&gt;Markdown renderers decode standard HTML entities. The hyphen character (&lt;code&gt;-&lt;/code&gt;) has the HTML entity &lt;code&gt;&amp;amp;#45;&lt;/code&gt;. The underscore (&lt;code&gt;_&lt;/code&gt;) is &lt;code&gt;&amp;amp;#95;&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;sk&lt;span class="ni"&gt;&amp;amp;#45;&lt;/span&gt;abc123def456ghi789jkl012mno345pqr678stu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Renders as: &lt;code&gt;sk-abc123def456ghi789jkl012mno345pqr678stu901vwx234&lt;/code&gt; — a valid OpenAI key format.&lt;/p&gt;

&lt;p&gt;The credential scanner sees &lt;code&gt;sk&amp;amp;#45;abc123...&lt;/code&gt; — which does not match &lt;code&gt;sk-[a-zA-Z0-9]{48}&lt;/code&gt;. The entity substitutes the single character that forms the structural boundary of the pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_decode_html_entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Decode HTML entities before pattern matching.

    A credential containing &amp;amp;#45; (hyphen) or &amp;amp;#95; (underscore) renders
    correctly in a browser but bypasses regex patterns that match on the
    literal character.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unescape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;html.unescape()&lt;/code&gt; is part of the Python standard library. No dependencies. Zero cost.&lt;/p&gt;
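&lt;p&gt;A minimal check of the behavior, with an illustrative key (the ampersand is built via &lt;code&gt;chr(38)&lt;/code&gt; to keep this sample markup-safe):&lt;/p&gt;

```python
import html
import re

pattern = re.compile(r"sk-[a-zA-Z0-9]{48}")
# chr(38) is the ampersand: the hyphen hides behind its numeric entity
obfuscated = "sk" + chr(38) + "#45;" + "a" * 48

assert pattern.search(obfuscated) is None           # raw source: no match
assert pattern.search(html.unescape(obfuscated))    # decoded: match restored
```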

&lt;p&gt;&lt;strong&gt;Affected patterns if left unpatched:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sk-...&lt;/code&gt; (OpenAI): hyphen obfuscated as &lt;code&gt;&amp;amp;#45;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sk_live_...&lt;/code&gt; (Stripe): underscores obfuscated as &lt;code&gt;&amp;amp;#95;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ghp_...&lt;/code&gt; (GitHub): underscore in prefix obfuscated&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ZRT-007: Comment Interleaving
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Category:&lt;/strong&gt; Token fragmentation via markup&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Severity:&lt;/strong&gt; High — renders the token non-contiguous in raw source&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Technique:&lt;/strong&gt; Inject HTML or MDX comment blocks between credential characters&lt;/p&gt;
&lt;h3&gt;
  
  
  The Technique
&lt;/h3&gt;

&lt;p&gt;HTML comments and MDX expression comments are invisible in rendered output: Markdown passes HTML comments through as raw HTML, which the browser discards, and the MDX compiler strips expression comments at build time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;sk-abc123&lt;span class="c"&gt;&amp;lt;!-- This is a comment, nothing to see here --&amp;gt;&lt;/span&gt;def456ghi789jkl012mno345pqr678stu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the rendered documentation: &lt;code&gt;sk-abc123def456ghi789jkl012mno345pqr678stu901vwx234&lt;/code&gt; (fully readable, valid pattern).&lt;/p&gt;

&lt;p&gt;In the raw source the scanner reads: &lt;code&gt;sk-abc123&amp;lt;!-- ... --&amp;gt;def456ghi789...&lt;/code&gt; — the regex match fails because the comment block interrupts the character class &lt;code&gt;[a-zA-Z0-9]&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;MDX variant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sk-abc123{/* inline MDX comment */}def456ghi789jkl012mno345pqr678stu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same effect. Both comment syntaxes are invisible in render, structurally disruptive in raw source.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="c1"&gt;# Pre-compile: these run against every line of every scanned file
&lt;/span&gt;&lt;span class="n"&gt;_HTML_COMMENT_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;!--.*?--&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_MDX_COMMENT_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\{/\*.*?\*/\}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_strip_markup_comments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Strip HTML and MDX comments before pattern matching.

    Comments are invisible in rendered output and can be used to fragment
    credential tokens in raw Markdown/MDX source.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_HTML_COMMENT_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_MDX_COMMENT_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note on &lt;code&gt;re.DOTALL&lt;/code&gt;:&lt;/strong&gt; The DOTALL flag makes &lt;code&gt;.&lt;/code&gt; match newlines, so a comment that spans multiple lines (unusual in this attack vector, but possible) is also caught. Because the scanner works on one buffer at a time, DOTALL applies within the buffer being matched, not across the entire file.&lt;/p&gt;
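&lt;p&gt;Verifying the technique end-to-end with an illustrative fragment (the angle brackets are built with &lt;code&gt;chr&lt;/code&gt; to keep the sample markup-safe):&lt;/p&gt;

```python
import re

LT, GT = chr(60), chr(62)   # ASCII 60 and 62: the angle bracket characters
comment = LT + "!-- nothing to see here --" + GT
fragment = "sk-abc123" + comment + "a" * 42

pattern = re.compile(r"sk-[a-zA-Z0-9]{48}")
html_comment_re = re.compile(LT + r"!--.*?--" + GT, re.DOTALL)

assert pattern.search(fragment) is None                   # comment splits the token
assert pattern.search(html_comment_re.sub("", fragment))  # stripped: contiguous again
```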




&lt;h2&gt;
  
  
  ZRT-007b: Cross-Line Token Splitting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Category:&lt;/strong&gt; Architectural bypass — stateless scanner assumption&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Severity:&lt;/strong&gt; Critical — bypasses all pattern matching with zero obfuscation&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Technique:&lt;/strong&gt; Line break&lt;/p&gt;

&lt;p&gt;This is the most architecturally significant finding. It requires no Unicode tricks, no entity encoding, no markup injection. One line break.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Technique
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Here is my staging key for the integration tests: sk-abc123def456
ghi789jkl012mno345pqr678stu901vwx234
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The scanner processes line 1: &lt;code&gt;Here is my staging key for the integration tests: sk-abc123def456&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;No match. The pattern requires 48 characters after &lt;code&gt;sk-&lt;/code&gt;. There are only 12.&lt;/p&gt;

&lt;p&gt;The scanner processes line 2: &lt;code&gt;ghi789jkl012mno345pqr678stu901vwx234yz&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;No match. No &lt;code&gt;sk-&lt;/code&gt; prefix.&lt;/p&gt;

&lt;p&gt;The credential leaks. The split is invisible in rendered output — the two lines render as a single paragraph. All documentation prose wraps at rendering time. A human reader sees the full key. The scanner never does.&lt;/p&gt;
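&lt;p&gt;Before the full implementation, the core of the fix can be shown in a few lines: scan a synthetic window that straddles the line boundary.&lt;/p&gt;

```python
import re

pattern = re.compile(r"sk-[a-zA-Z0-9]{48}")
line1 = "Here is my staging key: sk-abc123def456"
line2 = "ghi789jkl012mno345pqr678stu901vwx234"

assert pattern.search(line1) is None   # only 12 token chars on line 1
assert pattern.search(line2) is None   # no sk- prefix on line 2

# The join zone: tail of the previous line + head of the current line
join_zone = line1[-80:] + line2[:80]
assert pattern.search(join_zone)       # the split token is contiguous here
```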
&lt;h3&gt;
  
  
  The Fix: The Lookback Buffer
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sequenceDiagram
    participant Line1 as Line N
    participant Buffer as Lookback Buffer (80 chars)
    participant Line2 as Line N+1
    participant Detector as Pattern Detector

    Note over Line1: "sk-abc123def456" (12 chars after prefix)
    Line1-&amp;gt;&amp;gt;Detector: Scan line N → no match
    Line1-&amp;gt;&amp;gt;Buffer: Store tail[-80:]

    Note over Line2: "ghi789jkl012mno345pqr678stu..."
    Line2-&amp;gt;&amp;gt;Detector: Scan line N+1 → no match
    Buffer-&amp;gt;&amp;gt;Detector: join_zone = prev[-80:] + current[:80]
    Note over Detector: Full 48-char token now visible
    Detector--&amp;gt;&amp;gt;Line2: ✅ ShieldFinding: family=openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;A stateful generator that maintains context across line boundaries, creating a synthetic overlap zone:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections.abc&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Iterable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scan_lines_with_lookback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Iterable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;buffer_width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ShieldFinding&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Scan lines for credentials with cross-line token detection.

    For each line, in addition to scanning the normalized line itself,
    a &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;join zone&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; is constructed from the tail of the previous line and
    the head of the current line. Any credential split across the line
    boundary will appear as a contiguous token in this synthetic window.

    Args:
        lines: Iterable of (line_number, raw_line) tuples.
        file_path: Path of the file being scanned (for reporting).
        buffer_width: Characters to take from each side of the boundary.
                      Default 80 — calibrated to catch splits at typical
                      prose line lengths without inflating false positives.

    Yields:
        ShieldFinding instances for each unique credential detected.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prev_normalized&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;prev_seen&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line_no&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;seen_this_line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_normalize_line_for_shield&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Pass 1: standard per-line scan
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;finding&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;_scan_normalized_line&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_no&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;finding&lt;/span&gt;
            &lt;span class="n"&gt;seen_this_line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;family&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Pass 2: cross-line join zone scan
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;prev_normalized&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;join_zone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prev_normalized&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;buffer_width&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;buffer_width&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;finding&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;_scan_normalized_line&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;join_zone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_no&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="c1"&gt;# Deduplicate against families already seen on either adjacent line.
&lt;/span&gt;                &lt;span class="c1"&gt;# A finding in the join zone that also matched on the current line
&lt;/span&gt;                &lt;span class="c1"&gt;# would otherwise be reported twice.
&lt;/span&gt;                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;finding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;family&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seen_this_line&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;prev_seen&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;finding&lt;/span&gt;

        &lt;span class="n"&gt;prev_normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;normalized&lt;/span&gt;
        &lt;span class="n"&gt;prev_seen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;seen_this_line&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Buffer Width Calibration
&lt;/h3&gt;

&lt;p&gt;Why 80 characters? The choice reflects the statistical distribution of credential split positions relative to line length.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A credential split is most likely to occur near the end of a prose line that happens to end mid-token.&lt;/li&gt;
&lt;li&gt;Standard terminal width and most documentation editors wrap at 80–120 characters.&lt;/li&gt;
&lt;li&gt;Taking 80 characters from each side of the boundary covers the vast majority of real-world split positions.&lt;/li&gt;
&lt;li&gt;Increasing to 160 would double the join zone size with minimal additional detection coverage but would increase false positive probability for partial pattern fragments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 80-character default can be overridden if scan results show false positives on a specific corpus.&lt;/p&gt;
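&lt;p&gt;The mechanics are easy to see in isolation. The sketch below uses a single stand-in pattern rather than Shield's real pattern set, and shows a token that neither adjacent line matches alone but the join zone reunites:&lt;/p&gt;

```python
import re

# Stand-in pattern: an AWS-style access key ID (AKIA + 16 chars).
PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")

def join_zone(prev_line, cur_line, buffer_width=80):
    """Synthetic window over the line boundary: the last buffer_width
    characters of the previous line plus the first buffer_width of the
    current one."""
    return prev_line[-buffer_width:] + cur_line[:buffer_width]

# A credential split by a hard line wrap: neither half matches alone...
prev = "the key is AKIAIOSF"
cur = "ODNN7EXAMPLE and more prose"
assert PATTERN.search(prev) is None
assert PATTERN.search(cur) is None

# ...but the join zone makes it contiguous again.
assert PATTERN.search(join_zone(prev, cur)) is not None
```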

&lt;h3&gt;
  
  
  Performance Impact of the Lookback Buffer
&lt;/h3&gt;

&lt;p&gt;Adding a second pass per line and constructing a join-zone string has measurable but acceptable overhead:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;5,000 files&lt;/th&gt;
&lt;th&gt;10,000 files&lt;/th&gt;
&lt;th&gt;50,000 files&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No lookback (v0.6.0)&lt;/td&gt;
&lt;td&gt;412 ms&lt;/td&gt;
&lt;td&gt;803 ms&lt;/td&gt;
&lt;td&gt;3,891 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;With lookback (v0.6.1)&lt;/td&gt;
&lt;td&gt;626 ms&lt;/td&gt;
&lt;td&gt;1,247 ms&lt;/td&gt;
&lt;td&gt;6,128 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overhead&lt;/td&gt;
&lt;td&gt;+52%&lt;/td&gt;
&lt;td&gt;+55%&lt;/td&gt;
&lt;td&gt;+57%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The overhead is roughly linear: each file with N lines now performs about 2N additional string slices (one per side of each boundary) and N additional pattern passes. The absolute numbers remain well within acceptable CI latency: a 5,000-file documentation corpus completes in 626 ms on a mid-range runner.&lt;/p&gt;

&lt;p&gt;The benchmark script is in the repository: &lt;code&gt;python scripts/benchmark.py --files 5000 --mode lookback&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Complete 8-Step Normalization Pipeline
&lt;/h2&gt;

&lt;p&gt;After closing all four vectors, Shield's normalization function runs every line through a deterministic eight-step sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_normalize_line_for_shield&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Apply the full normalization pipeline before credential pattern matching.

    Steps are ordered to guarantee that later transformations operate on
    clean input — e.g., entity decoding happens before comment stripping
    to handle entities within comment boundaries.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_line&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Strip Unicode Format (Cf) characters
&lt;/span&gt;    &lt;span class="c1"&gt;# Must run first — prevents Cf chars from surviving entity decoding.
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_strip_unicode_format_chars&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Decode HTML entities
&lt;/span&gt;    &lt;span class="c1"&gt;# &amp;amp;#45; → -,  &amp;amp;#95; → _,  &amp;amp;amp; → &amp;amp;, etc.
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unescape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Strip HTML comments
&lt;/span&gt;    &lt;span class="c1"&gt;# &amp;lt;!-- ... --&amp;gt; → ""
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_HTML_COMMENT_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Strip MDX expression comments
&lt;/span&gt;    &lt;span class="c1"&gt;# {/* ... */} → ""
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_MDX_COMMENT_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 5: Unwrap backtick code spans
&lt;/span&gt;    &lt;span class="c1"&gt;# `sk-abc123...` → sk-abc123...
&lt;/span&gt;    &lt;span class="c1"&gt;# Credentials in code spans are still credentials.
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_BACKTICK_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 6: Remove string concatenation operators
&lt;/span&gt;    &lt;span class="c1"&gt;# "sk-" + "abc123..." → "sk-" "abc123..."
&lt;/span&gt;    &lt;span class="c1"&gt;# Then whitespace collapse in step 8 joins them for matching.
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 7: Replace Markdown table cell separators
&lt;/span&gt;    &lt;span class="c1"&gt;# | key | value | → " key  value "
&lt;/span&gt;    &lt;span class="c1"&gt;# Prevents pipe characters from interrupting patterns.
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 8: Collapse whitespace
&lt;/span&gt;    &lt;span class="c1"&gt;# Multiple spaces → single space, strip leading/trailing
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step is independently testable. The test suite includes 47 tests specifically for normalization, covering each step in isolation and in combination.&lt;/p&gt;
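&lt;p&gt;To make the "independently testable" claim concrete, here is a sketch of what per-step tests can look like. The helper names are illustrative, not Zenzic's actual test suite; steps 5 and 8 are reproduced from the listing above:&lt;/p&gt;

```python
import re

# Step 5 from the listing above: unwrap backtick code spans.
BACKTICK_RE = re.compile(r"`([^`]+)`")

def unwrap_code_spans(text):
    return BACKTICK_RE.sub(lambda m: m.group(1), text)

# Step 8: collapse runs of whitespace, strip the ends.
def collapse_whitespace(text):
    return " ".join(text.split())

# Each step in isolation...
assert unwrap_code_spans("use `sk-abc123` here") == "use sk-abc123 here"
assert collapse_whitespace("  a   b  ") == "a b"

# ...and in combination, in pipeline order.
assert collapse_whitespace(unwrap_code_spans(" `a`   `b` ")) == "a b"
```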




&lt;h2&gt;
  
  
  Coverage Added by Operation Obsidian Stress
&lt;/h2&gt;

&lt;p&gt;Before the operation: &lt;strong&gt;929 passing tests&lt;/strong&gt;.&lt;br&gt;
After closing all four vectors: &lt;strong&gt;1,046 passing tests&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;117 new tests, distributed across:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;New Tests&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cf character injection (ZRT-006)&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTML entity obfuscation (ZRT-006b)&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comment interleaving (ZRT-007)&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-line token splitting (ZRT-007b)&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Normalization pipeline integration&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What Shield Detects
&lt;/h2&gt;

&lt;p&gt;Shield detects nine credential families, all validated against the complete normalization pipeline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Family&lt;/th&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Example true positive&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI API Key&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sk-[a-zA-Z0-9]{48}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sk-abc123def456ghi789...&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Token&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gh[pousr]_[A-Za-z0-9_]+&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ghp_abc123def456&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS Access Key&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AKIA[0-9A-Z]{16}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AKIAIOSFODNN7EXAMPLE&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stripe Live Key&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sk_live_[a-zA-Z0-9]+&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sk_live_abc123def456&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slack Token&lt;/td&gt;
&lt;td&gt;&lt;code&gt;xox[bpas]-[0-9]+-...&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;xoxb-12345-67890-abc&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google API Key&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AIza[0-9A-Za-z\-_]{35}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AIzaSyD-9tSrke72I6e0...&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Private Key Block&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-----BEGIN .* PRIVATE KEY-----&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PEM headers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hex-Encoded Payload&lt;/td&gt;
&lt;td&gt;&lt;code&gt;(\\x[0-9a-fA-F]{2}){8,}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;\x41\x42\x43...&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitLab PAT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;glpat-[0-9a-zA-Z\-_]{20}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;glpat-xxxxxxxxxxxxxxxxxxxx&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
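&lt;p&gt;A scanner over a family table like this reduces to a single loop. The sketch below uses three patterns abbreviated from the table; Shield's real patterns may differ in anchoring and context checks:&lt;/p&gt;

```python
import re

# (family, pattern) pairs abbreviated from the table above.
FAMILIES = [
    ("aws_access_key", re.compile(r"AKIA[0-9A-Z]{16}")),
    ("stripe_live_key", re.compile(r"sk_live_[a-zA-Z0-9]+")),
    ("gitlab_pat", re.compile(r"glpat-[0-9a-zA-Z\-_]{20}")),
]

def scan(text):
    """Yield the family name of every pattern that matches the text."""
    for family, pattern in FAMILIES:
        if pattern.search(text):
            yield family

line = "key=AKIAIOSFODNN7EXAMPLE token=glpat-" + "x" * 20
assert sorted(scan(line)) == ["aws_access_key", "gitlab_pat"]
```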




&lt;h2&gt;
  
  
  Exit Code Taxonomy
&lt;/h2&gt;

&lt;p&gt;Zenzic reports results through a fixed exit code taxonomy, and the two security codes in it cannot be suppressed by any configuration:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Exit Code&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Clean&lt;/td&gt;
&lt;td&gt;No issues found&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Sentinel&lt;/td&gt;
&lt;td&gt;Validation failures (broken links, orphans, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Shield&lt;/td&gt;
&lt;td&gt;Credential detected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Blood Sentinel&lt;/td&gt;
&lt;td&gt;Path traversal attempt in config&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Codes 2 and 3 cannot be configured away. This is intentional: they represent the security perimeter. A CI step that can be silenced on a security failure is not a security control.&lt;/p&gt;
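&lt;p&gt;One way to keep that perimeter honest in code is to make the security codes structurally distinct from the quality codes. A sketch with assumed names, not Zenzic's actual internals:&lt;/p&gt;

```python
from enum import IntEnum

class ExitCode(IntEnum):
    CLEAN = 0           # no issues found
    SENTINEL = 1        # validation failures (links, orphans, ...)
    SHIELD = 2          # credential detected
    BLOOD_SENTINEL = 3  # path traversal attempt in config

def final_exit_code(code, quiet=False):
    """A hypothetical quiet flag may downgrade quality failures,
    but the security codes pass through untouched."""
    if quiet and code == ExitCode.SENTINEL:
        return ExitCode.CLEAN
    return code

assert final_exit_code(ExitCode.SENTINEL, quiet=True) == ExitCode.CLEAN
assert final_exit_code(ExitCode.SHIELD, quiet=True) == ExitCode.SHIELD
```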




&lt;h2&gt;
  
  
  CI Integration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/docs.yml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Zenzic Shield&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;pip install zenzic==0.6.1rc2&lt;/span&gt;
    &lt;span class="s"&gt;zenzic shield --strict&lt;/span&gt;
  &lt;span class="c1"&gt;# Exit code 2 → credential found → build fails&lt;/span&gt;
  &lt;span class="c1"&gt;# Exit code 3 → path traversal → build fails&lt;/span&gt;
  &lt;span class="c1"&gt;# No --ignore-shield flag exists&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pre-commit hook&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;zenzic&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;0.6.1rc2

&lt;span class="c"&gt;# Full analysis (links + orphans + credentials + assets)&lt;/span&gt;
zenzic check all

&lt;span class="c"&gt;# Security scan only&lt;/span&gt;
zenzic shield

&lt;span class="c"&gt;# Quality score with regression detection&lt;/span&gt;
zenzic score
zenzic diff &lt;span class="nt"&gt;--baseline&lt;/span&gt; .zenzic-baseline.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;The four bypass vectors found during Operation Obsidian Stress are not exotic. They're the kind of techniques that appear in any list of regex evasion methods — Unicode injection, HTML entity encoding, markup comment interleaving, structural line splitting.&lt;/p&gt;

&lt;p&gt;What made them findable was the decision to look for them systematically, with adversarial intent, before release. What made them fixable was having a normalization pipeline with defined semantics and comprehensive test coverage at each step.&lt;/p&gt;

&lt;p&gt;Security tooling that isn't tested adversarially is security tooling that provides the appearance of coverage without the substance. The Shield bypass vectors existed for the same reason most security gaps exist: nobody had tried to break through them yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation&lt;/strong&gt;: &lt;a href="https://zenzic.dev" rel="noopener noreferrer"&gt;zenzic.dev&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/PythonWoods/zenzic" rel="noopener noreferrer"&gt;github.com/PythonWoods/zenzic&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href="https://pypi.org/project/zenzic/" rel="noopener noreferrer"&gt;pypi.org/project/zenzic&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Thanks for reading.&lt;/p&gt;

</description>
      <category>python</category>
      <category>security</category>
      <category>devtools</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Your Docs Pipeline Is a Security Risk — Zenzic v0.6.1rc1 Fixes That</title>
      <dc:creator>PythonWoods</dc:creator>
      <pubDate>Wed, 15 Apr 2026 16:43:33 +0000</pubDate>
      <link>https://dev.to/pythonwoods/your-docs-pipeline-is-a-security-risk-zenzic-v061rc1-fixes-that-3ag3</link>
      <guid>https://dev.to/pythonwoods/your-docs-pipeline-is-a-security-risk-zenzic-v061rc1-fixes-that-3ag3</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3yy05bbjdzuylso6wen.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3yy05bbjdzuylso6wen.png" alt="Zenzic logo — Documentation Security Layer" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🛡️ &lt;strong&gt;UPDATE (2026-04-16):&lt;/strong&gt; Zenzic has evolved! v0.6.1rc2 "Obsidian Bastion" is now live with enhanced Shield hardening and full Docusaurus v3 support. Visit the official documentation at &lt;a href="https://zenzic.dev" rel="noopener noreferrer"&gt;zenzic.dev&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most documentation pipelines trust Markdown blindly. Unvalidated links, hidden credential leaks, path traversal risks, engine-specific blind spots — all of this happens &lt;strong&gt;before your build system even knows something is wrong&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Zenzic exists to close that gap.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/pythonwoods/hardening-the-documentation-pipeline-why-i-built-a-security-first-markdown-analyzer-in-pure-python-37h8"&gt;Part 1&lt;/a&gt;, I explained &lt;strong&gt;why&lt;/strong&gt; I built it — the philosophy, the threat model, the architecture of a Pure Python analyzer that lints raw Markdown sources before any build engine touches them.&lt;/p&gt;

&lt;p&gt;Today, &lt;strong&gt;v0.6.1rc1 "Obsidian Bastion"&lt;/strong&gt; turns that idea into something much bigger: not just a linter, but a &lt;strong&gt;security layer for any Markdown-based documentation stack&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 Where Zenzic fits
&lt;/h2&gt;

&lt;p&gt;If your documentation is part of your CI pipeline, it's part of your attack surface.&lt;/p&gt;

&lt;p&gt;Zenzic is designed for CI pipelines that handle untrusted docs, open-source projects with external contributors, teams running multiple doc engines side by side, and security-conscious workflows that need to validate content &lt;em&gt;before&lt;/em&gt; the build — not after. Most tools in this space are engine-specific, runtime-dependent, or rely on shelling out to external processes. Zenzic is none of these.&lt;/p&gt;

&lt;p&gt;Three core properties define it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No subprocess execution — ever.&lt;/strong&gt; No &lt;code&gt;node&lt;/code&gt;, no &lt;code&gt;git&lt;/code&gt;, no shell calls. The core library is 100% Pure Python. This isn't a convenience feature — it's a security model. A tool that spawns subprocesses is a tool that can be tricked into executing untrusted code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engine-agnostic analysis.&lt;/strong&gt; Zenzic reads raw Markdown and configuration files as plain data. It never imports or executes a documentation framework. Engine-specific knowledge lives in thin, replaceable adapters that translate semantics into a neutral protocol. The core sees only a &lt;code&gt;BaseAdapter&lt;/code&gt; — it doesn't know whether you run MkDocs, Docusaurus, or something that doesn't exist yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deterministic file discovery.&lt;/strong&gt; Every file scan is explicit. Every path is validated. There are no accidental full-repo traversals, no hidden directories slipping through. Identical source files always produce identical results.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏛️ From linter to platform
&lt;/h2&gt;

&lt;p&gt;When I wrote Part 1, Zenzic was &lt;strong&gt;The Sentinel&lt;/strong&gt; — a capable linter with MkDocs awareness. It could find broken links, detect credentials, and catch orphaned pages. But it had a blind spot: it could only see one documentation ecosystem.&lt;/p&gt;

&lt;p&gt;The 0.6.x series was about removing that limitation entirely. The goal was to build a &lt;strong&gt;documentation security layer&lt;/strong&gt;, not a plugin.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Codename&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;v0.5.x&lt;/td&gt;
&lt;td&gt;The Sentinel&lt;/td&gt;
&lt;td&gt;Core scanning + MkDocs awareness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.6.0&lt;/td&gt;
&lt;td&gt;Obsidian Glass&lt;/td&gt;
&lt;td&gt;Headless architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.6.1rc1&lt;/td&gt;
&lt;td&gt;Obsidian Bastion&lt;/td&gt;
&lt;td&gt;Platform baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The biggest single commit in this arc deleted &lt;strong&gt;21,870 lines&lt;/strong&gt; and added 888. That was the Headless Architecture transition: Zenzic stopped being a MkDocs tool and became an &lt;strong&gt;Analyzer of Documentation Platforms&lt;/strong&gt;. The documentation site itself was separated into its own Docusaurus-powered repository — and Zenzic now validates it using the same engine-agnostic machinery it offers to everyone else.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚛️ Parsing Docusaurus without Node
&lt;/h2&gt;

&lt;p&gt;The first concrete challenge was supporting Docusaurus v3. Its config files are TypeScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;presets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;classic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;routeBasePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/guides&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}]],&lt;/span&gt;
  &lt;span class="na"&gt;i18n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;defaultLocale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;locales&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;it&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The obvious solution — calling &lt;code&gt;node&lt;/code&gt; to evaluate the config — would violate Pillar 2 (No Subprocesses). So I built a &lt;strong&gt;static parser in Pure Python&lt;/strong&gt; that extracts &lt;code&gt;baseUrl&lt;/code&gt;, &lt;code&gt;routeBasePath&lt;/code&gt;, locale configuration, and plugin metadata directly from the source text. No evaluation. No runtime. No JavaScript.&lt;/p&gt;

&lt;p&gt;The adapter handles &lt;code&gt;.md&lt;/code&gt; and &lt;code&gt;.mdx&lt;/code&gt; sources, frontmatter &lt;code&gt;slug:&lt;/code&gt; resolution (absolute and relative), &lt;code&gt;_&lt;/code&gt;-prefixed exclusion (Docusaurus convention), auto-generated sidebar mode, and full i18n locale tree discovery. When it encounters dynamic config patterns (&lt;code&gt;async&lt;/code&gt;, &lt;code&gt;import()&lt;/code&gt;, &lt;code&gt;require()&lt;/code&gt;), it falls back gracefully instead of crashing.&lt;/p&gt;
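&lt;p&gt;To give a flavor of the approach, here is a toy regex-based extractor for &lt;code&gt;routeBasePath&lt;/code&gt;. It illustrates static extraction only; Zenzic's parser additionally handles nesting, quoting variants, and fallbacks:&lt;/p&gt;

```python
import re

# Matches routeBasePath: '/guides' or routeBasePath: "guides"
# anywhere in the config source, without evaluating it.
ROUTE_BASE_RE = re.compile(r"""routeBasePath\s*:\s*['"]([^'"]+)['"]""")

def extract_route_base_path(config_source):
    match = ROUTE_BASE_RE.search(config_source)
    return match.group(1) if match else None

ts = "presets: [['classic', { docs: { routeBasePath: '/guides' } }]],"
assert extract_route_base_path(ts) == "/guides"
assert extract_route_base_path("export default {};") is None
```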

&lt;p&gt;This matters beyond Docusaurus. It proves that Zenzic's Pure Python core can secure a JavaScript-based documentation stack with zero Node.js dependencies. &lt;strong&gt;65 tests&lt;/strong&gt; validate the adapter across 12 test classes.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧱 Layered Exclusion — the real headline feature
&lt;/h2&gt;

&lt;p&gt;File discovery is where most documentation tools quietly fail. A scanner that recursively walks every directory will eventually read inside &lt;code&gt;.git/&lt;/code&gt;, &lt;code&gt;node_modules/&lt;/code&gt;, or &lt;code&gt;__pycache__/&lt;/code&gt;. In the best case, this is slow. In the worst case, it's a security incident.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Layered Exclusion Manager&lt;/strong&gt; replaces all ad-hoc directory filtering in Zenzic with a deterministic 4-level hierarchy:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;L1&lt;/td&gt;
&lt;td&gt;System guardrails&lt;/td&gt;
&lt;td&gt;Immutable — &lt;code&gt;.git&lt;/code&gt;, &lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;__pycache__&lt;/code&gt;, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L2&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.gitignore&lt;/code&gt; + forced inclusions&lt;/td&gt;
&lt;td&gt;Additive rules, parsed in Pure Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L3&lt;/td&gt;
&lt;td&gt;Config (&lt;code&gt;zenzic.toml&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;excluded_dirs&lt;/code&gt; / &lt;code&gt;excluded_file_patterns&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L4&lt;/td&gt;
&lt;td&gt;CLI flags&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;--exclude-dir&lt;/code&gt; / &lt;code&gt;--include-dir&lt;/code&gt; at runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The levels are not just a convenience API — they encode a security invariant. L1 System Guardrails are immutable: no configuration file and no CLI flag can force Zenzic to scan inside &lt;code&gt;.git/&lt;/code&gt; or &lt;code&gt;node_modules/&lt;/code&gt;. This is a deliberate architectural decision. A tool that can be configured to read arbitrary system directories is a tool that can be weaponized.&lt;/p&gt;

&lt;p&gt;At L2, &lt;code&gt;.gitignore&lt;/code&gt; is interpreted by a built-in &lt;strong&gt;VCS Ignore Parser&lt;/strong&gt; — a Pure Python &lt;code&gt;.gitignore&lt;/code&gt; interpreter with pre-compiled regex patterns. No calls to &lt;code&gt;git check-ignore&lt;/code&gt;. No subprocess.&lt;/p&gt;
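&lt;p&gt;The core trick of such an interpreter is pre-compiling glob rules to regexes once, then matching paths against the compiled set. A deliberately simplified sketch using &lt;code&gt;fnmatch&lt;/code&gt; (real &lt;code&gt;.gitignore&lt;/code&gt; semantics — negation, anchoring, directory-only rules — need considerably more):&lt;/p&gt;

```python
import fnmatch
import re

def compile_ignore_patterns(lines):
    """Pre-compile simple glob rules to regexes, skipping blanks
    and comments."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(fnmatch.translate(line)))
    return patterns

def is_ignored(name, patterns):
    return any(p.match(name) for p in patterns)

rules = compile_ignore_patterns(["*.pyc", "# a comment", "build"])
assert is_ignored("module.pyc", rules)
assert is_ignored("build", rules)
assert not is_ignored("readme.md", rules)
```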

&lt;p&gt;At L4, a CI operator can &lt;code&gt;--include-dir vendor/critical-patch/&lt;/code&gt; without touching config files, or &lt;code&gt;--exclude-dir drafts/&lt;/code&gt; for a specific run. The hierarchy is predictable: later levels override earlier ones, with the single exception of the L1 guardrails, which nothing can override.&lt;/p&gt;
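&lt;p&gt;The precedence rules can be sketched as a single predicate. This toy version (names assumed, L2 omitted for brevity) shows the key invariant: the guardrail check runs first and nothing bypasses it:&lt;/p&gt;

```python
# L1: immutable system guardrails. Nothing below can override these.
GUARDRAIL_DIRS = {".git", "node_modules", "__pycache__"}

def is_excluded(path_parts, config_excluded, cli_excluded, cli_included):
    # L1 wins unconditionally.
    if any(part in GUARDRAIL_DIRS for part in path_parts):
        return True
    # L4: CLI flags override config for this run.
    if any(part in cli_included for part in path_parts):
        return False
    if any(part in cli_excluded for part in path_parts):
        return True
    # L3: zenzic.toml-style exclusions.
    return any(part in config_excluded for part in path_parts)

# node_modules stays excluded even when explicitly force-included.
assert is_excluded(("node_modules", "a.md"), set(), set(), {"node_modules"})
# A CLI include beats a config exclusion.
assert not is_excluded(("drafts", "a.md"), {"drafts"}, set(), {"drafts"})
```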




&lt;h2&gt;
  
  
  🗡️ The Tabula Rasa refactor
&lt;/h2&gt;

&lt;p&gt;This was the most invasive change in the entire release arc. I removed &lt;strong&gt;every single &lt;code&gt;rglob()&lt;/code&gt; call&lt;/strong&gt; from the codebase — all of them — and replaced them with two centralized functions in &lt;code&gt;discovery.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;walk_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exclusion_manager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;iter_markdown_sources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exclusion_manager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;exclusion_manager&lt;/code&gt; parameter is mandatory. Not &lt;code&gt;Optional&lt;/code&gt;, no &lt;code&gt;None&lt;/code&gt; default. If you call a scanner or validator entry point without an &lt;code&gt;ExclusionManager&lt;/code&gt;, you get a &lt;code&gt;TypeError&lt;/code&gt; at call time — not a silent full-tree scan at runtime.&lt;/p&gt;
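&lt;p&gt;A toy reproduction shows why a mandatory parameter fails at the right moment. The names mirror the article, but the bodies are illustrative:&lt;/p&gt;

```python
class ExclusionManager:
    """Stand-in for the real manager; this sketch is not Zenzic's API."""
    def is_excluded(self, path):
        return False

def walk_files(root, exclusion_manager):
    # The second argument is mandatory: no Optional[...], no None default.
    # Forgetting it raises TypeError at the call site instead of silently
    # falling back to a full-tree scan.
    yield from ()  # actual traversal elided in this sketch

try:
    list(walk_files("docs/"))   # caller forgot the manager
except TypeError as exc:
    print("refused:", exc)      # missing required positional argument
```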

&lt;p&gt;&lt;strong&gt;168 call sites&lt;/strong&gt; were updated across 13 test files. The result: accidental full-repo scans are now architecturally impossible. Every traversal is explicit, filtered, and auditable. This eliminates a common source of CI slowdowns and — more importantly — removes a class of security blind spots where sensitive directories could be inadvertently read.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔐 Security hardening
&lt;/h2&gt;

&lt;p&gt;Two targeted fixes closed real attack vectors identified during internal review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ReDoS prevention (F2-1).&lt;/strong&gt; Lines exceeding 1 MiB are silently truncated before Shield regex matching. A crafted documentation file with a multi-megabyte line could exploit catastrophic backtracking in credential detection patterns. This is not a theoretical concern — ReDoS is a well-documented attack against input validation layers that use unbounded regex.&lt;/p&gt;
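&lt;p&gt;The mitigation itself is tiny: bound the input before the regex engine ever sees it. A hedged sketch of the idea, using characters as a stand-in for bytes (the 1 MiB constant comes from the article; the function is mine):&lt;/p&gt;

```python
MAX_LINE_LEN = 1024 * 1024   # the 1 MiB cap from fix F2-1
                             # (chars as a proxy for bytes in this sketch)

def bounded(line):
    """Truncate oversized lines before any Shield regex runs on them.

    Backtracking cost grows with input length, so capping the line
    bounds the regex engine's worst case regardless of the pattern.
    """
    return line[:MAX_LINE_LEN]

crafted = "A" * (3 * 1024 * 1024)   # a 3 MiB single-line payload
print(len(bounded(crafted)))        # 1048576
print(bounded("normal line"))       # short lines pass through unchanged
```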

&lt;p&gt;&lt;strong&gt;Path traversal guard (F4-1).&lt;/strong&gt; &lt;code&gt;_validate_docs_root()&lt;/code&gt; now rejects &lt;code&gt;docs_dir&lt;/code&gt; paths that escape the repository root. A malicious &lt;code&gt;zenzic.toml&lt;/code&gt; pointing &lt;code&gt;docs_dir: ../../../etc/&lt;/code&gt; triggers Exit Code 3 (Blood Sentinel) before any file is read. Like the Shield (Exit Code 2), the Blood Sentinel cannot be suppressed or downgraded by any flag. These two non-negotiable exit codes form Zenzic's security perimeter.&lt;/p&gt;
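&lt;p&gt;The traversal check reduces to resolving the configured path and verifying it stays under the repository root. A sketch of the described behaviour (the function body is my illustration, not Zenzic's source; &lt;code&gt;is_relative_to&lt;/code&gt; needs Python 3.9+):&lt;/p&gt;

```python
from pathlib import Path

EXIT_BLOOD_SENTINEL = 3   # non-suppressible, per the article

def validate_docs_root(repo_root, docs_dir):
    """Reject a docs_dir that escapes the repository root (sketch only)."""
    repo = Path(repo_root).resolve()
    target = (repo / docs_dir).resolve()
    if not target.is_relative_to(repo):   # Python 3.9+
        return EXIT_BLOOD_SENTINEL        # refuse before any file is read
    return 0

print(validate_docs_root("/srv/project", "docs"))            # 0
print(validate_docs_root("/srv/project", "../../../etc/"))   # 3
```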




&lt;h2&gt;
  
  
  🏗️ No subprocesses — now enforced, not aspirational
&lt;/h2&gt;

&lt;p&gt;When I started Zenzic, "No Subprocesses" was a design goal. As of this Release Candidate, it is a &lt;strong&gt;verified property&lt;/strong&gt; of the entire codebase.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;zenzic serve&lt;/code&gt; command has been removed entirely — it was the last place where a subprocess could theoretically be spawned. Docusaurus config is parsed as text, not evaluated via Node.js. &lt;code&gt;.gitignore&lt;/code&gt; is interpreted in Pure Python, not via &lt;code&gt;git check-ignore&lt;/code&gt;. The MkDocs plugin has been relocated to &lt;code&gt;zenzic.integrations.mkdocs&lt;/code&gt; and installs separately via &lt;code&gt;pip install "zenzic[mkdocs]"&lt;/code&gt;, keeping the core free of engine-specific imports.&lt;/p&gt;

&lt;p&gt;Zero &lt;code&gt;subprocess.run()&lt;/code&gt;. Zero &lt;code&gt;os.system()&lt;/code&gt;. Zero shell calls. This makes Zenzic safe to run in any container, any sandbox, any restricted CI environment — without granting it any capabilities beyond reading files.&lt;/p&gt;
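&lt;p&gt;A property like this stays verified only if something checks it on every commit. One way to do that, sketched below with names and a token list of my own choosing, is an AST-based guard test that fails if a forbidden call ever reappears:&lt;/p&gt;

```python
import ast

# Illustrative deny-list; a real guard test would scan every module file.
FORBIDDEN = {("subprocess", "run"), ("subprocess", "Popen"), ("os", "system")}

def forbidden_calls(source):
    """Return (module, attr) pairs like subprocess.run found in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            pair = (node.value.id, node.attr)
            if pair in FORBIDDEN:
                found.add(pair)
    return found

clean = "from pathlib import Path\nprint(Path('.').name)\n"
dirty = "import subprocess\nsubprocess.run(['git', 'status'])\n"
print(forbidden_calls(clean))   # set()
print(forbidden_calls(dirty))   # {('subprocess', 'run')}
```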




&lt;h2&gt;
  
  
  📊 By the numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Test functions&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;929&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High-granularity validation across parsing, discovery, and security layers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Source code&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11,422 LOC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Non-trivial codebase — reflects real architectural scope&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test code&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12,927 LOC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~1.13x ratio with source — disciplined testing, not excess&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engine adapters&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Proven multi-engine support: MkDocs, Docusaurus v3, Zensical, Vanilla&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime dependencies&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minimal surface area — lower supply chain risk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subprocess calls&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Safe in sandboxed CI and restricted environments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On a mid-range CI runner, Zenzic scans 5,000 synthetic files in under a second, single-threaded. The benchmark script is included in the repo — run it yourself with &lt;code&gt;python scripts/benchmark.py --files 5000&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚠️ Breaking changes
&lt;/h2&gt;

&lt;p&gt;This is a Release Candidate from an alpha series — breaking changes are intentional, not accidental:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;zenzic serve&lt;/code&gt; removed.&lt;/strong&gt; Use your engine's native command: &lt;code&gt;mkdocs serve&lt;/code&gt;, &lt;code&gt;npx docusaurus start&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MkDocs plugin relocated.&lt;/strong&gt; From &lt;code&gt;zenzic.plugin&lt;/code&gt; to &lt;code&gt;zenzic.integrations.mkdocs&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ExclusionManager&lt;/code&gt; is mandatory.&lt;/strong&gt; No more &lt;code&gt;Optional[ExclusionManager]&lt;/code&gt; on scanner/validator entry points.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🏁 Run it against your docs
&lt;/h2&gt;

&lt;p&gt;If your documentation is part of your build pipeline, it deserves the same validation rigour as your source code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--pre&lt;/span&gt; zenzic

&lt;span class="c"&gt;# Let Zenzic auto-detect your engine&lt;/span&gt;
zenzic lint

&lt;span class="c"&gt;# Or specify explicitly&lt;/span&gt;
zenzic lint &lt;span class="nt"&gt;--engine&lt;/span&gt; docusaurus
zenzic lint &lt;span class="nt"&gt;--engine&lt;/span&gt; mkdocs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it on your repo. See what it finds — before your users do.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/PythonWoods/zenzic" rel="noopener noreferrer"&gt;github.com/PythonWoods/zenzic&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt;: &lt;a href="https://zenzic.dev" rel="noopener noreferrer"&gt;zenzic.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI&lt;/strong&gt;: &lt;a href="https://pypi.org/project/zenzic/" rel="noopener noreferrer"&gt;pypi.org/project/zenzic&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Your documentation isn't just content. It's input. Treat it accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"The Bastion holds."&lt;/strong&gt; 🛡️&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Next steps
&lt;/h2&gt;

&lt;p&gt;Thanks for reading.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>security</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Hardening the Documentation Pipeline: Why I Built a Security-First Markdown Analyzer in Pure Python</title>
      <dc:creator>PythonWoods</dc:creator>
      <pubDate>Wed, 08 Apr 2026 19:38:57 +0000</pubDate>
      <link>https://dev.to/pythonwoods/hardening-the-documentation-pipeline-why-i-built-a-security-first-markdown-analyzer-in-pure-python-37h8</link>
      <guid>https://dev.to/pythonwoods/hardening-the-documentation-pipeline-why-i-built-a-security-first-markdown-analyzer-in-pure-python-37h8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3yy05bbjdzuylso6wen.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3yy05bbjdzuylso6wen.png" alt="Zenzic logo — Documentation Security Layer" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🛡️ Beyond Broken Links: The Architecture of Zenzic "The Sentinel"
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;🛡️ &lt;strong&gt;UPDATE (2026-04-16):&lt;/strong&gt; Zenzic has evolved! v0.6.1rc2 "Obsidian Bastion" is now live with enhanced Shield hardening and full Docusaurus v3 support. Visit the official documentation at &lt;a href="https://zenzic.dev" rel="noopener noreferrer"&gt;zenzic.dev&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Documentation is often the weakest link in the CI/CD security chain. We protect our code with linters, SAST, and DAST, but our Markdown files—containing architecture diagrams, setup guides, and snippets—often go unchecked.&lt;/p&gt;

&lt;p&gt;I spent the last few months building &lt;strong&gt;Zenzic&lt;/strong&gt;, a deterministic static analysis framework for Markdown sources. We just released &lt;strong&gt;v0.5.0a4 "The Sentinel"&lt;/strong&gt;, and I want to share the architectural choices behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚓ The Core Philosophy: "Lint the Source, not the Build"
&lt;/h2&gt;

&lt;p&gt;Most documentation tools analyze the generated HTML. This creates a "build driver dependency": if your generator (MkDocs, Hugo, Docusaurus) has a bug or an unstable update, your security validation fails. &lt;/p&gt;

&lt;p&gt;Zenzic takes a different path. It analyzes the &lt;strong&gt;raw Markdown source&lt;/strong&gt; before the build starts, using a &lt;strong&gt;Virtual Site Map (VSM)&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  🩸 1. The "Blood Sentinel": Classifying Intent
&lt;/h3&gt;

&lt;p&gt;A broken link is a maintenance issue. A link that probes the host OS is a &lt;strong&gt;security incident&lt;/strong&gt;.&lt;br&gt;
I implemented a classification engine that detects if a resolved path targets sensitive OS directories (&lt;code&gt;/etc/&lt;/code&gt;, &lt;code&gt;/proc/&lt;/code&gt;, &lt;code&gt;/var/&lt;/code&gt;, etc.). &lt;/p&gt;

&lt;p&gt;Instead of a generic error, Zenzic triggers a dedicated &lt;strong&gt;Exit Code 3&lt;/strong&gt;. This is crucial for preventing accidental leakage of infrastructure details or template injection probes in automated pipelines.&lt;/p&gt;
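&lt;p&gt;The classification step can be pictured as a prefix check against a registry of sensitive roots. A minimal sketch (the directory list follows this post; the rest is illustrative):&lt;/p&gt;

```python
from pathlib import PurePosixPath

SENSITIVE_ROOTS = ("/etc", "/proc", "/var")   # per the article; not exhaustive
EXIT_BLOOD_SENTINEL = 3

def classify_link(resolved_path):
    """Map a resolved link target to an exit code (illustrative sketch)."""
    path = PurePosixPath(resolved_path)
    for root in SENSITIVE_ROOTS:
        if path.is_relative_to(root):   # Python 3.9+
            # Not a broken link: a probe of the host OS is a security incident.
            return EXIT_BLOOD_SENTINEL
    return 0

print(classify_link("/etc/passwd"))      # 3
print(classify_link("/docs/setup.md"))   # 0
```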

&lt;h3&gt;
  
  
  🔐 2. The Shield: Multi-Stream Credential Scanning
&lt;/h3&gt;

&lt;p&gt;Documentation is a magnet for "temporary" credentials that end up being permanent.&lt;br&gt;
Zenzic's &lt;strong&gt;Shield&lt;/strong&gt; scans every line and fenced code block for 8 families of secrets, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  AWS, GitHub, and Stripe keys.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Hex-encoded payloads:&lt;/strong&gt; a dedicated detector for &lt;code&gt;\xNN&lt;/code&gt; escape sequences catches obfuscated strings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any confirmed match triggers &lt;strong&gt;Exit Code 2&lt;/strong&gt;: a credential breach is a build-blocking event.&lt;/p&gt;
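&lt;p&gt;The hex-escape detector is the interesting one: long runs of &lt;code&gt;\xNN&lt;/code&gt; sequences are rare in legitimate prose. A hedged sketch of such a check, with a run-length threshold that is my guess rather than the Shield's actual tuning:&lt;/p&gt;

```python
import re

# Flag runs of 8 or more consecutive \xNN escapes. The threshold is an
# illustrative guess, not the Shield's real tuning.
HEX_RUN = re.compile(r"(?:\\x[0-9a-fA-F]{2}){8,}")

def has_hex_payload(line):
    """True if the line contains a long run of hex escape sequences."""
    return bool(HEX_RUN.search(line))

obfuscated = r"key = '\x41\x4b\x49\x41\x49\x4f\x53\x46\x4f\x44'"
print(has_hex_payload(obfuscated))          # True: ten escapes in a row
print(has_hex_payload(r"single \x41 doc"))  # False: one escape is fine
```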

&lt;h3&gt;
  
  
  🌀 3. Graph Integrity and Θ(V+E) Complexity
&lt;/h3&gt;

&lt;p&gt;In large documentation sets (10k+ pages), link cycles are common. To ensure Zenzic scales without hitting recursion limits or falling into infinite loops, I implemented an &lt;strong&gt;iterative DFS (Depth-First Search)&lt;/strong&gt; with a three-color marking system.&lt;/p&gt;

&lt;p&gt;Pre-computing the cycle registry in Phase 1.5 allows Phase 2 (Validation) to remain &lt;strong&gt;O(1)&lt;/strong&gt; per-query. This ensures that even massive docsets are validated in seconds.&lt;/p&gt;
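&lt;p&gt;The three-color scheme (white = unvisited, gray = on the current DFS path, black = fully explored) is what lets an explicit-stack DFS detect back edges without recursion. A compact sketch over a toy link graph; the graph shape and names are illustrative:&lt;/p&gt;

```python
WHITE, GRAY, BLACK = 0, 1, 2

def find_cycle_nodes(graph):
    """Iterative three-color DFS: return endpoints of back edges.

    Every returned node sits on some cycle; no recursion, so arbitrarily
    deep link chains cannot hit the interpreter's recursion limit.
    """
    color = {v: WHITE for v in graph}
    on_cycle = set()
    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:
                color[node] = BLACK              # fully explored
                stack.pop()
            elif color.get(child) == WHITE:
                color[child] = GRAY              # entering the current path
                stack.append((child, iter(graph[child])))
            elif color.get(child) == GRAY:
                on_cycle.update((node, child))   # back edge found
            # dangling targets (not in graph) are simply ignored here

# a.md -> b.md -> c.md -> a.md is a cycle; d.md only points into it
    return on_cycle

links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
print(sorted(find_cycle_nodes(links)))   # ['a', 'c']
```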

&lt;h3&gt;
  
  
  🇮🇹 4. Dogfooding i18n
&lt;/h3&gt;

&lt;p&gt;We believe in bilingual documentation. Zenzic supports native i18n with "Ghost Routes"—logical paths that don't exist on disk but are resolved by build plugins. We dogfood this by keeping our own documentation in full parity between &lt;strong&gt;English and Italian&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Performance and Portability
&lt;/h2&gt;

&lt;p&gt;By enforcing a &lt;strong&gt;"No Subprocesses"&lt;/strong&gt; rule, Zenzic is 100% Pure Python. It’s safe to run in restricted or non-privileged container environments, making it a perfect fit for modern GitOps workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sguwbonky4dp8ta674x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9sguwbonky4dp8ta674x.png" alt="Zenzic Sentinel Report" width="800" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🏁 Join the "Red Team"
&lt;/h2&gt;

&lt;p&gt;Zenzic is open-source and currently in &lt;strong&gt;Alpha 4&lt;/strong&gt;. We are looking for technical feedback on our VSM logic and security patterns. Can you bypass our Shield? Can you break our link resolver?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/PythonWoods/zenzic/tree/main" rel="noopener noreferrer"&gt;https://github.com/PythonWoods/zenzic/tree/main&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Documentation:&lt;/strong&gt; &lt;a href="https://zenzic.pythonwoods.dev" rel="noopener noreferrer"&gt;https://zenzic.pythonwoods.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;pip install --pre zenzic&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;"The Code is Law. The Documentation is Truth. The Sentinel is vigilant."&lt;/strong&gt; 🛡️⚓&lt;/p&gt;





&lt;p&gt;Thanks for reading.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>security</category>
      <category>markdown</category>
    </item>
  </channel>
</rss>
