PythonWoods

Posted on Apr 8 • Edited on Apr 17

Hardening the Documentation Pipeline: Why I Built a Security-First Markdown Analyzer in Pure Python

#opensource #python #security #markdown

🛡️ Beyond Broken Links: The Architecture of Zenzic "The Sentinel"

🛡️ UPDATE (2026-04-16): Zenzic has evolved! v0.6.1rc2 "Obsidian Bastion" is now live with enhanced Shield hardening and full Docusaurus v3 support. Visit the official documentation at zenzic.dev.

Documentation is often the weakest link in the CI/CD security chain. We protect our code with linters, SAST, and DAST, but our Markdown files—containing architecture diagrams, setup guides, and snippets—often go unchecked.

I spent the last few months building Zenzic, a deterministic static analysis framework for Markdown sources. We just released v0.5.0a4 "The Sentinel", and I want to share the architectural choices behind it.

⚓ The Core Philosophy: "Lint the Source, not the Build"

Most documentation tools analyze the generated HTML. This creates a "build driver dependency": if your generator (MkDocs, Hugo, Docusaurus) has a bug or an unstable update, your security validation fails.

Zenzic takes a different path. It analyzes the raw Markdown source before the build starts, using a Virtual Site Map (VSM).
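To make the idea concrete, here is a minimal sketch of what a source-level site map could look like. It is not Zenzic's actual VSM: the function name `build_site_map` and the "directory URL" routing convention (`guide/setup.md` served at `/guide/setup/`) are assumptions chosen for illustration.

```python
from pathlib import PurePosixPath


def build_site_map(source_files):
    """Map each Markdown source path to the route it would be served at.

    Hypothetical helper: the routing rule below (directory-style URLs)
    is an assumption, not Zenzic's documented behaviour.
    """
    site_map = {}
    for src in source_files:
        path = PurePosixPath(src)
        if path.suffix != ".md":
            continue  # only Markdown sources become pages
        if path.name == "index.md":
            # index.md is served at its parent directory
            route = "/" + "/".join(path.parent.parts)
        else:
            # any other page is served at its own name without the suffix
            route = "/" + "/".join(path.with_suffix("").parts)
        if not route.endswith("/"):
            route += "/"
        site_map[route] = src
    return site_map
```

Because the map is built from file paths alone, a link can be checked against it without running any generator.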

🩸 1. The "Blood Sentinel": Classifying Intent

A broken link is a maintenance issue. A link that probes the host OS is a security incident.
I implemented a classification engine that detects whether a resolved path targets a sensitive OS directory such as /etc/, /proc/, or /var/.

Instead of a generic error, Zenzic triggers a dedicated Exit Code 3. This is crucial for preventing accidental leakage of infrastructure details or template injection probes in automated pipelines.
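A rough sketch of this kind of classification, under stated assumptions: the names `classify_link` and `exit_code`, the three labels, and the prefix list (limited to the three directories named above) are all hypothetical, and Zenzic's real engine may work differently.

```python
import posixpath

# Only the directories named in this post; the real list is not shown here.
SENSITIVE_PREFIXES = ("/etc/", "/proc/", "/var/")
EXIT_HOST_PROBE = 3  # the dedicated exit code described above


def classify_link(docs_root, page_dir, target):
    """Resolve a relative link target and label what it points at."""
    # normpath collapses "../" segments so traversal cannot hide the target
    resolved = posixpath.normpath(posixpath.join(docs_root, page_dir, target))
    probe = resolved + "/"  # trailing slash so "/etc" itself also matches
    if any(probe.startswith(prefix) for prefix in SENSITIVE_PREFIXES):
        return "host_probe"
    if not probe.startswith(docs_root.rstrip("/") + "/"):
        return "outside_docs"
    return "internal"


def exit_code(labels):
    """Return 3 if any link was labelled a host probe, else 0."""
    return EXIT_HOST_PROBE if "host_probe" in labels else 0
```

With a docs root of `/srv/docs`, a link to `../../../etc/passwd` from the `guide` directory resolves to `/etc/passwd` and is labelled `host_probe`, while `../index.md` stays `internal`.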

🔐 2. The Shield: Multi-Stream Credential Scanning

Documentation is a magnet for "temporary" credentials that end up being permanent.
Zenzic's Shield scans every line and fenced code block for 8 families of secrets, including:

AWS, GitHub, and Stripe keys.
Hex-encoded payloads: a detector for \xNN escape sequences catches obfuscated strings.

Any match triggers Exit Code 2: a leaked credential is a build-blocking event.
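For readers who want a feel for line-level scanning, here is a small sketch. The patterns are illustrative approximations of well-known token shapes, not Zenzic's actual rules, and `scan_markdown` is a hypothetical name. It covers four pattern families, not the full eight.

```python
import re

# Illustrative patterns only; real scanners use broader, tuned rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
    # four or more consecutive \xNN escapes, e.g. "\x41\x4b\x49\x41"
    "hex_escape_run": re.compile(r"(?:\\x[0-9A-Fa-f]{2}){4,}"),
}
EXIT_CREDENTIAL = 2  # the build-blocking exit code described above


def scan_markdown(text):
    """Return (line_number, family) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for family, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, family))
    return findings


def exit_code(findings):
    """Return 2 if anything was found, else 0."""
    return EXIT_CREDENTIAL if findings else 0
```

Scanning line by line means prose and fenced code blocks get the same treatment, so a key pasted into an example snippet is caught as well.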

🌀 3. Graph Integrity and Θ(V+E) Complexity

In large documentation sets (10k+ pages), link cycles are common. To ensure Zenzic scales without hitting recursion limits or falling into infinite loops, I implemented an iterative DFS (Depth-First Search) with a three-color marking system.

Pre-computing the cycle registry once in Phase 1.5 means each cycle lookup in Phase 2 (Validation) is O(1), so even massive docsets are validated in seconds.
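The technique can be sketched as follows. This is a generic iterative three-color DFS, not Zenzic's code: the name `find_back_edges` and the choice to store the registry as a set of cycle-closing edges are assumptions. An explicit stack replaces recursion, so depth is bounded by memory, not by Python's recursion limit.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_back_edges(graph):
    """Return the set of (source, target) links that close a cycle.

    `graph` maps each page to the list of pages it links to.
    Runs in time proportional to V + E: each node is pushed once and
    each edge is examined once.
    """
    color = {node: WHITE for node in graph}
    back_edges = set()
    for root in graph:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                state = color.get(nxt, WHITE)
                if state == GRAY:
                    # target is still on the current path: this edge closes a cycle
                    back_edges.add((node, nxt))
                elif state == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break  # descend; the parent's iterator resumes later
            else:
                # all outgoing links processed
                color[node] = BLACK
                stack.pop()
    return back_edges
```

Once the set is built, asking whether a given link closes a cycle is a single set membership test, which is where the O(1) per-query cost comes from. Note that this records the edges DFS identifies as cycle-closing; it does not enumerate every distinct cycle.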

🇮🇹 4. Dogfooding i18n

We believe in bilingual documentation. Zenzic supports native i18n with "Ghost Routes"—logical paths that don't exist on disk but are resolved by build plugins. We dogfood this by keeping our own documentation in full parity between English and Italian.

🚀 Performance and Portability

Zenzic is 100% pure Python and enforces a "No Subprocesses" rule. That makes it safe to run in restricted or non-privileged container environments, and a good fit for modern GitOps workflows.

🏁 Join the "Red Team"

Zenzic is open-source and currently in Alpha 4. We are looking for technical feedback on our VSM logic and security patterns. Can you bypass our Shield? Can you break our link resolver?

GitHub: https://github.com/PythonWoods/zenzic/tree/main
Documentation: https://zenzic.pythonwoods.dev
Install: pip install --pre zenzic

"The Code is Law. The Documentation is Truth. The Sentinel is vigilant." 🛡️⚓

🚀 Next steps

Thanks for reading.

Top comments (5)

PythonWoods • Apr 10

Roadmap Update: Zenzic moves toward Build-Aware Agnosticism

After the success of the "Sentinel" sprint, we are expanding Zenzic's reach beyond the Python ecosystem. Development has started on the next major milestone:

v0.6.0a1 - "The React Observer" [In Development]

⚛️ Native DocusaurusAdapter: Support for the React/Docusaurus ecosystem (--engine docusaurus).
🧭 Intelligent Discovery: Pure-Python parsing of sidebars.js and native i18n folder structures.
🛡️ MDX-Ready Shield: Security scanning optimized for MDX components and static assets.
🔌 Hybrid Factory: A unified engine that combines core adapters (MkDocs, Zensical, Docusaurus) with dynamic plugin discovery.

Current State:

v0.5.0a4: Latest on PyPI (The Sentinel baseline).
v0.5.0a5: GitHub main (Visual language & E2E security hardening).
v0.6.0a1: Milestone focus for the Docusaurus bridge.

The Ultimate Dogfooding:
Once the adapter is stable, I will migrate Zenzic's official documentation to Docusaurus to prove that our Pure Python core can secure a JavaScript stack with zero Node.js dependencies.

PythonWoods • Apr 11

Why Zenzic exists

The project was born during the MkDocs 2.0 crisis, a period of instability in the documentation-as-code ecosystem. We realized that relying on the "build output" for quality assurance is a strategic risk.

Zenzic provides a Safe Harbor for your content by building a Virtual Site Map (VSM) directly from the raw Markdown source. This ensures that your documentation remains valid and secure, regardless of which build engine (MkDocs, Zensical, or Docusaurus) you use.

Core Pillars

Lint the Source, not the Build: Agnostic to your static site generator.
Pure Python (No Subprocesses): Zero external dependencies, 100% portable and secure.
The Sentinel Shield: Built-in scanning for credentials and host-path probes.
Graph Intelligence: O(V+E) complexity for link cycle detection.

Suppstack • Apr 13

It was interesting to read, thank you!

PythonWoods • Apr 13

Thanks a lot for reading—I’m glad you found it interesting!

If you’d like to get involved, the project is open on GitHub and contributions are very welcome. You can help by opening issues, improving the code, or suggesting new features—any contribution, big or small, really helps move the project forward.

Feel free to jump in and collaborate.
