
Most documentation pipelines trust Markdown blindly. Unvalidated links, hidden credential leaks, path traversal risks, engine-specific blind spots — all of this happens before your build system even knows something is wrong.
Zenzic exists to close that gap.
In Part 1, I explained why I built it — the philosophy, the threat model, the architecture of a Pure Python analyzer that lints raw Markdown sources before any build engine touches them.
Today, v0.6.1rc1 "Obsidian Bastion" turns that idea into something much bigger: not just a linter, but a security layer for any Markdown-based documentation stack.
🎯 Where Zenzic fits
If your documentation is part of your CI pipeline, it's part of your attack surface.
Zenzic is designed for CI pipelines that handle untrusted docs, open-source projects with external contributors, teams running multiple doc engines side by side, and security-conscious workflows that need to validate content before the build — not after. Most tools in this space are engine-specific, runtime-dependent, or rely on shelling out to external processes. Zenzic is none of these.
Three core properties define it:
No subprocess execution — ever. No node, no git, no shell calls. The core library is 100% Pure Python. This isn't a convenience feature — it's a security model. A tool that spawns subprocesses is a tool that can be tricked into executing untrusted code.
Engine-agnostic analysis. Zenzic reads raw Markdown and configuration files as plain data. It never imports or executes a documentation framework. Engine-specific knowledge lives in thin, replaceable adapters that translate semantics into a neutral protocol. The core sees only a BaseAdapter — it doesn't know whether you run MkDocs, Docusaurus, or something that doesn't exist yet.
Deterministic file discovery. Every file scan is explicit. Every path is validated. There are no accidental full-repo traversals, no hidden directories slipping through. Identical source files always produce identical results.
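To make the adapter idea concrete, here is a minimal sketch of what a neutral protocol could look like. Only the name BaseAdapter comes from the text above; the method names, the VanillaAdapter class, and the URL-mapping rule are assumptions made for illustration, not Zenzic's actual API.

```python
from pathlib import Path
from typing import Protocol


class BaseAdapter(Protocol):
    """Hypothetical neutral protocol; the method names are illustrative only."""

    def source_suffixes(self) -> tuple[str, ...]: ...

    def url_for(self, source: Path, docs_root: Path) -> str: ...


class VanillaAdapter:
    """Illustrative adapter for a plain Markdown tree with no engine config."""

    def source_suffixes(self) -> tuple[str, ...]:
        return (".md",)

    def url_for(self, source: Path, docs_root: Path) -> str:
        # Map docs/guide/intro.md to /guide/intro/ without importing any engine.
        relative = source.relative_to(docs_root).with_suffix("")
        parts = [p for p in relative.parts if p != "index"]
        return "/" + "/".join(parts) + ("/" if parts else "")


def describe(adapter: BaseAdapter, source: Path, docs_root: Path) -> str:
    # The core depends only on the protocol, never on a concrete engine.
    return adapter.url_for(source, docs_root)
```

Because the protocol is structural, an adapter for a new engine would only need to provide the same methods; nothing in the core would have to import it by name.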
🏛️ From linter to platform
When I wrote Part 1, Zenzic was The Sentinel — a capable linter with MkDocs awareness. It could find broken links, detect credentials, and catch orphaned pages. But it had a blind spot: it could only see one documentation ecosystem.
The 0.6.x series was about removing that limitation entirely. The goal was to build a documentation security layer, not a plugin.
| Version | Codename | Focus |
|---|---|---|
| v0.5.x | The Sentinel | Core scanning + MkDocs awareness |
| v0.6.0 | Obsidian Glass | Headless architecture |
| v0.6.1rc1 | Obsidian Bastion | Platform baseline |
The biggest single commit in this arc deleted 21,870 lines and added 888. That was the Headless Architecture transition: Zenzic stopped being a MkDocs tool and became an Analyser of Documentation Platforms. The documentation site itself was separated into its own Docusaurus-powered repository — and Zenzic now validates it using the same engine-agnostic machinery it offers to everyone else.
⚛️ Parsing Docusaurus without Node
The first concrete challenge was supporting Docusaurus v3. Its config files are TypeScript:
```ts
export default {
  presets: [['classic', { docs: { routeBasePath: '/guides' } }]],
  i18n: { defaultLocale: 'en', locales: ['en', 'it'] },
};
```
The obvious solution — calling node to evaluate the config — would violate Pillar 2 (No Subprocesses). So I built a static parser in Pure Python that extracts baseUrl, routeBasePath, locale configuration, and plugin metadata directly from the source text. No evaluation. No runtime. No JavaScript.
The adapter handles .md and .mdx sources, frontmatter slug: resolution (absolute and relative), _-prefixed exclusion (Docusaurus convention), auto-generated sidebar mode, and full i18n locale tree discovery. When it encounters dynamic config patterns (async, import(), require()), it falls back gracefully instead of crashing.
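The static-extraction approach can be sketched in a few lines. This is a simplified illustration, not the real parser: the regular expressions, the function name parse_docusaurus_config, and the shape of the returned dictionary are all assumptions, and the sketch covers only routeBasePath and locales.

```python
import re

# Markers that suggest the config cannot be read statically (assumed list).
DYNAMIC_MARKERS = re.compile(r"\basync\b|\bimport\s*\(|\brequire\s*\(")
ROUTE_BASE_PATH = re.compile(r"routeBasePath\s*:\s*['\"]([^'\"]*)['\"]")
LOCALES = re.compile(r"\blocales\s*:\s*\[([^\]]*)\]")
STRING_LITERAL = re.compile(r"['\"]([^'\"]+)['\"]")


def parse_docusaurus_config(text: str) -> dict:
    """Extract a few fields from config source text without evaluating it."""
    if DYNAMIC_MARKERS.search(text):
        # Fall back to "unknown" rather than guessing at runtime values.
        return {"route_base_path": None, "locales": [], "static": False}
    route = ROUTE_BASE_PATH.search(text)
    locales = LOCALES.search(text)
    return {
        "route_base_path": route.group(1) if route else None,
        "locales": STRING_LITERAL.findall(locales.group(1)) if locales else [],
        "static": True,
    }
```

Run against the config shown earlier, this sketch would report a route base path of /guides and the locales en and it. A config containing require() or import() would take the fallback branch instead of raising.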
This matters beyond Docusaurus. It proves that Zenzic's Pure Python core can secure a JavaScript-based documentation stack with zero Node.js dependencies. 65 tests validate the adapter across 12 test classes.
🧱 Layered Exclusion — the real headline feature
File discovery is where most documentation tools quietly fail. A scanner that recursively walks every directory will eventually read inside .git/, node_modules/, or __pycache__/. In the best case, this is slow. In the worst case, it's a security incident.
The Layered Exclusion Manager replaces all ad-hoc directory filtering in Zenzic with a deterministic 4-level hierarchy:
| Level | Source | Behavior |
|---|---|---|
| L1 | System guardrails | Immutable — .git, node_modules, __pycache__, etc. |
| L2 | .gitignore + forced inclusions | Additive rules, parsed in Pure Python |
| L3 | Config (zenzic.toml) | excluded_dirs / excluded_file_patterns |
| L4 | CLI flags | --exclude-dir / --include-dir at runtime |
The levels are not just a convenience API — they encode a security invariant. L1 System Guardrails are immutable: no configuration file and no CLI flag can force Zenzic to scan inside .git/ or node_modules/. This is a deliberate architectural decision. A tool that can be configured to read arbitrary system directories is a tool that can be weaponized.
At L2, .gitignore is interpreted by a built-in VCS Ignore Parser — a Pure Python .gitignore interpreter with pre-compiled regex patterns. No calls to git check-ignore. No subprocess.
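As an illustration of the pre-compiled regex idea, the sketch below translates a single simplified ignore pattern into a compiled regular expression. It is an assumption-heavy subset, not Zenzic's parser: it handles only `*`, `?`, `**`, a leading `/` anchor, and a trailing `/`, and it ignores negation, character classes, escapes, and the rule that a slash in the middle of a pattern also anchors it. The function name compile_ignore_pattern is hypothetical.

```python
import re


def compile_ignore_pattern(pattern: str) -> re.Pattern:
    """Translate one simplified .gitignore pattern into a compiled regex."""
    anchored = pattern.startswith("/")
    body = pattern.strip("/")
    out = []
    i = 0
    while i < len(body):
        if body.startswith("**", i):
            out.append(".*")      # '**' may cross directory boundaries
            i += 2
        elif body[i] == "*":
            out.append("[^/]*")   # '*' stays within one path segment
            i += 1
        elif body[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(body[i]))
            i += 1
    # Unanchored patterns may match at any depth of the tree.
    prefix = "^" if anchored else "(?:^|.*/)"
    # Match the entry itself or anything beneath it.
    return re.compile(prefix + "".join(out) + "(?:/.*)?$")
```

Compiling each pattern once and reusing the result for every path is what keeps this approach cheap during a scan, and it needs no call to git.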
At L4, a CI operator can pass --include-dir vendor/critical-patch/ without touching config files, or --exclude-dir drafts/ for a specific run. The hierarchy is predictable: a later level overrides the ones before it, with one exception. Nothing overrides the L1 guardrails.
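The four levels can be modelled roughly as follows. This is a sketch under stated assumptions: the class name LayeredExclusion, its field names, and the choice that an explicit CLI exclusion beats a CLI inclusion are all hypothetical, and the guardrail set is limited to the three directories named in the table.

```python
from dataclasses import dataclass

# L1: assumed subset of the immutable guardrails named in the table.
SYSTEM_GUARDRAILS = frozenset({".git", "node_modules", "__pycache__"})


@dataclass(frozen=True)
class LayeredExclusion:
    """Illustrative stand-in for the real manager; all names are hypothetical."""

    vcs_ignored: frozenset = frozenset()      # L2: derived from .gitignore
    config_excluded: frozenset = frozenset()  # L3: zenzic.toml
    cli_excluded: frozenset = frozenset()     # L4: --exclude-dir
    cli_included: frozenset = frozenset()     # L4: --include-dir

    def is_excluded(self, dir_name: str) -> bool:
        if dir_name in SYSTEM_GUARDRAILS:
            return True  # L1 is checked first and nothing can override it
        if dir_name in self.cli_excluded:
            return True  # assumed: explicit exclusion beats inclusion
        if dir_name in self.cli_included:
            return False  # L4 overrides L2 and L3
        return dir_name in self.vcs_ignored or dir_name in self.config_excluded
```

In this model, passing `.git` as a CLI inclusion has no effect, while a directory ignored by .gitignore can still be pulled back in for a single run.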
🗡️ The Tabula Rasa refactor
This was the most invasive change in the entire release arc. I removed every single rglob() call from the codebase — all of them — and replaced them with two centralized functions in discovery.py:
```python
def walk_files(root, exclusion_manager) -> Iterator[Path]: ...
def iter_markdown_sources(root, exclusion_manager) -> Iterator[Path]: ...
```
The exclusion_manager parameter is mandatory. Not Optional, no None default. If you call a scanner or validator entry point without an ExclusionManager, you get a TypeError at call time — not a silent full-tree scan at runtime.
168 call sites were updated across 13 test files. The result: accidental full-repo scans are now architecturally impossible. Every traversal is explicit, filtered, and auditable. This eliminates a common source of CI slowdowns and — more importantly — removes a class of security blind spots where sensitive directories could be inadvertently read.
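A minimal sketch of how two such functions might be built on os.walk follows. The signatures match the ones quoted above; everything else is an assumption, including the tiny ExclusionManager stand-in, its is_excluded_dir method, and the choice of .md and .mdx as Markdown suffixes.

```python
import os
from collections.abc import Iterator
from pathlib import Path


class ExclusionManager:
    """Minimal hypothetical stand-in: excludes directories by name."""

    def __init__(self, excluded_dirs: set[str]) -> None:
        guardrails = {".git", "node_modules", "__pycache__"}
        self._excluded = frozenset(excluded_dirs) | guardrails

    def is_excluded_dir(self, name: str) -> bool:
        return name in self._excluded


def walk_files(root: Path, exclusion_manager: ExclusionManager) -> Iterator[Path]:
    # No default for exclusion_manager: omitting it raises TypeError at call time.
    for current, dirs, files in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirs[:] = sorted(d for d in dirs if not exclusion_manager.is_excluded_dir(d))
        for name in sorted(files):
            yield Path(current) / name


def iter_markdown_sources(root: Path, exclusion_manager: ExclusionManager) -> Iterator[Path]:
    for path in walk_files(root, exclusion_manager):
        if path.suffix in {".md", ".mdx"}:
            yield path
```

Sorting directories and files gives a stable traversal order, which is one way to get the "identical input, identical output" property. Pruning `dirs` in place matters more than filtering results afterwards, because it means excluded directories are never opened at all.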
🔐 Security hardening
Two targeted fixes closed real attack vectors identified during internal review.
ReDoS prevention (F2-1). Lines exceeding 1 MiB are silently truncated before Shield regex matching. A crafted documentation file with a multi-megabyte line could exploit catastrophic backtracking in credential detection patterns. This is not a theoretical concern — ReDoS is a well-documented attack against input validation layers that use unbounded regex.
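The truncation step might look something like this. It is only a sketch: the AWS_KEY pattern is a hypothetical stand-in for the real Shield rules, the function name scan_line is invented, and measuring the limit in UTF-8 bytes rather than characters is an assumption.

```python
import re

MAX_LINE_BYTES = 1024 * 1024  # 1 MiB, the limit stated above

# Hypothetical credential pattern, standing in for the real Shield rules.
AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scan_line(line: str) -> list[str]:
    """Truncate oversized lines before any regex sees them."""
    encoded = line.encode("utf-8")
    if len(encoded) > MAX_LINE_BYTES:
        # errors="ignore" drops a multi-byte character cut in half by the slice.
        line = encoded[:MAX_LINE_BYTES].decode("utf-8", errors="ignore")
    return AWS_KEY.findall(line)
```

The trade-off in this sketch is that anything past the limit on a single line goes unscanned. Capping the input bounds the worst-case matching time regardless of how the patterns are written.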
Path traversal guard (F4-1). _validate_docs_root() now rejects docs_dir paths that escape the repository root. A malicious zenzic.toml setting docs_dir = "../../../etc/" triggers Exit Code 3 (Blood Sentinel) before any file is read. Like the Shield (Exit Code 2), the Blood Sentinel cannot be suppressed or downgraded by any flag. These two non-negotiable exit codes form Zenzic's security perimeter.
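A guard of this kind can be sketched with pathlib alone. The function below is an illustration, not the body of _validate_docs_root(): its name, signature, and use of sys.exit are assumptions. Only the exit code value comes from the text.

```python
import sys
from pathlib import Path

EXIT_BLOOD_SENTINEL = 3  # exit code named above


def validate_docs_root(repo_root: Path, docs_dir: str) -> Path:
    """Resolve docs_dir and refuse anything outside repo_root."""
    root = repo_root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / docs_dir).resolve()
    if candidate != root and root not in candidate.parents:
        sys.exit(EXIT_BLOOD_SENTINEL)
    return candidate
```

Resolving before comparing is the important part: a check on the raw string would miss paths that escape through symlinks or through `..` segments buried in the middle. An absolute docs_dir is rejected as well, since joining it onto the root simply replaces the root.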
🏗️ No subprocesses — now enforced, not aspirational
When I started Zenzic, "No Subprocesses" was a design goal. As of this Release Candidate, it is a verified property of the entire codebase.
The zenzic serve command has been removed entirely — it was the last place where a subprocess could theoretically be spawned. Docusaurus config is parsed as text, not evaluated via Node.js. .gitignore is interpreted in Pure Python, not via git check-ignore. The MkDocs plugin has been relocated to zenzic.integrations.mkdocs and installs separately via pip install "zenzic[mkdocs]", keeping the core free of engine-specific imports.
Zero subprocess.run(). Zero os.system(). Zero shell calls. This makes Zenzic safe to run in any container, any sandbox, any restricted CI environment — without granting it any capabilities beyond reading files.
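The post does not say how this property is checked, so the following is only one possible approach: a static scan of Python source for subprocess imports and direct os.system or os.popen references, built on the standard ast module. The function name and the list of forbidden names are assumptions, and a scan like this would not catch indirect tricks such as importlib or getattr.

```python
import ast

FORBIDDEN_MODULES = {"subprocess"}
FORBIDDEN_OS_ATTRS = {"system", "popen"}


def find_subprocess_usage(source: str) -> list[str]:
    """Report subprocess imports and direct os.system / os.popen references."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    findings.append(alias.name)
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in FORBIDDEN_MODULES:
                findings.append(node.module)
        elif isinstance(node, ast.Attribute):
            is_os = isinstance(node.value, ast.Name) and node.value.id == "os"
            if is_os and node.attr in FORBIDDEN_OS_ATTRS:
                findings.append(f"os.{node.attr}")
    return findings
```

A check like this could run as an ordinary test over every file in the package and fail the build on any non-empty result.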
📊 By the numbers
| Metric | Value | Why it matters |
|---|---|---|
| Test functions | 929 | High-granularity validation across parsing, discovery, and security layers |
| Source code | 11,422 LOC | Non-trivial codebase — reflects real architectural scope |
| Test code | 12,927 LOC | ~1.13x ratio with source — disciplined testing, not excess |
| Engine adapters | 4 | Proven multi-engine support: MkDocs, Docusaurus v3, Zensical, Vanilla |
| Runtime dependencies | 5 | Minimal surface area — lower supply chain risk |
| Subprocess calls | 0 | Safe in sandboxed CI and restricted environments |
On a mid-range CI runner, Zenzic scans 5,000 synthetic files in under a second, single-threaded. The benchmark script is included in the repo — run it yourself with python scripts/benchmark.py --files 5000.
⚠️ Breaking changes
This is a Release Candidate from an alpha series — breaking changes are intentional, not accidental:
- zenzic serve removed. Use your engine's native command: mkdocs serve, npx docusaurus start.
- MkDocs plugin relocated. From zenzic.plugin to zenzic.integrations.mkdocs.
- ExclusionManager is mandatory. No more Optional[ExclusionManager] on scanner/validator entry points.
🏁 Run it against your docs
If your documentation is part of your build pipeline, it deserves the same validation rigour as your source code.
```bash
pip install --pre zenzic

# Let Zenzic auto-detect your engine
zenzic lint

# Or specify explicitly
zenzic lint --engine docusaurus
zenzic lint --engine mkdocs
```
Run it on your repo. See what it finds — before your users do.
- GitHub: github.com/PythonWoods/zenzic
- Documentation: zenzic.dev
- PyPI: pypi.org/project/zenzic
Your documentation isn't just content. It's input. Treat it accordingly.
"The Bastion holds." 🛡️