I used to hate writing documentation. Not the code part. The actual English sentences that explain what the code does.
For three years, I manually updated our internal API docs every time we shipped a feature. It took me about 90 minutes per release. We ship twice a week. That is three hours a week, minimum.
Then we added strict type checking and more microservices. The time jumped to six hours a week. I was spending 15% of my work week writing descriptions for endpoints I had already built.
In January 2026, I stopped doing it manually. I built a local agent that reads our TypeScript interfaces and generates OpenAPI specs automatically.
It isn't perfect. It still hallucinates occasionally if the variable names are vague. But it gets me 90% of the way there. Now I spend about 30 minutes a week reviewing instead of six hours writing.
Here is exactly how I set it up, including the mistakes I made along the way.
Why Existing Tools Failed Me
You might ask why I didn't just use Swagger UI or standard JSDoc parsers. I tried them. They rely on comments you write in the code.
The problem is human nature. When I am rushing to fix a bug on a Friday afternoon, I do not write detailed JSDoc comments. I write // TODO: fix this later.
Six months later, "later" never comes. The docs rot. The frontend team starts guessing how the API works. We end up with Slack threads asking, "Does this field accept null?"
I needed a system that didn't rely on my discipline. I needed something that looked at the runtime types and inferred the documentation from the structure itself.
Large Language Models in 2026 are good enough for this. They understand TypeScript inference better than most static analysis tools. They can look at a Zod schema and describe it in plain English.
The key was keeping it local. I did not want to send our proprietary API schemas to a public cloud provider. Privacy aside, latency mattered too, because I wanted this to run as part of the CI pipeline.
The Stack: Local LLMs and Zod
I kept the stack simple. No complex vector databases. No RAG pipelines. Just direct inference.
- LLM: Llama-3-8B-Instruct, quantized to Q4_K_M. It runs fast on my M3 MacBook Pro.
- Parser: Zod. We already used Zod for runtime validation, so the source of truth was already there.
- Orchestrator: A simple Python script using Ollama's API.
- Output: YAML files for our static site generator.
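Before the orchestrator sends any prompts, it can help to confirm that Ollama is up and the quantized model has actually been pulled. A minimal sketch, assuming the default local endpoint and Ollama's `GET /api/tags` route (which lists locally available models). The function names are hypothetical, and it uses the standard library's `urllib` instead of `requests` so it has no dependencies.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"  # default local Ollama address
MODEL = "llama3:8b-instruct-q4_K_M"


def fetch_local_models(base_url: str = OLLAMA_BASE) -> dict:
    """Return the parsed JSON from Ollama's GET /api/tags (locally pulled models)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_is_available(tags: dict, model: str) -> bool:
    """True if `model` is listed by name in an /api/tags response."""
    return any(entry.get("name") == model for entry in tags.get("models", []))


def ensure_model(model: str = MODEL) -> None:
    """Fail fast with a readable message instead of an opaque API error later."""
    if not model_is_available(fetch_local_models(), model):
        raise SystemExit(f"{model} is not pulled; run: ollama pull {model}")
```

Calling `ensure_model()` once at the top of the orchestrator script turns a missing model into an early, explicit failure.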
I chose Llama-3-8B because it punches above its weight for structured data tasks. It doesn't need to be creative. It needs to be consistent.
The quantization matters. Running the unquantized 16-bit model was slow and ate 16GB of RAM. The Q4 version uses about 5GB and responds in under two seconds for a typical endpoint.
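Those RAM figures match a back-of-the-envelope estimate: weight memory is roughly parameters × bits per weight ÷ 8. This sketch assumes about 8 billion parameters and an average of roughly 4.8 bits per weight for Q4_K_M. That bit width is an approximation, since K-quants mix precisions across tensors. It also ignores the KV cache and runtime overhead, which is why real usage lands a bit higher.

```python
# Back-of-the-envelope weight memory; ignores KV cache and runtime overhead.
PARAMS = 8e9  # assumption: ~8 billion parameters for Llama-3-8B


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


fp16_gb = weight_gb(PARAMS, 16)   # ≈ 16.0 GB for unquantized 16-bit weights
q4km_gb = weight_gb(PARAMS, 4.8)  # ≈ 4.8 GB, assuming ~4.8 bits/weight for Q4_K_M
print(f"FP16: {fp16_gb:.1f} GB, Q4_K_M: {q4km_gb:.1f} GB")
```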
The Implementation
The core logic is straightforward. I extract the Zod schema from our codebase. I serialize it into a JSON representation. I pass that to the LLM with a strict prompt.
Here is the Python script I use to bridge the gap. It assumes you have Ollama running locally on port 11434.
```python
import json

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"


def generate_doc(schema_json: str, endpoint: str) -> str:
    """Ask the local model to turn a serialized Zod schema into OpenAPI YAML."""
    prompt = f"""
You are a technical writer.
Convert this Zod schema JSON into a concise OpenAPI description.
Endpoint: {endpoint}
Schema: {schema_json}
Rules:
1. Describe the purpose of each field based on its name and type.
2. Keep descriptions under 10 words.
3. Output valid YAML only.
4. Do not add markdown formatting.
"""
    payload = {
        "model": "llama3:8b-instruct-q4_K_M",
        "prompt": prompt,
        "stream": False,
        # Sampling parameters go under "options"; a top-level
        # "temperature" key is not a recognized /api/generate field.
        "options": {"temperature": 0.1},
    }
    # Generous timeout: the first request may have to load the model.
    response = requests.post(OLLAMA_URL, json=payload, timeout=120)
    if response.status_code != 200:
        raise RuntimeError(f"API Error: {response.text}")
    return response.json()["response"]


if __name__ == "__main__":
    # In production, parse actual TS files to extract Zod schemas.
    # This is a simplified example.
    sample_schema = json.dumps({"userId": "string", "isActive": "boolean"})
    result = generate_doc(sample_schema, "/api/users/{id}")
    print(result)
```
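Rule 4 in the prompt helps, but small instruction-tuned models can still wrap their YAML in a Markdown code fence. A cheap defensive step is to unwrap it before writing the file. This is a hedged sketch, not part of the original script: `strip_fences` is a hypothetical helper you would apply to whatever `generate_doc` returns.

```python
import re

# A whole response wrapped in a Markdown code fence: three backticks, an
# optional language tag (e.g. "yaml"), the body, then three closing backticks.
_FENCED = re.compile(r"^`{3}[\w-]*[ \t]*\n(.*?)\n?`{3}\s*$", re.DOTALL)


def strip_fences(text: str) -> str:
    """Return the body of a fenced response, or the trimmed text if it isn't fenced."""
    text = text.strip()
    match = _FENCED.match(text)
    return match.group(1) if match else text
```

Usage would look like `yaml_text = strip_fences(generate_doc(sample_schema, "/api/users/{id}"))`. Unfenced output passes through unchanged apart from trimmed whitespace.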
The `generate_doc` script is deliberately naive: it doesn't handle nested objects well. In the real repo, I recursively traverse the Zod object tree and build a context window that includes parent keys, so the LLM understands the hierarchy.
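The real traversal works on the Zod object tree directly. What follows is only a Python sketch of the same idea. It assumes the schema has already been exported from TypeScript as a JSON-Schema-shaped dict with `type`, `properties`, and `items` keys (an assumption about the export format), and `flatten_schema` is a hypothetical name. Each leaf field becomes one `dotted.path: type` line, so the prompt carries every parent key.

```python
def flatten_schema(node: dict, path: str = "") -> list[str]:
    """Walk a JSON-Schema-style dict and emit one 'dotted.path: type' line per leaf.

    Keeping the full parent path (e.g. 'address.city') preserves the hierarchy
    that a flat list of field names would lose.
    """
    node_type = node.get("type")
    if node_type == "object":
        lines: list[str] = []
        for key, child in node.get("properties", {}).items():
            child_path = f"{path}.{key}" if path else key
            lines.extend(flatten_schema(child, child_path))
        return lines
    if node_type == "array":
        # Suffix list elements with [] so 'tags[]' reads as "each item of tags".
        return flatten_schema(node.get("items", {}), f"{path}[]")
    return [f"{path}: {node_type or 'unknown'}"]
```

Joining the result with newlines gives a compact field list that can replace the raw schema JSON in the prompt.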
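The temperature setting of 0.1 is critical. I do not want creativity. I want deterministic output: if I run it twice on the same schema, I need the same result. A low temperature gets close, but the model is still sampling, so also setting Ollama's `seed` option (or dropping the temperature to 0) makes repeated runs more reproducible.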
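A complementary way to get the same result for the same schema is to skip the model entirely when nothing changed. This is a sketch under my own assumptions, not the setup described above. It caches the generated YAML under a SHA-256 hash of the schema, endpoint, and model tag. `cache_key`, `cached_generate`, and the plain-dict cache are hypothetical, and `generate` stands in for something like `generate_doc`.

```python
import hashlib
import json
from typing import Callable


def cache_key(schema: dict, endpoint: str, model: str) -> str:
    """Same schema, endpoint, and model always hash to the same key."""
    canonical = json.dumps(
        {"schema": schema, "endpoint": endpoint, "model": model}, sort_keys=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_generate(
    schema: dict,
    endpoint: str,
    model: str,
    cache: dict,
    generate: Callable[[str, str], str],
) -> str:
    """Call `generate` only for schema/endpoint/model combinations not seen before."""
    key = cache_key(schema, endpoint, model)
    if key not in cache:
        # sort_keys makes the serialized schema independent of key order.
        cache[key] = generate(json.dumps(schema, sort_keys=True), endpoint)
    return cache[key]
```

In CI, the cache could be a directory of YAML files named by hash. Unchanged endpoints would then produce byte-identical docs by construction.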
The Data: Before and After
I tracked my time for four weeks before and four weeks after implementation. I excluded time spent building the tool itself.
| Metric | Manual Process | AI-Assisted | Change |
|---|---|---|---|
| Time per release | 90 mins |
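The Change column is just the relative difference between the two processes. As a worked example, here it is applied to the totals from the introduction: about six hours of writing before and about 30 minutes of review after, both read as weekly figures. `pct_change` is a hypothetical helper.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


manual_min_per_week = 6 * 60  # six hours of hand-written docs per week
assisted_min_per_week = 30    # about 30 minutes of review per week
print(f"{pct_change(manual_min_per_week, assisted_min_per_week):.1f}%")  # -91.7%
```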