How I stopped debugging JSON parsing errors and started shipping features
TL;DR
I hate debugging the same JSON parsing error twice. AgentKit lets you define LLM agents declaratively (YAML + Pydantic), so validation happens automatically. One day of setup saved me 3+ hours/week of debugging. That's efficient laziness in action.
📚 This is Part 2 of my "Efficient Laziness" series.
- Part 1: Database Design: Start from Business Logic or Jump into Code?
- Part 2: You're reading it
- Part 3: Coming soon
The Story: When Your Newsletter Pipeline Becomes a Debugging Hell
I run a weekly tech newsletter. The pipeline should be simple:
- Fetch articles from Wallabag
- Clean HTML → Markdown
- AI analysis (summary, scoring, keywords, categorization)
- Insert into PostgreSQL
- Generate HTML newsletter
- Send automatically
The reality? Step 3 — the "AI-powered" part — was a textbook example of automated inefficiency.
Every Monday, the pipeline failed for one of these entirely predictable reasons:
❌ JSON malformed → "Invalid syntax at line 42" (manual retry required)
❌ Missing "keywords" field → "Prompt tweaked, workflow rerun" (rinse, repeat)
❌ Score = 11 instead of max 10 → "Validation node added" (because of course)
❌ Category misspelled → build category validator
Time spent debugging and patching n8n workflows each Monday: 2–3 hours.
Time spent on actual work: Statistically insignificant.
I wasn't building features. I was building safety nets for unpredictable LLM outputs.
My tolerance for waste: Zero.
The Question: What If Validation Was the System's Job, Not Mine?
Before touching any code, I asked myself THE question (sound familiar from my database design article?):
"What am I actually trying to do here?"
Not "call an API and hope for the best."
Answer: Get a validated ContentAnalysisOutput from markdown_content input. Period.
Everything else—retries, JSON parsing, validation—is plumbing. Plumbing I shouldn't have to build myself.
Let’s see what that looks like in code →
Enter AgentKit: Declarative Agents (or, How I Got Lazy the Right Way)
The obvious solution?
AgentKit — currently in open beta, because of course someone finally formalized this — operates on a principle so simple it’s almost insulting:
You declare what you need. The system handles how.
(Revolutionary, I know.)
Why this works:
- Core concept: No more babysitting JSON. No more "hope the LLM complies this time."
- Implementation: YAML + Pydantic. Because if your contract isn’t machine-enforced, it’s just a wishlist.
Before (n8n imperative workflow chaos):
[AI Node] → [Parse JSON Node]
→ IF valid JSON
→ [Validate Schema Node]
→ IF valid schema
→ ✅ Continue
→ ELSE ❌ Log error, retry with different prompt
→ ELSE ❌ Log error, retry with explicit "return valid JSON" instruction
Result: 12 nodes, 3 branches, unmaintainable — or, at best, a maintenance nightmare.
(Because nothing says "scalable" like hardcoding validation logic in a GUI.)
After (AgentKit declarative agent):
agent:
  name: content_analyzer
  output_schema: ContentAnalysisOutput   # Pydantic model
  max_retries: 2
  # System handles validation & retry automatically
(No GUI. No drag-and-drop. Just a contract. Enforced.)
Result: 1 YAML file. Validation happens automatically. Retries handled by the runner.
That's efficient laziness: I defined the contract once, the system enforces it forever.
From Paper to YAML: The Agent Contract
Just like with database modeling, I started with paper. Not code. Questions first:
- What are my inputs? → `markdown_content: string`
- What do I need back? → Structured scoring + summary + keywords + category
- What constraints? → Each score within its fixed range (0-3 at most per criterion), at least 3 keywords, total capped at 10
Then I translated that into a Pydantic model (the "schema"):
from pydantic import BaseModel, conint, Field
from typing import List

class ContentAnalysisOutput(BaseModel):
    smb_applicability: conint(ge=0, le=3)   # ge = greater/equal, le = less/equal
    automation_potential: conint(ge=0, le=2)
    economic_value: conint(ge=0, le=2)
    open_source: conint(ge=0, le=2)
    innovation: conint(ge=0, le=1)
    total_score: conint(ge=0, le=10)
    summary: str
    keywords: List[str] = Field(..., min_length=3)   # at least 3 keywords (Pydantic v2; use min_items on v1)
    category: str
Why Pydantic?
- Automatic validation (no custom code)
- Clear error messages ("expected int 0-3, got 11")
- Type hints your IDE understands
- Serialization/deserialization built-in
This model is my contract. Any LLM output that doesn't match this contract gets automatically rejected and retried.
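To see the contract at work, here's a quick check you can run in a REPL (a hypothetical snippet, not part of the pipeline), feeding it the kind of output an LLM actually produces on a bad day:

from pydantic import ValidationError

# A typically "creative" LLM response: score out of range,
# and keywords returned as a string instead of a list
bad_output = {
    "smb_applicability": 11,
    "automation_potential": 2,
    "economic_value": 2,
    "open_source": 2,
    "innovation": 1,
    "total_score": 10,
    "summary": "Milvus Lite simplifies local vector database deployment.",
    "keywords": "AI, automation",
    "category": "AI & Infrastructure",
}

try:
    ContentAnalysisOutput(**bad_output)
except ValidationError as e:
    print(e)   # pinpoints both violations: smb_applicability and keywords

No custom validator, no if/else chain: the model rejects the payload and names exactly which fields broke the contract.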
The Agent Definition: YAML as the Single Source of Truth
Here's the complete agent definition:
agent:
  name: content_analyzer
  description: >
    Analyzes web content to generate business-oriented summaries
    with normalized scoring and classification.
  inputs:
    - markdown_content: string
  model:
    provider: variable       # Swap OpenAI/Mistral/Llama easily
    model_name: variable
    temperature: 0.1
    response_format: json
  prompt: |
    You are a technology analyst specializing in open-source solutions.
    Analyze the following content and provide a structured evaluation.

    Scoring criteria (STRICT):
    - SMB applicability: 0-3
    - Automation potential: 0-2
    - Economic value: 0-2
    - Open-source: 0-2
    - Innovation: 0-1
    - TOTAL must be ≤ 10

    Expected JSON format:
    {
      "smb_applicability": 0,
      "automation_potential": 0,
      "economic_value": 0,
      "open_source": 0,
      "innovation": 0,
      "total_score": 0,
      "summary": "...",
      "keywords": ["...", "...", "..."],
      "category": "..."
    }

    Content to analyze:
    {{markdown_content}}
Why YAML?
- Same reason I use paper before coding: think once, write once
- Git-friendly (version control, diffs, rollbacks)
- Human-readable (non-devs can review prompts)
- Language-agnostic (runs anywhere: Python, Node, Go...)
- Already standard for infrastructure (Docker, K8s, Terraform)
Building the Runner: 150 Lines to Never Debug JSON Again
Since AgentKit isn't fully released, I built a minimal Python runner. Core principle:
The runner handles ALL the annoying stuff I hate doing manually.
Core execution loop with automatic retry:
from jinja2 import Template
import json

def run_agent(agent_config: dict, variables: dict, max_retries: int = 2):
    """
    Execute agent with automatic validation & retry

    This is the 'efficient laziness' in action:
    - Template rendering: automatic
    - JSON parsing errors: automatic retry
    - Schema validation: automatic via Pydantic
    - Error logging: structured

    I never touch this code. It just works.
    """
    template = Template(agent_config["agent"]["prompt"])
    rendered = template.render(**variables)

    for attempt in range(1, max_retries + 1):
        try:
            # Call LLM (OpenRouter, Mistral, OpenAI...)
            raw_response = call_llm(rendered)

            # Parse JSON (can fail)
            parsed = json.loads(raw_response)

            # Validate with Pydantic (can fail)
            validated = ContentAnalysisOutput(**parsed)

            # Success! Return validated data
            return {
                "success": True,
                "output": validated.dict(),
                "attempt": attempt
            }

        except json.JSONDecodeError:
            # LLM returned invalid JSON → retry with correction prompt
            print(f"⚠️ Invalid JSON (attempt {attempt}) → auto-retrying")
            rendered = (
                f"The previous response was invalid JSON. "
                f"Please fix it and return ONLY valid JSON:\n{raw_response}"
            )

        except Exception as e:
            # Validation failed (wrong type, out of range, etc.)
            if attempt == max_retries:
                return {"success": False, "error": str(e)}
            print(f"❌ Validation error (attempt {attempt}): {e} → retrying")

    # Every attempt came back as invalid JSON
    return {"success": False, "error": "Invalid JSON after all retries"}
What this does for me:
✅ Template rendering (Jinja2 variables)
✅ JSON parsing with automatic retry
✅ Schema validation with clear error messages
✅ Structured logging
✅ Retry logic with correction prompts
What I never have to do again:
❌ Build custom validation nodes
❌ Debug "why is this field missing?"
❌ Add retry logic for the 50th time
❌ Parse error messages manually
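For context, this is how I invoke the runner directly, outside any web framework (a minimal sketch; it assumes the agent definition sits in agents/content_analyzer.yaml and that call_llm is wired to your provider):

import yaml

# Load the declarative agent definition
with open("agents/content_analyzer.yaml") as f:
    agent_config = yaml.safe_load(f)

# Validation and retries happen inside run_agent
result = run_agent(agent_config, {"markdown_content": "# Milvus Lite\n\n..."})

if result["success"]:
    analysis = result["output"]   # already validated against ContentAnalysisOutput
    print(analysis["total_score"], analysis["keywords"])
else:
    print("Agent failed:", result["error"])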
Exposing via FastAPI: One Endpoint, Infinite Agents
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import yaml

app = FastAPI()

class AgentRequest(BaseModel):
    agent_name: str
    markdown_content: str

@app.post("/analyze")
def analyze(req: AgentRequest):
    """
    Universal agent executor

    Add a new agent? Drop a YAML file in agents/.
    No code changes. No deployments. Just works.

    That's what I call scaling through laziness.
    """
    # Load agent config
    try:
        with open(f"agents/{req.agent_name}.yaml") as f:
            agent_config = yaml.safe_load(f)
    except FileNotFoundError:
        raise HTTPException(status_code=404, detail=f"Agent '{req.agent_name}' not found")

    # Run agent (validation happens automatically)
    result = run_agent(
        agent_config,
        {"markdown_content": req.markdown_content}
    )

    if not result["success"]:
        raise HTTPException(status_code=500, detail=result["error"])

    return result
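Calling the endpoint is a one-liner from anywhere in the pipeline (a sketch using requests; it assumes the API is running locally on port 8000):

import requests

resp = requests.post(
    "http://localhost:8000/analyze",
    json={
        "agent_name": "content_analyzer",
        "markdown_content": "# Milvus Lite\n\nMilvus Lite simplifies local vector database deployment...",
    },
    timeout=120,   # LLM calls plus retries can take a while
)
resp.raise_for_status()
print(resp.json())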
Example response:
{
  "success": true,
  "output": {
    "smb_applicability": 3,
    "automation_potential": 2,
    "economic_value": 2,
    "open_source": 2,
    "innovation": 1,
    "total_score": 10,
    "summary": "Milvus Lite simplifies local vector database deployment...",
    "keywords": ["Milvus", "vector database", "LLM", "self-hosted"],
    "category": "AI & Infrastructure"
  },
  "attempt": 1
}
What This Saved Me (The ROI of Efficient Laziness)
Time Investment:
- Initial setup: 1 day (YAML agent + runner + FastAPI)
- Per new agent: 30 minutes (just write YAML)
Time Saved (Weekly):
- Before: 3-4 hours debugging JSON errors, validation issues, retry logic
- After: 0 hours (system handles it)
- ROI: 12-16 hours/month saved
But More Importantly:
Mental peace.
I don't wake up Monday morning to find my newsletter broken because an LLM decided to return "keywords": "AI, automation" (string) instead of "keywords": ["AI", "automation"] (array).
The system catches it. Retries. Logs the attempt. Works.
That's the real win: I freed my brain from babysitting unpredictable LLM outputs.
A Real Example: Adding Payment Tracking (Wait, Wrong Article)
Actually, let me give you the RIGHT example:
Three weeks after deploying this, my client asked:
"Can we add sentiment analysis to the content scoring?"
My answer: "Give me 1 hour."
Why 1 hour and not 3?
Because the structure was already there:
- Copy `content_analyzer.yaml` → `content_analyzer_v2.yaml`
- Add `sentiment: str` to the Pydantic model
- Update the prompt to include sentiment scoring
- Deploy (just drop the YAML file, no code changes)
Zero refactoring. Zero debugging. Zero broken workflows.
If I'd built this the n8n way with 12 validation nodes, it would've been 2 days of:
- Adding sentiment node
- Adding validation for sentiment
- Testing all branches
- Fixing edge cases
- Praying nothing else broke
Loved "efficient laziness" in action here? Share your debugging horror stories or time-saving hacks below – let’s shape the next steps together! Check out my intro to CTEs on my newly Astro-migrated blog CTE : la clause WITH que les ORM ignorent (mais que vous devriez connaître), and stay tuned for my 3-part English series on dev.to starting next week, where I’ll apply this mindset to SQL optimization. Follow me to catch it all!**
Why This Approach Should Be the Norm (But Isn't)
Let's be honest: most developers code first, validate later (if at all).
I've always done the opposite. Not out of virtue, but out of pure efficient laziness: I hate fixing the same bug twice.
My Philosophy (Again): "Never Do Twice What Can Be Done Once"
- Spend 1 day on declarative agents vs 3 hours/week debugging? Obvious choice.
- Define schema once vs validate manually 100 times? No debate.
- Write YAML once vs maintain 12 n8n nodes? Crystal clear.
This "laziness" has always saved me time, energy, and sanity.
Why Is This Rarely Done?
Same reasons as database modeling:
- Pressure to ship fast: "We'll add validation later" (we never do)
- Belief that prompts will magically work: They won't. LLMs are creative, not consistent.
- Lack of tooling: AgentKit is new. Most people don't know this approach exists.
But What About Exploratory Projects?
Fair question. "What if I'm just testing an idea and don't know the schema yet?"
Here's my take: even a 15-minute Pydantic model forces you to ask:
- What fields do I actually need?
- What are valid ranges?
- What's required vs optional?
Without this, you code "by feeling," and that's where waste begins—even in MVP phase.
A minimal schema isn't the enemy of exploration. It's what makes exploration efficient instead of chaotic.
The 4-Step Method (Applicable to Any LLM Project)
Same method as database design, different domain:
Step 1: Identify Your Contract
Ask: "What output do I need, every single time, no exceptions?"
Not "what would be nice to have." What's non-negotiable.
Step 2: Define the Schema (Pydantic)
Write your output model. Include:
- Required fields
- Type constraints (`conint`, `constr`, etc.)
- Validation rules (`min_length`, `ge`, `le`)
Step 3: Write the Agent in YAML
- Clear prompt with examples
- Input variables
- Reference to your schema
- Retry config
Step 4: Test, Don't Trust
Run 10-20 real inputs. Watch what fails. Adjust schema OR prompt, never both at once.
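In practice that test loop is a dozen lines (a sketch assuming a samples/ folder of real markdown articles and the runner from earlier):

from pathlib import Path
import yaml

with open("agents/content_analyzer.yaml") as f:
    agent_config = yaml.safe_load(f)

failures = []
for path in sorted(Path("samples").glob("*.md")):
    result = run_agent(agent_config, {"markdown_content": path.read_text()})
    if not result["success"]:
        failures.append((path.name, result["error"]))

print(f"{len(failures)} failures in this batch")
for name, error in failures:
    print(f"  - {name}: {error}")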
Scaling: When You Have 40 Agents
Problem I'm facing now: 40 agents × 2000-token prompts = unmaintainable YAML files
Solution (work in progress): Modular composition
agent:
  name: content_analyzer
  prompt_parts:
    - role: "{{include('prompts/base/analyst_role.md')}}"
    - criteria: "{{include('prompts/scoring/criteria_v2.md')}}"
    - output_format: "{{include('prompts/formats/json_scoring.md')}}"
    - input: "Content to analyze:\n{{markdown_content}}"
  output_schema: "{{load('schemas/content_analysis_v3.json')}}"
Why this matters:
- Update scoring criteria in ONE place → applies to all agents
- Version control prompts independently
- A/B test prompt variations
- Reuse role definitions across agents
I'm building this now; the next article may cover the implementation.
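One way the include() helper could work is as a plain Jinja2 global (a sketch of the direction, not a final design; it assumes prompt fragments live under the project root and Python 3.9+):

from pathlib import Path
from jinja2 import Environment

PROJECT_ROOT = Path(".").resolve()

def include(relative_path: str) -> str:
    """Inline a prompt fragment, refusing anything that escapes the project root."""
    target = (PROJECT_ROOT / relative_path).resolve()
    if not target.is_relative_to(PROJECT_ROOT):
        raise ValueError(f"Refusing to include {relative_path!r}")
    return target.read_text()

env = Environment()
env.globals["include"] = include

def assemble_prompt(agent_config: dict, variables: dict) -> str:
    """Render each prompt_parts entry, then join the pieces into one prompt."""
    parts = []
    for part in agent_config["agent"]["prompt_parts"]:
        for _, template_str in part.items():
            parts.append(env.from_string(template_str).render(**variables))
    return "\n\n".join(parts)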
What This Enables: The Bigger Picture
AgentKit represents the same shift we've seen elsewhere in tech:
From imperative ("how") to declarative ("what")
| Domain | Before | After |
|---|---|---|
| Containers | "Install X, configure Y, run Z" | `Dockerfile` |
| Infrastructure | "Click these buttons in AWS console" | `terraform apply` |
| Orchestration | "Deploy pod 1, then pod 2, then..." | `kubectl apply -f config.yaml` |
| LLM Agents | "Call API, parse, validate, retry..." | `agents/analyzer.yaml` |
The pattern:
- Define desired state
- System makes it happen
- You never touch the plumbing again
That's not just elegant. It's efficient laziness at scale.
Limitations (or: "Why This Isn’t Perfect, Just Less Terrible")
⚠️ Cost: Retries Aren't Free
If 30% of your requests need 1 retry, you're paying +30% API costs.
Mitigation:
- Write better prompts. (Or accept that LLMs are like interns: they need supervision.)
- Implement exponential backoff, sketched below. (Because spamming the API is _so_ 2023.)
- Monitor retry rates. (And weep quietly over your bill.)
My take: I’d rather pay 30% more and sleep than save money and debug at 3 AM. (Life’s too short for JSON parsing errors. And bad coffee.)
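For the backoff piece, the change to the runner's retry loop is tiny (a sketch; the base delay, cap, and jitter values are arbitrary):

import random
import time

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 30.0) -> float:
    """Exponential backoff with jitter: roughly 2s, 4s, 8s... capped at 30s."""
    return min(cap, base * (2 ** (attempt - 1))) * random.uniform(0.5, 1.0)

def sleep_before_retry(attempt: int) -> None:
    # Call this in run_agent just before looping back for the next attempt
    time.sleep(backoff_delay(attempt))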
⚠️ Security: YAML Execution Needs Hardening
Current implementation is a proof-of-concept. Production needs:
✅ Agent name validation (prevent path traversal: ../../etc/passwd)
✅ Prompt injection detection
✅ Rate limiting per agent (Because someone will try to DoS your FastAPI endpoint. Probably you, at 2 AM.)
✅ API key rotation
✅ Signed agent files (Trust no one. Not even your future self.)
Don't run this in production without hardening.
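The first item on that list is cheap to add. Here's a sketch of the agent name validation (a hypothetical helper, whitelisting plain names and pinning the path to the agents/ directory):

import re
from pathlib import Path

from fastapi import HTTPException

AGENTS_DIR = Path("agents").resolve()
AGENT_NAME_RE = re.compile(r"^[a-z0-9_]{1,64}$")

def resolve_agent_path(agent_name: str) -> Path:
    """Accept only plain agent names (no slashes, no '..') and pin the path to agents/."""
    if not AGENT_NAME_RE.match(agent_name):
        raise HTTPException(status_code=400, detail="Invalid agent name")
    path = (AGENTS_DIR / f"{agent_name}.yaml").resolve()
    if path.parent != AGENTS_DIR or not path.exists():
        raise HTTPException(status_code=404, detail=f"Agent '{agent_name}' not found")
    return path

In the /analyze endpoint, the open(f"agents/{req.agent_name}.yaml") call would then go through resolve_agent_path(req.agent_name) instead.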
⚠️ State Management: This Is Stateless
Complex workflows (multi-step, conditional logic) need orchestration:
- Temporal
- Argo Workflows
- Airflow
AgentKit handles single-agent execution beautifully. Multi-agent workflows? Different problem.
Implementation Checklist
If you want to try this:
Week 1: Core Setup
☐ Define 1-2 critical agents in YAML
☐ Build minimal runner (~150 lines)
☐ Add Pydantic validation
☐ Test with real inputs (10-20 examples)
Week 2: Production-Ready
☐ Add retry logic with exponential backoff
☐ Expose via FastAPI
☐ Add authentication (API keys)
☐ Implement structured logging
☐ Set up monitoring (retry rates, latency, costs)
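For the structured logging item, one JSON line per run is enough to compute retry rates and latency later (a sketch; the field names are mine, not part of any standard):

import json
import logging
from typing import Optional

logger = logging.getLogger("agent_runner")

def log_agent_run(agent_name: str, attempt: int, success: bool,
                  latency_s: float, error: Optional[str] = None) -> None:
    """One JSON line per run: easy to grep, easy to aggregate into retry rates."""
    logger.info(json.dumps({
        "agent": agent_name,
        "attempt": attempt,
        "success": success,
        "latency_s": round(latency_s, 2),
        "error": error,
    }))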
Week 3: Hardening
☐ Input validation (sanitize agent names)
☐ Rate limiting per agent/user
☐ Error alerting (Slack/email when agents fail repeatedly)
☐ Documentation for your team
Month 2: Scaling
☐ Modular prompt composition (includes)
☐ A/B testing framework for prompts
☐ Cost tracking per agent
☐ Multi-agent orchestration (if needed)
The Bigger Question: What's Your Time Worth?
Here's the efficient laziness calculation:
Option A: Keep debugging manually
- Time: 3 hours/week × 52 weeks = 156 hours/year
- Mental cost: High (unpredictable failures)
- Scalability: Terrible (more agents = more debugging)
Option B: Build declarative system
- Time: 1 day setup + 30 min per new agent
- Mental cost: Low (system handles validation)
- Scalability: Excellent (add agents by dropping YAML files)
Break-even point: ~3 weeks
After that, it's pure gain.
Conclusion: Think Once, Validate Forever
The actual secret? Stop treating LLM outputs like artisanal handcrafted prose and start treating them like database transactions:
- Define the schema. (Yes, before writing prompts.)
- Enforce it. (No, "mostly valid" isn’t valid.)
- Profit. (Or at least stop debugging at 2 AM.)
Why this works:
- Databases figured this out in the 1970s. LLMs are just late to the party.
- Side effects of this approach:
  - Fewer surprises. (Shocking.)
  - More time for actual work. (Imagine that.)
  - A system that fails predictably instead of creatively.
Final thought: If you’re still manually validating LLM outputs, ask yourself: "Do I enjoy suffering, or did I just not automate this yet?" (Hint: It’s the latter. Fix it.)
In my case: ContentAnalysisOutput from markdown_content. Everything else flows from that contract.
And this small discipline—writing a Pydantic model before writing prompts—saved me 12-16 hours/month.
The efficient laziness Manifesto:
Never debug the same JSON parsing error twice.
Define the contract once.
Let the system enforce it forever.
For Developers:
Adopt "efficient laziness" with LLM agents:
- Contract first (Pydantic schema)
- Agent definition (YAML)
- Universal runner (handles plumbing)
- Never touch validation logic again
For Teams Using LLMs:
If you're integrating AI into workflows, ask:
- Do we have schemas for AI outputs?
- Do we validate automatically or manually?
- How much time do we spend debugging AI responses?
A well-structured agent system means:
✅ Fewer bugs
✅ Less maintenance
✅ Easier iteration
✅ Predictable costs
✅ Peace of mind
Resources
🔗 AgentKit Docs (when available)
🐍 My GitHub repo (replace with your link)
📝 Pydantic Documentation
🚀 FastAPI Documentation
Discussion
Questions for the community:
- How much time do you spend debugging LLM outputs?
- Do you validate AI responses automatically or manually?
- What's your approach to retry logic?
- Would declarative agents change how you build AI features?
Drop your thoughts below 👇 Especially if you've faced the "invalid JSON at 2am" pain.
This is the second article in my "Efficient Laziness" series. The efficiently lazy developer never codes the same thing twice. Their motto: think once, think well, and move on.
If you found this useful, follow me for more deep dives on AI architecture and automation. I also run a weekly tech newsletter on open-source solutions.