Pascal CESCATO

AgentKit: How Efficient Laziness Fixes Fragile LLM Workflows

How I stopped debugging JSON parsing errors and started shipping features

TL;DR

I hate debugging the same JSON parsing error twice. AgentKit lets you define LLM agents declaratively (YAML + Pydantic), so validation happens automatically. One day of setup saved me 3+ hours/week of debugging. That's efficient laziness in action.


📚 This is Part 2 of my "Efficient Laziness" series.


The Story: When Your Newsletter Pipeline Becomes a Debugging Hell

I run a weekly tech newsletter. The pipeline should be simple:

  1. Fetch articles from Wallabag
  2. Clean HTML → Markdown
  3. AI analysis (summary, scoring, keywords, categorization)
  4. Insert into PostgreSQL
  5. Generate HTML newsletter
  6. Send automatically

The reality? Step 3 — the "AI-powered" part — was a textbook example of automated inefficiency.
Every Monday, the pipeline failed for one of these entirely predictable reasons:
❌ JSON malformed → "Invalid syntax at line 42" (manual retry required)
❌ Missing "keywords" field → "Prompt tweaked, workflow rerun" (rinse, repeat)
❌ Score = 11 instead of max 10 → "Validation node added" (because of course)
❌ Category misspelled → "Category validator built" (yet another node)
Time spent debugging and patching n8n workflows: 2–3 hours.
Time spent on actual work: Statistically insignificant.

I wasn't building features. I was building safety nets for unpredictable LLM outputs.

My tolerance for waste: Zero.


The Question: What If Validation Was the System's Job, Not Mine?

Before touching any code, I asked myself THE question (sound familiar from my database design article?):

"What am I actually trying to do here?"

Not "call an API and hope for the best."

Answer: Get a validated ContentAnalysisOutput from markdown_content input. Period.

Everything else—retries, JSON parsing, validation—is plumbing. Plumbing I shouldn't have to build myself.

Let’s see what that looks like in code →

Enter AgentKit: Declarative Agents (or, How I Got Lazy the Right Way)

The obvious solution?
AgentKit — currently in open beta, because of course someone finally formalized this — operates on a principle so simple it’s almost insulting:
You declare what you need. The system handles how.
(Revolutionary, I know.)

Why this works:

  • Core concept: No more babysitting JSON. No more "hope the LLM complies this time."
  • Implementation: YAML + Pydantic. Because if your contract isn’t machine-enforced, it’s just a wishlist.

Before (n8n imperative workflow chaos):

[AI Node] → [Parse JSON Node] 
  → IF valid JSON
    → [Validate Schema Node]
      → IF valid schema
        → ✅ Continue
      → ELSE ❌ Log error, retry with different prompt
  → ELSE ❌ Log error, retry with explicit "return valid JSON" instruction

Result: 12 nodes, 3 branches, unmaintainable — or, at best, a maintenance nightmare.
(Because nothing says "scalable" like hardcoding validation logic in a GUI.)

After (AgentKit declarative agent):

agent:
  name: content_analyzer
  output_schema: ContentAnalysisOutput  # Pydantic model
  max_retries: 2
  # System handles validation & retry automatically

(No GUI. No drag-and-drop. Just a contract. Enforced.)

Result: 1 YAML file. Validation happens automatically. Retries handled by the runner.

That's efficient laziness: I defined the contract once, the system enforces it forever.

From Paper to YAML: The Agent Contract

Just like with database modeling, I started with paper. Not code. Questions first:

  • What are my inputs? → markdown_content: string
  • What do I need back? → Structured scoring + summary + keywords + category
  • What constraints? → Each criterion has its own cap (0-3 at most), at least 3 keywords, total score ≤ 10

Then I translated that into a Pydantic model (the "schema"):

from pydantic import BaseModel, conint, Field
from typing import List

class ContentAnalysisOutput(BaseModel):
    smb_applicability: conint(ge=0, le=3)  # ge = greater/equal, le = less/equal
    automation_potential: conint(ge=0, le=2)
    economic_value: conint(ge=0, le=2)
    open_source: conint(ge=0, le=2)
    innovation: conint(ge=0, le=1)
    total_score: conint(ge=0, le=10)
    summary: str
    keywords: List[str] = Field(..., min_length=3)  # at least 3 keywords (use min_items on Pydantic v1)
    category: str

Why Pydantic?

  • Automatic validation (no custom code)
  • Clear error messages ("expected int 0-3, got 11")
  • Type hints your IDE understands
  • Serialization/deserialization built-in

This model is my contract. Any LLM output that doesn't match this contract gets automatically rejected and retried.
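
Here's the contract doing its job, with a deliberately broken payload (made up for illustration; in the real pipeline this is exactly the kind of output that triggers a retry):

from pydantic import ValidationError

bad_output = {
    "smb_applicability": 11,   # out of range: must be 0-3
    "automation_potential": 2,
    "economic_value": 2,
    "open_source": 2,
    "innovation": 1,
    "total_score": 10,
    "summary": "Some summary",
    "keywords": ["AI"],        # too short: at least 3 required
    "category": "AI & Infrastructure",
}

try:
    ContentAnalysisOutput(**bad_output)
except ValidationError as e:
    print(e)  # pinpoints every violation: field name, constraint, received value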

The Agent Definition: YAML as the Single Source of Truth

Here's the complete agent definition:

agent:
  name: content_analyzer
  description: >
    Analyzes web content to generate business-oriented summaries
    with normalized scoring and classification.

  inputs:
    - markdown_content: string

  model:
    provider: variable  # Swap OpenAI/Mistral/Llama easily
    model_name: variable
    temperature: 0.1
    response_format: json

  prompt: |
    You are a technology analyst specializing in open-source solutions.
    Analyze the following content and provide a structured evaluation.

    Scoring criteria (STRICT):
      - SMB applicability: 0-3
      - Automation potential: 0-2
      - Economic value: 0-2
      - Open-source: 0-2
      - Innovation: 0-1
      - TOTAL must be ≤ 10

    Expected JSON format:
    {
      "smb_applicability": 0,
      "automation_potential": 0,
      "economic_value": 0,
      "open_source": 0,
      "innovation": 0,
      "total_score": 0,
      "summary": "...",
      "keywords": ["...", "...", "..."],
      "category": "..."
    }

    Content to analyze:
    {{markdown_content}}

Why YAML?

  • Same reason I use paper before coding: think once, write once
  • Git-friendly (version control, diffs, rollbacks)
  • Human-readable (non-devs can review prompts)
  • Language-agnostic (runs anywhere: Python, Node, Go...)
  • Already standard for infrastructure (Docker, K8s, Terraform)

Building the Runner: 150 Lines to Never Debug JSON Again

Since AgentKit isn't fully released, I built a minimal Python runner. Core principle:

The runner handles ALL the annoying stuff I hate doing manually.

Core execution loop with automatic retry:

from jinja2 import Template
import json

def run_agent(agent_config: dict, variables: dict, max_retries: int = 2):
    """
    Execute agent with automatic validation & retry

    This is the 'efficient laziness' in action:
    - Template rendering: automatic
    - JSON parsing errors: automatic retry
    - Schema validation: automatic via Pydantic
    - Error logging: structured

    I never touch this code. It just works.
    """
    template = Template(agent_config["agent"]["prompt"])
    rendered = template.render(**variables)

    for attempt in range(1, max_retries + 1):
        try:
            # Call LLM: call_llm is a thin wrapper around your provider's client (OpenRouter, Mistral, OpenAI...)
            raw_response = call_llm(rendered)

            # Parse JSON (can fail)
            parsed = json.loads(raw_response)

            # Validate with Pydantic (can fail)
            validated = ContentAnalysisOutput(**parsed)

            # Success! Return validated data
            return {
                "success": True, 
                "output": validated.dict(), 
                "attempt": attempt
            }

        except json.JSONDecodeError:
            # LLM returned invalid JSON → retry with correction prompt
            print(f"⚠️ Invalid JSON (attempt {attempt}) → auto-retrying")
            rendered = (
                f"The previous response was invalid JSON. "
                f"Please fix it and return ONLY valid JSON:\n{raw_response}"
            )

        except Exception as e:
            # Validation failed (wrong type, out of range, etc.)
            if attempt == max_retries:
                return {"success": False, "error": str(e)}
            print(f"❌ Validation error (attempt {attempt}): {e} → retrying")

    # All attempts exhausted (e.g. invalid JSON on the final try)
    return {"success": False, "error": "Max retries exceeded: LLM never returned valid JSON"}

What this does for me:
✅ Template rendering (Jinja2 variables)
✅ JSON parsing with automatic retry
✅ Schema validation with clear error messages
✅ Structured logging
✅ Retry logic with correction prompts

What I never have to do again:
❌ Build custom validation nodes
❌ Debug "why is this field missing?"
❌ Add retry logic for the 50th time
❌ Parse error messages manually
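
Standalone usage is equally boring, which is the point (a quick sketch; it assumes the agent file lives at agents/content_analyzer.yaml, the layout used in the next section):

import yaml

# Load the declarative agent definition and run it against one article
with open("agents/content_analyzer.yaml") as f:
    agent_config = yaml.safe_load(f)

result = run_agent(agent_config, {"markdown_content": "# Milvus Lite\n..."})

if result["success"]:
    print(result["output"]["total_score"], result["output"]["keywords"])
else:
    print("Agent failed after retries:", result["error"])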

Exposing via FastAPI: One Endpoint, Infinite Agents

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import yaml

app = FastAPI()

class AgentRequest(BaseModel):
    agent_name: str
    markdown_content: str

@app.post("/analyze")
def analyze(req: AgentRequest):
    """
    Universal agent executor

    Add a new agent? Drop a YAML file in agents/.
    No code changes. No deployments. Just works.

    That's what I call scaling through laziness.
    """
    # Load agent config
    try:
        with open(f"agents/{req.agent_name}.yaml") as f:
            agent_config = yaml.safe_load(f)
    except FileNotFoundError:
        raise HTTPException(status_code=404, detail=f"Agent '{req.agent_name}' not found")

    # Run agent (validation happens automatically)
    result = run_agent(
        agent_config, 
        {"markdown_content": req.markdown_content}
    )

    if not result["success"]:
        raise HTTPException(status_code=500, detail=result["error"])

    return result
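Calling it from anywhere in the pipeline is one POST (a sketch; it assumes the app above runs locally on uvicorn's default port):

import requests

# e.g. started with: uvicorn main:app --port 8000 (module name is illustrative)
resp = requests.post(
    "http://localhost:8000/analyze",
    json={
        "agent_name": "content_analyzer",
        "markdown_content": "# Milvus Lite\nA lightweight, self-hosted vector database...",
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["output"]["total_score"])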

Example response:

{
  "success": true,
  "output": {
    "smb_applicability": 3,
    "automation_potential": 2,
    "economic_value": 2,
    "open_source": 2,
    "innovation": 1,
    "total_score": 10,
    "summary": "Milvus Lite simplifies local vector database deployment...",
    "keywords": ["Milvus", "vector database", "LLM", "self-hosted"],
    "category": "AI & Infrastructure"
  },
  "attempt": 1
}

What This Saved Me (The ROI of Efficient Laziness)

Time Investment:

  • Initial setup: 1 day (YAML agent + runner + FastAPI)
  • Per new agent: 30 minutes (just write YAML)

Time Saved (Weekly):

  • Before: 3-4 hours debugging JSON errors, validation issues, retry logic
  • After: 0 hours (system handles it)
  • ROI: 12-16 hours/month saved

But More Importantly:

Mental peace.

I don't wake up Monday morning to find my newsletter broken because an LLM decided to return "keywords": "AI, automation" (string) instead of "keywords": ["AI", "automation"] (array).

The system catches it. Retries. Logs the attempt. Works.

That's the real win: I freed my brain from babysitting unpredictable LLM outputs.

A Real Example: Adding Payment Tracking (Wait, Wrong Article)

Actually, let me give you the RIGHT example:

Three weeks after deploying this, my client asked:

"Can we add sentiment analysis to the content scoring?"

My answer: "Give me 1 hour."

Why 1 hour and not 3?

Because the structure was already there:

  1. Copy content_analyzer.yaml → content_analyzer_v2.yaml
  2. Add sentiment: str to Pydantic model
  3. Update prompt to include sentiment scoring
  4. Deploy (just drop the YAML file, no code changes)

Zero refactoring. Zero debugging. Zero broken workflows.
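
For the record, the schema half of that change can be as small as an inherited class (the V2 name and the free-text sentiment field are how I'd sketch it; constrain it to an enum if you prefer):

class ContentAnalysisOutputV2(ContentAnalysisOutput):
    # The only addition the client asked for; everything else is inherited
    sentiment: str  # e.g. "positive", "neutral", "negative"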

If I'd built this the n8n way with 12 validation nodes, it would've been 2 days of:

  • Adding sentiment node
  • Adding validation for sentiment
  • Testing all branches
  • Fixing edge cases
  • Praying nothing else broke

Loved "efficient laziness" in action here? Share your debugging horror stories or time-saving hacks below – let’s shape the next steps together! Check out my intro to CTEs on my newly Astro-migrated blog CTE : la clause WITH que les ORM ignorent (mais que vous devriez connaître), and stay tuned for my 3-part English series on dev.to starting next week, where I’ll apply this mindset to SQL optimization. Follow me to catch it all!**

Why This Approach Should Be the Norm (But Isn't)

Let's be honest: most developers code first, validate later (if at all).

I've always done the opposite. Not out of virtue, but out of pure efficient laziness: I hate fixing the same bug twice.

My Philosophy (Again): "Never Do Twice What Can Be Done Once"

  • Spend 1 day on declarative agents vs 3 hours/week debugging? Obvious choice.
  • Define schema once vs validate manually 100 times? No debate.
  • Write YAML once vs maintain 12 n8n nodes? Crystal clear.

This "laziness" has always saved me time, energy, and sanity.

Why Is This Rarely Done?

Same reasons as database modeling:

  • Pressure to ship fast: "We'll add validation later" (we never do)
  • Belief that prompts will magically work: They won't. LLMs are creative, not consistent.
  • Lack of tooling: AgentKit is new. Most people don't know this approach exists.

But What About Exploratory Projects?

Fair question. "What if I'm just testing an idea and don't know the schema yet?"

Here's my take: even a 15-minute Pydantic model forces you to ask:

  • What fields do I actually need?
  • What are valid ranges?
  • What's required vs optional?

Without this, you code "by feeling," and that's where waste begins—even in MVP phase.

A minimal schema isn't the enemy of exploration. It's what makes exploration efficient instead of chaotic.

The 4-Step Method (Applicable to Any LLM Project)

Same method as database design, different domain:

Step 1: Identify Your Contract

Ask: "What output do I need, every single time, no exceptions?"

Not "what would be nice to have." What's non-negotiable.

Step 2: Define the Schema (Pydantic)

Write your output model. Include:

  • Required fields
  • Type constraints (conint, constr, etc.)
  • Validation rules (min_length, ge, le)

Step 3: Write the Agent in YAML

  • Clear prompt with examples
  • Input variables
  • Reference to your schema
  • Retry config

Step 4: Test, Don't Trust

Run 10-20 real inputs. Watch what fails. Adjust schema OR prompt, never both at once.
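
Concretely, "test, don't trust" can be a ten-line loop rather than a framework (a sketch; samples/ is a hypothetical folder of real articles saved as Markdown):

from pathlib import Path

failures = []
for path in Path("samples").glob("*.md"):  # 10-20 real inputs
    result = run_agent(agent_config, {"markdown_content": path.read_text()})
    if not result["success"]:
        failures.append((path.name, result["error"]))

print(f"{len(failures)} failed")
for name, error in failures:
    print(f"  {name}: {error}")  # then adjust schema OR prompt, never both at once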

Scaling: When You Have 40 Agents

Problem I'm facing now: 40 agents × 2000-token prompts = unmaintainable YAML files

Solution (work in progress): Modular composition

agent:
  name: content_analyzer
  prompt_parts:
    - role: "{{include('prompts/base/analyst_role.md')}}"
    - criteria: "{{include('prompts/scoring/criteria_v2.md')}}"
    - output_format: "{{include('prompts/formats/json_scoring.md')}}"
    - input: "Content to analyze:\n{{markdown_content}}"
  output_schema: "{{load('schemas/content_analysis_v3.json')}}"
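The include() helper isn't something my runner has yet; one way I'm prototyping it is a Jinja2 template global (a sketch, assuming prompt fragments live under agents/):

from pathlib import Path
from jinja2 import Environment

PROMPT_ROOT = Path("agents")  # assumed layout: agents/prompts/..., agents/schemas/...

env = Environment()
# Expose include() to templates so prompt_parts can pull in shared fragments
env.globals["include"] = lambda rel_path: (PROMPT_ROOT / rel_path).read_text()

template = env.from_string("{{ include('prompts/base/analyst_role.md') }}")
print(template.render())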

Why this matters:

  • Update scoring criteria in ONE place → applies to all agents
  • Version control prompts independently
  • A/B test prompt variations
  • Reuse role definitions across agents

I'm building this now. Next article may cover the implementation.

What This Enables: The Bigger Picture

AgentKit represents the same shift we've seen elsewhere in tech:

From imperative ("how") to declarative ("what")

Domain            Before                                     After
Containers        "Install X, configure Y, run Z"            Dockerfile
Infrastructure    "Click these buttons in AWS console"       terraform apply
Orchestration     "Deploy pod 1, then pod 2, then..."        kubectl apply -f config.yaml
LLM Agents        "Call API, parse, validate, retry..."      agents/analyzer.yaml

The pattern:

  1. Define desired state
  2. System makes it happen
  3. You never touch the plumbing again

That's not just elegant. It's efficient laziness at scale.

Limitations (or: "Why This Isn’t Perfect, Just Less Terrible")

⚠️ Cost: Retries Aren't Free

If 30% of your requests need 1 retry, you're paying +30% API costs.

Mitigation:

  • Write better prompts. (Or accept that LLMs are like interns: they need supervision.)
  • Implement exponential backoff. (Because spamming the API is so 2023.)
  • Monitor retry rates. (And weep quietly over your bill.)

My take: I’d rather pay 30% more and sleep than save money and debug at 3 AM. (Life’s too short for JSON parsing errors. And bad coffee.)
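
If you do keep retries, the backoff itself is a three-line change inside run_agent's retry loop (a sketch; tune the base delay to your provider's rate limits):

import time

# Inside the except blocks, before looping to the next attempt:
delay = 2 ** attempt  # 2s, 4s, 8s...
print(f"⏳ Backing off {delay}s before attempt {attempt + 1}")
time.sleep(delay)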

⚠️ Security: YAML Execution Needs Hardening

Current implementation is a proof-of-concept. Production needs:
✅ Agent name validation (prevent path traversal: ../../etc/passwd)
✅ Prompt injection detection
✅ Rate limiting per agent (Because someone will try to DoS your FastAPI endpoint. Probably you, at 2 AM.)
✅ API key rotation
✅ Signed agent files (Trust no one. Not even your future self.)

Don't run this in production without hardening.
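
As a starting point, the path traversal item is a one-regex fix in the FastAPI endpoint above (a minimal sketch; tighten as needed):

import re
from fastapi import HTTPException

SAFE_AGENT_NAME = re.compile(r"^[a-z0-9_]+$")  # no dots, no slashes, no surprises

def validate_agent_name(agent_name: str) -> str:
    if not SAFE_AGENT_NAME.fullmatch(agent_name):
        raise HTTPException(status_code=400, detail="Invalid agent name")
    return agent_name

# In the endpoint: open(f"agents/{validate_agent_name(req.agent_name)}.yaml")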

⚠️ State Management: This Is Stateless

Complex workflows (multi-step, conditional logic) need orchestration:

  • Temporal
  • Argo Workflows
  • Airflow

AgentKit handles single-agent execution beautifully. Multi-agent workflows? Different problem.


Implementation Checklist

If you want to try this:

Week 1: Core Setup
☐ Define 1-2 critical agents in YAML
☐ Build minimal runner (~150 lines)
☐ Add Pydantic validation
☐ Test with real inputs (10-20 examples)

Week 2: Production-Ready
☐ Add retry logic with exponential backoff
☐ Expose via FastAPI
☐ Add authentication (API keys)
☐ Implement structured logging
☐ Set up monitoring (retry rates, latency, costs)

Week 3: Hardening
☐ Input validation (sanitize agent names)
☐ Rate limiting per agent/user
☐ Error alerting (Slack/email when agents fail repeatedly)
☐ Documentation for your team

Month 2: Scaling
☐ Modular prompt composition (includes)
☐ A/B testing framework for prompts
☐ Cost tracking per agent
☐ Multi-agent orchestration (if needed)


The Bigger Question: What's Your Time Worth?

Here's the efficient laziness calculation:

Option A: Keep debugging manually

  • Time: 3 hours/week × 52 weeks = 156 hours/year
  • Mental cost: High (unpredictable failures)
  • Scalability: Terrible (more agents = more debugging)

Option B: Build declarative system

  • Time: 1 day setup + 30 min per new agent
  • Mental cost: Low (system handles validation)
  • Scalability: Excellent (add agents by dropping YAML files)

Break-even point: ~3 weeks

After that, it's pure gain.


Conclusion: Think Once, Validate Forever

The actual secret? Stop treating LLM outputs like artisanal handcrafted prose and start treating them like database transactions:

  1. Define the schema. (Yes, before writing prompts.)
  2. Enforce it. (No, "mostly valid" isn’t valid.)
  3. Profit. (Or at least stop debugging at 2 AM.)

Why this works:

  • Databases figured this out in the 1970s. LLMs are just late to the party.
  • Side effects of this approach:
    • Fewer surprises. (Shocking.)
    • More time for actual work. (Imagine that.)
    • A system that fails predictably instead of creatively.

Final thought: If you’re still manually validating LLM outputs, ask yourself: "Do I enjoy suffering, or did I just not automate this yet?" (Hint: It’s the latter. Fix it.)

In my case: ContentAnalysisOutput from markdown_content. Everything else flows from that contract.
And this small discipline—writing a Pydantic model before writing prompts—saved me 12-16 hours/month.
The Efficient Laziness Manifesto:
Never debug the same JSON parsing error twice.
Define the contract once.
Let the system enforce it forever.


For Developers:

Adopt "efficient laziness" with LLM agents:

  1. Contract first (Pydantic schema)
  2. Agent definition (YAML)
  3. Universal runner (handles plumbing)
  4. Never touch validation logic again

For Teams Using LLMs:

If you're integrating AI into workflows, ask:

  • Do we have schemas for AI outputs?
  • Do we validate automatically or manually?
  • How much time do we spend debugging AI responses?

A well-structured agent system means:
✅ Fewer bugs
✅ Less maintenance
✅ Easier iteration
✅ Predictable costs
✅ Peace of mind


Resources

🔗 AgentKit Docs (when available)
🐍 My GitHub repo (replace with your link)
📝 Pydantic Documentation
🚀 FastAPI Documentation


Discussion

Questions for the community:

  1. How much time do you spend debugging LLM outputs?
  2. Do you validate AI responses automatically or manually?
  3. What's your approach to retry logic?
  4. Would declarative agents change how you build AI features?

Drop your thoughts below 👇 Especially if you've faced the "invalid JSON at 2am" pain.


This is the second article in my "Efficient Laziness" series. The efficiently lazy developer never codes the same thing twice. His motto: think once, well, and move on.

If you found this useful, follow me for more deep dives on AI architecture and automation. I also run a weekly tech newsletter on open-source solutions.
