The Missing Layer in Every AI Agent: Human-in-the-Loop Approval

#ai #programming #learning #agents

TL;DR: Watching tutorials feels productive but doesn't build real skills. The only way out is to build something ugly, stick with one stack, and read source code instead of blog posts. Here's what worked for me, including real code from a production agent.

I have a confession: for most of 2025, I was a professional tutorial watcher. Not a builder. Not an engineer. A consumer of other people's code, nodding along while someone else typed.

I run an AI automation company in Germany. We build agent-based automation for trades, property management, and logistics companies. That sounds impressive until you realize that for months, I couldn't build an agent from scratch without following a step-by-step video.

This is the story of how I broke that cycle, and what actually worked.

The Tutorial Consumption Trap

Tutorials aren't bad. They're necessary. But there's a specific pattern that kills real learning:

You watch someone build an agent with LangChain
You copy their exact code into your IDE
It runs. Dopamine hit. You feel productive.
Next morning: you can't even write "from langchain import" from memory

The problem isn't the tutorial. It's the illusion of competence. Following along feels like learning, but your brain is in passive mode. You're watching someone else solve problems you haven't struggled with yet.

I tracked my time for two weeks and the numbers were uncomfortable: 14 hours of video tutorials, 3 hours of actual coding. I was spending over 80% of my "learning time" watching, not building.

The Rule That Changed Everything

I made one rule and enforced it ruthlessly:

No new tool, framework, or concept until I've built something that uses the last one.

Want to try CrewAI? Fine: build a working multi-agent system with what you already know first. Curious about vector databases? Show me a working SQLite-based memory system before you touch Chroma.

This rule killed my tutorial addiction because it made every new tool a reward for building, not a distraction from it.

What I Actually Built

Here's a real agent I wrote during my first week of "no tutorials." It's a document classification agent for a property management client. Nothing fancy, just an MCP server that classifies the extracted text of incoming PDFs so they can be routed to the right person:

# mcp_server.py: Document classifier agent
# Part of a production workflow at centerbit.co

import asyncio

from mcp.server import Server, NotificationOptions
from mcp.server.models import InitializationOptions
import mcp.server.stdio
import mcp.types as types

server = Server("document-classifier")

# Classification rules built from real client requirements
CLASSIFICATION_RULES = {
    "invoice": ["rechnung", "invoice", "zahlbar", "amount due"],
    "contract": ["vertrag", "contract", "laufzeit", "kündigung"],
    "maintenance": ["wartung", "reparatur", "defekt", "instandhaltung"],
    "tenant": ["mieter", "mietvertrag", "wohnung", "tenant"],
}

@server.list_tools()
async def handle_list_tools() -> list[types.Tool]:
    # Advertise a single tool that takes the already-extracted document text
    return [
        types.Tool(
            name="classify_document",
            description="Classify a document based on its text content",
            inputSchema={
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "Extracted text content from the document"
                    }
                },
                "required": ["text"]
            }
        )
    ]

@server.call_tool()
async def handle_call_tool(
    name: str, arguments: dict
) -> list[types.TextContent]:
    if name != "classify_document":
        raise ValueError(f"Unknown tool: {name}")

    # Count keyword hits per category on the lowercased text
    text = arguments["text"].lower()
    matches = {}

    for category, keywords in CLASSIFICATION_RULES.items():
        score = sum(1 for kw in keywords if kw in text)
        if score > 0:
            matches[category] = score

    if not matches:
        return [types.TextContent(
            type="text",
            text="unclassified"
        )]

    # Highest score wins; on a tie, the category listed first in the rules wins
    best_match = max(matches, key=matches.get)

    return [types.TextContent(
        type="text",
        text=best_match
    )]

async def run():
    # Serve over stdio; the MCP client launches this script as a subprocess
    async with mcp.server.stdio.stdio_server() as (read_stream, write_stream):
        await server.run(
            read_stream,
            write_stream,
            InitializationOptions(
                server_name="document-classifier",
                server_version="0.1.0",  # placeholder version string
                capabilities=server.get_capabilities(
                    notification_options=NotificationOptions(),
                    experimental_capabilities={},
                ),
            ),
        )

if __name__ == "__main__":
    asyncio.run(run())

This isn't a tutorial example. It's extracted from an actual workflow that runs daily. It's not elegant. The classification is keyword-based, not LLM-powered. But it solves a real problem: a property management company was spending 4 hours per week manually sorting documents. Now an agent does it in seconds.
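To see how the keyword scoring behaves without starting an MCP server, the same logic can be pulled into a plain function. This is a minimal sketch, not the production code: the function name classify and the sample sentences are made up for illustration, and only two of the categories are reproduced to keep it short.

```python
# Keyword scoring from the server above, as a standalone function.
# Assumption: `classify` and the sample texts are illustrative only.
RULES = {
    "invoice": ["rechnung", "invoice", "zahlbar", "amount due"],
    "maintenance": ["wartung", "reparatur", "defekt", "instandhaltung"],
}


def classify(text: str, rules: dict = RULES) -> str:
    """Return the category with the most keyword hits, or 'unclassified'."""
    lowered = text.lower()
    scores = {
        category: sum(1 for kw in keywords if kw in lowered)
        for category, keywords in rules.items()
    }
    # On a tie, max() keeps the category that appears first in the rules
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"


print(classify("Rechnung Nr. 123, zahlbar innerhalb von 14 Tagen"))  # invoice
print(classify("Heizung defekt, Reparatur dringend"))                # maintenance
print(classify("Guten Tag, anbei die Unterlagen"))                   # unclassified
```

Matching is by substring, so short keywords can hit inside longer words. That is cheap to live with at this scale, but worth knowing before adding more rules.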

The Tools That Actually Helped

After building several agents, here's what I kept and what I dropped:

Kept

MCP (Model Context Protocol): The standard for connecting agents to tools. Writing MCP servers in Python is straightforward once you understand the pattern. The official MCP Python SDK is well-documented.
Facio: The agent runtime we built at centerbit. We open-sourced it because we were tired of frameworks that over-promise and under-deliver. It handles scheduling, memory, and HITL (human-in-the-loop) approvals out of the box. The key insight: an agent that runs autonomously is useless if it can't ask a human before taking critical actions.
SQLite + FTS5: My choice for agent memory. I wasted weeks investigating vector databases before realizing that full-text search on SQLite handles 90% of our use cases at zero operational cost.
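For a sense of how little code an FTS5-backed memory needs, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and helper functions are assumptions for illustration, not the schema used in production, and it requires a SQLite build with FTS5 enabled (the default in most Python distributions).

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a full-text-indexed memory table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memory USING fts5(topic, content)"
    )
    return conn


def remember(conn: sqlite3.Connection, topic: str, content: str) -> None:
    """Store one memory entry."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO memory (topic, content) VALUES (?, ?)",
            (topic, content),
        )


def recall(conn: sqlite3.Connection, query: str, limit: int = 3) -> list:
    """Return the best-matching entries for an FTS5 query string."""
    rows = conn.execute(
        "SELECT content FROM memory WHERE memory MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


conn = open_memory()
remember(conn, "maintenance", "Boiler in building 12 was repaired in March")
remember(conn, "tenant", "Tenant in unit 4 prefers contact by email")
print(recall(conn, "boiler"))
```

The query string is passed to FTS5 as-is, so raw user input containing quotes or operators should be sanitized or quoted before it reaches recall.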

Dropped

LangChain: Not because it's bad, but because the abstraction layers made debugging impossible. When an agent fails silently, you need to trace the exact call chain, not navigate through RunnableSequence wrappers.
Pinecone / Weaviate: Overkill for single-tenant agent memory. Unless you're building a SaaS product with thousands of concurrent users, a local vector store or even keyword search is faster to implement and easier to debug.

The Hardest Part: Human-in-the-Loop

The biggest lesson I learned wasn't technical. It was organizational.

Agents make mistakes. They classify documents wrong. They hallucinate summaries. They route things to the wrong person. If your agent runs fully autonomously, these failures compound silently.

At centerbit, every critical agent action goes through a HITL approval step. This isn't a limitation; it's a design choice:

Document classified as "invoice"? Human confirms before it hits accounting.
Agent wants to send an email? Draft shown for review first.
Workflow triggered automatically? Notification sent, human acknowledges.

This pattern makes stakeholders trust the system. Nobody deploys an agent and says "let it run, I don't need to check." The agents that succeed in production are the ones that respect human judgment.
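The core of the pattern fits in a few lines: the agent proposes an action, and nothing executes until a human decides. The sketch below is a generic illustration under that assumption. It is not Facio's API, and every name in it (ApprovalQueue, propose, decide) is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class PendingAction:
    description: str               # what the human sees when reviewing
    execute: Callable[[], str]     # the side effect, deferred until approval
    status: Status = Status.PENDING


@dataclass
class ApprovalQueue:
    actions: list = field(default_factory=list)

    def propose(self, description: str, execute: Callable[[], str]) -> int:
        """Agent side: register an action and return its id. Nothing runs yet."""
        self.actions.append(PendingAction(description, execute))
        return len(self.actions) - 1

    def decide(self, action_id: int, approved: bool) -> Optional[str]:
        """Human side: approve (runs the action) or reject (drops it)."""
        action = self.actions[action_id]
        if action.status is not Status.PENDING:
            raise ValueError("action was already decided")
        if not approved:
            action.status = Status.REJECTED
            return None
        action.status = Status.APPROVED
        return action.execute()


forwarded = []


def forward_invoice() -> str:
    forwarded.append("invoice-123")  # stand-in for the real side effect
    return "forwarded to accounting"


queue = ApprovalQueue()
action_id = queue.propose("Forward invoice-123 to accounting", forward_invoice)
print(forwarded)                               # still empty: waiting for a human
print(queue.decide(action_id, approved=True))  # runs only after approval
```

In a real deployment the queue would be persisted and the decision would arrive through a UI or chat message, but the contract stays the same: propose is cheap and safe, decide is the only path to a side effect.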

What I'd Tell Someone Starting Today

After building agents for real production use cases, here's what matters:

Build for a real problem, not a demo. The difference between a toy agent and a production agent isn't technical sophistication; it's whether someone actually needs the output.

Start with deterministic logic, add AI later. Most "AI agent" workflows are 80% deterministic routing and 20% LLM calls. Write the routing first. You'll be surprised how much you can automate before touching a language model.
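One way to read "routing first" in code: a lookup table handles the known cases, and a model is only an optional fallback for whatever the table cannot place. This is a sketch under assumptions. The route table, the team names, and the llm_fallback callable are all hypothetical, and the fallback here is a stub rather than a real model call.

```python
from typing import Callable, Optional

# Hypothetical mapping from document category to the team that handles it
ROUTES = {
    "invoice": "accounting",
    "contract": "legal",
    "maintenance": "facility-team",
}


def route(
    category: str,
    text: str,
    llm_fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Route deterministically when possible; otherwise defer."""
    if category in ROUTES:
        return ROUTES[category]        # the deterministic path, no model involved
    if llm_fallback is not None:
        return llm_fallback(text)      # optional: let a model suggest a destination
    return "manual-review"             # safe default: a human looks at it


def stub_llm(text: str) -> str:
    # Placeholder for a real model call; always returns the same team
    return "front-office"


print(route("invoice", "Rechnung Nr. 123"))
print(route("unclassified", "Guten Tag, anbei die Unterlagen"))
print(route("unclassified", "Guten Tag, anbei die Unterlagen", stub_llm))
```

Because the fallback is just a callable, the deterministic part can ship and be tested on its own, and the model can be added later without touching the routing table.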

Human-in-the-loop isn't a crutch. It's a feature. The agents people actually use are the ones that collaborate with humans, not replace them.

Stop watching and start typing. You already know enough. The gap between what you've learned from tutorials and what you need to build something real is smaller than you think.

I build AI agent systems at centerbit, an automation company in Germany. We write about practical agent development, MCP servers, and human-in-the-loop patterns. No hype, just code that runs in production.