Driss Amiroune

Posted on • Originally published at github.com

The 4 pillars of a production-grade AI agent (from a doctor who taught himself to code)

No prerequisites. If you've used Claude or ChatGPT and you're wondering what separates a one-off script from an agent that actually runs in production, this post is for you.

I wrote my first Python agent in April 2026. It did two things: read a PDF, send a Telegram message. It worked. Once.

The second time, the PDF was poorly scanned. The agent crashed. No trace. No notification. The patient never got their appointment.

That's the day I understood: an agent that works in a demo is not an agent. An agent is what holds up when you're not around.

I wrote four words in the docstring of my next agent: Observability, Reliability, Security, Deployment. Since then, I haven't shipped a single agent to production without all four. Today I run about twenty of them, 24/7, on a single 5€/month server.

Here they are, with the code that embodies each one.


Pillar 1 — Observability

You must be able to know, without asking anyone: what the agent did, when, how long it took, and how much it cost.

In practice, that means three things: a structured logger shared across all your agents, append-only audit logs for critical actions, and a cost tracker that records every API call.

# shared/logger.py
import logging
import os
from logging.handlers import RotatingFileHandler

def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if logger.handlers:  # already configured: don't attach duplicate handlers
        return logger
    os.makedirs('logs', exist_ok=True)  # the file handler fails if logs/ is missing
    fmt = logging.Formatter('%(asctime)s | %(levelname)-7s | %(name)s | %(message)s')
    # Rotate at 10 MB, keep 5 old files
    fh = RotatingFileHandler(f'logs/{name}.log', maxBytes=10*1024*1024, backupCount=5)
    fh.setFormatter(fmt)
    sh = logging.StreamHandler()  # writes to stderr, which journald captures too
    sh.setFormatter(fmt)
    logger.addHandler(fh)
    logger.addHandler(sh)
    logger.setLevel(logging.INFO)
    return logger

Quick test: if someone asks you right now how much your agent cost yesterday, can you answer in under 30 seconds? If yes, Pillar 1 ✓.
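To make the quick test concrete, here is a minimal sketch of an append-only cost tracker. The file name api_costs.jsonl comes from this post; the module name, the record fields (ts, agent, model, cost_usd) and both function names are assumptions made for illustration, not the code from the repo.

```python
# shared/cost_tracker.py (hypothetical module name)
import json
from datetime import date, datetime, timedelta, timezone
from pathlib import Path

COST_LOG = Path("logs/api_costs.jsonl")  # assumed location


def log_api_call(agent: str, model: str, cost_usd: float, path: Path = COST_LOG) -> None:
    """Append one JSON line per API call; existing lines are never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "model": model,
        "cost_usd": cost_usd,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def cost_for_day(day: date, path: Path = COST_LOG) -> float:
    """Sum the cost of every call whose UTC timestamp falls on `day`."""
    if not path.exists():
        return 0.0
    total = 0.0
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if datetime.fromisoformat(record["ts"]).date() == day:
                total += record["cost_usd"]
    return total


if __name__ == "__main__":
    yesterday = datetime.now(timezone.utc).date() - timedelta(days=1)
    print(f"Cost yesterday: {cost_for_day(yesterday):.4f} USD")
```

One JSON object per line keeps the file greppable and safe to append to from several agents. Answering "how much did it cost yesterday?" becomes a single function call.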


Pillar 2 — Reliability

The agent must survive errors: failing API call, corrupted file, broken network. Never corrupt state, always leave a trace.

The pattern that changes everything: try/finally at the pipeline level, to guarantee resources are cleaned up even on uncaught crashes.

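import os
import shutil

# log, FAILED_DIR and _process_document_impl are defined elsewhere in the agent

def process_document(pdf_path):
    filename = os.path.basename(pdf_path)
    try:
        # Assumed: on success, the implementation moves the file out of /incoming/ itself
        return _process_document_impl(pdf_path)
    except Exception as e:
        log.error(f"Unhandled exception: {e}", exc_info=True)
        return None  # explicit: the caller gets None on failure
    finally:
        # No matter what, the file doesn't stay in /incoming/
        if os.path.exists(pdf_path):
            os.makedirs(FAILED_DIR, exist_ok=True)
            shutil.move(pdf_path, os.path.join(FAILED_DIR, filename))
            log.warning(f"File moved to /failed: {filename}")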

Without this wrapper, a mid-pipeline crash leaves the file in /incoming/, where it gets picked up and reprocessed on every startup. With this wrapper, the final state is always clean.

Plus: retries with exponential backoff on API calls, copy-before-action, and protection against silently overwriting generated files.
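The retry mentioned above can be sketched in a few lines. This is a generic illustration, not the helper from the repo: the function name retry_with_backoff, its parameters, and the choice of ConnectionError and TimeoutError as "transient" errors are all assumptions. A real agent would list the exception types its API client actually raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller log and handle it
            # base_delay, then 2x, then 4x, plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Usage would look like retry_with_backoff(lambda: client.send(payload)), where client is whatever API wrapper the agent uses. Errors outside retry_on are not retried, so a bug in your own code fails immediately instead of being hidden behind four slow attempts.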


Pillar 3 — Security

No secrets in code. No irreversible decisions without validation. Allowlist over blocklist. The agent never guesses what it doesn't know.

Non-negotiable rules:

  • Secrets in .env (chmod 600), never hardcoded
  • SQL always parameterized
  • Explicit allowlist for system services the agent can query
  • When there's ambiguity, the agent DOESN'T DECIDE — it notifies the human

The last point matters most if your agent handles data with real-world impact (medical, financial, legal):

def match_patient(last_name: str, first_name: str = "") -> tuple[int, str] | tuple[None, None]:
    candidates = search_in_db(last_name)
    if not candidates:
        return None, None
    if first_name:
        # A supplied first name must match exactly one candidate
        matches = [c for c in candidates if _exact_word_match(first_name, c.full_name)]
        if len(matches) == 1:
            return matches[0].id, matches[0].full_name
        # Zero or several matches: a first name that matches nobody is ambiguity too
        notify_ambiguity(last_name, first_name, matches or candidates)  # human decides
        return None, None
    if len(candidates) == 1:
        return candidates[0].id, candidates[0].full_name
    notify_ambiguity(last_name, first_name, candidates)
    return None, None

Golden rule, explicit in my methodology: "Records in the database are people. We never guess."
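
Two of the other rules in the list, the service allowlist and parameterized SQL, are mechanical enough to sketch. The service names, the patients table and its columns, and both function names are hypothetical; sqlite3 stands in for whatever database the agent really uses.

```python
import sqlite3
import subprocess

# Hypothetical allowlist: the only units the agent may ask systemd about.
ALLOWED_SERVICES = frozenset({"my-agent.service", "nginx.service"})


def service_status(unit: str) -> str:
    """Return the `systemctl is-active` output for an allowlisted unit only."""
    if unit not in ALLOWED_SERVICES:
        raise PermissionError(f"Service not in allowlist: {unit!r}")
    # Argument list, no shell: the unit name is never interpreted as a command.
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def find_patients(conn: sqlite3.Connection, last_name: str) -> list[tuple[int, str]]:
    """Look up patients by last name with a bound parameter, never string formatting."""
    cur = conn.execute(
        "SELECT id, full_name FROM patients WHERE last_name = ?",
        (last_name,),
    )
    return cur.fetchall()
```

The allowlist check runs before anything touches the system, so an unexpected name fails loudly instead of being passed along. The ? placeholder means a value like x' OR '1'='1 is compared as plain text rather than executed as SQL.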


Pillar 4 — Deployment

The agent runs 24/7 unattended. It restarts itself after a crash. You see its state at a glance.

On modern Linux: systemd.

# /etc/systemd/system/my-agent.service
[Unit]
Description=My watchdog agent
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/root/projects/my-agent
ExecStart=/usr/bin/python3 watchdog.py
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Then load, enable, and start the service:

sudo systemctl daemon-reload
sudo systemctl enable my-agent.service
sudo systemctl start my-agent.service
journalctl -u my-agent -f  # live logs

Now your agent starts at boot, restarts 10 seconds after a crash, and you can follow its logs with journalctl.

Plus: a health_check() tool that pings all your services in one call, and a cron job every 15 minutes that alerts you on Telegram if something is off.
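
A health_check() along those lines might look like the sketch below. Only the name health_check comes from this post; the tcp_check and failing helpers, the dictionary-of-checks shape, and the example host and port are assumptions made for illustration.

```python
import socket
from typing import Callable


def tcp_check(host: str, port: int, timeout: float = 3.0) -> Callable[[], bool]:
    """Build a check that succeeds if a TCP connection opens within `timeout`."""
    def check() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return check


def health_check(checks: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run every check and report one boolean per service."""
    results: dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a check that crashes counts as "down"
    return results


def failing(results: dict[str, bool]) -> list[str]:
    """Names of the services that are down, sorted for stable alert messages."""
    return sorted(name for name, ok in results.items() if not ok)


if __name__ == "__main__":
    # Hypothetical example: a local database port and an always-passing stub
    report = health_check({
        "database": tcp_check("127.0.0.1", 5432, timeout=1.0),
        "self": lambda: True,
    })
    print(report, failing(report))
```

A script run from cron on a */15 * * * * schedule can call health_check() and send a Telegram message only when failing() returns a non-empty list, so silence means everything is up.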


How the 4 pillars reinforce each other

| Pillar | Without | With |
|---|---|---|
| 1 Observability | You don't know what happened | Full visibility in logs/ and api_costs.jsonl |
| 2 Reliability | A crash loses state, files get stuck | State recovers, files go to /failed/ |
| 3 Security | API key on GitHub, wrong person notified | .env chmod 600, allowlist, human-in-the-loop on ambiguity |
| 4 Deployment | Manual restart after every reboot | Enabled in systemd with Restart=always, comes back up on its own |

Pillar 1 gives you proof that 2/3/4 actually work. Pillar 2 lets you last. Pillar 3 lets you last without blowing up. Pillar 4 lets you last unattended.

Remove any one, and your agent lives until the next real outage — no longer.


Beyond this post

This is the short version. The full one — with the complete Python skeleton that unites all 4 pillars, per-pillar tests you can run, and common mistakes — is in my repo:

👉 Repo agents-in-practice — 9 French-language tutorials, from "how to talk to Claude" to "first MCP server with 4 useful tools". Built for non-IT professionals who want to actually understand agents, not just copy-paste boilerplate. English translations coming.


About me — and how this post got written

I'm a urologist in Fès, Morocco. No prior software training. In a few months with Claude, I built four production Python systems on one 5€/month server: a medical practice automation pipeline (OCR, WhatsApp, automated insurance dossier handling), a stock-valuation platform, a personal finance dashboard, and ongoing R&D.

This blog post — and everything else I publish — is written by my AI. It draws from my own production code, my projects, and months of conversation with it. My role: decide, validate. Its role: execute end-to-end, autonomously.

To my knowledge, no one publicly owns this position today. I do — deliberately. I want to show what a self-taught builder becomes when he delegates everything that can be delegated to an AI that knows him.

Follow me here on DEV and on GitHub for what's next.
